fairlearn: Bug: class labels not preserved

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from fairlearn.reductions import ExponentiatedGradient, DemographicParity
>>> eg = ExponentiatedGradient(constraints=DemographicParity(), estimator=LogisticRegression())
>>> eg.fit(pd.DataFrame([[1], [2], [3]]), [2, 0, 0], sensitive_features=[3, 2, 4])
>>> eg.predict([[1], [2], [3]])
0    0
1    0
2    1
dtype: int32
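
The expected output here would be the original labels `{0, 2}`, not positional indices `{0, 1}`. A minimal sketch of the usual scikit-learn decoding pattern (the `encoded_pred` array is a stand-in for whatever the reduction predicts internally):

    import numpy as np

    y = np.array([2, 0, 0])                  # original training labels
    classes_, y_encoded = np.unique(y, return_inverse=True)
    # classes_ == array([0, 2]); y_encoded == array([1, 0, 0])

    encoded_pred = np.array([0, 0, 1])       # hypothetical internal 0/1 predictions
    print(classes_.take(encoded_pred))       # [0 0 2] -- original labels restored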

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

I’ve been meaning to mention that there’s an issue with how the outputs are handled in the current codebase. As examples, here’s how we handle the output in GradientBoostingClassifier:

https://github.com/scikit-learn/scikit-learn/blob/dcfb3df9a3df5aa2a608248316d537cd6b3643ee/sklearn/ensemble/_gb.py#L1095-L1105

(for some reason GitHub isn’t rendering the snippet, so here’s the code)

    def _validate_y(self, y, sample_weight):
        check_classification_targets(y)
        self.classes_, y = np.unique(y, return_inverse=True)
        n_trim_classes = np.count_nonzero(np.bincount(y, sample_weight))
        if n_trim_classes < 2:
            raise ValueError("y contains %d class after sample_weight "
                             "trimmed classes with zero weights, while a "
                             "minimum of 2 classes are required."
                             % n_trim_classes)
        self.n_classes_ = len(self.classes_)
        return y
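
To make concrete what `np.unique(..., return_inverse=True)` does there (a toy example of mine, not sklearn code): it records the sorted original labels in `classes_` and re-expresses `y` as indices into them, so index-space predictions can be decoded with a single lookup:

    import numpy as np

    y = np.array(["spam", "ham", "ham", "spam"])
    classes_, y_idx = np.unique(y, return_inverse=True)
    # classes_ == array(['ham', 'spam']); y_idx == array([1, 0, 0, 1])
    assert (classes_[y_idx] == y).all()  # round-trip recovers the original labels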

and in HistGradientBoostingClassifier:

https://github.com/scikit-learn/scikit-learn/blob/dcfb3df9a3df5aa2a608248316d537cd6b3643ee/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1129-L1142

(same rendering problem, so here’s the code)

    def _encode_y(self, y):
        # encode classes into 0 ... n_classes - 1 and sets attributes classes_
        # and n_trees_per_iteration_
        check_classification_targets(y)

        label_encoder = LabelEncoder()
        encoded_y = label_encoder.fit_transform(y)
        self.classes_ = label_encoder.classes_
        n_classes = self.classes_.shape[0]
        # only 1 tree for binary classification. For multiclass classification,
        # we build 1 tree per class.
        self.n_trees_per_iteration_ = 1 if n_classes <= 2 else n_classes
        encoded_y = encoded_y.astype(Y_DTYPE, copy=False)
        return encoded_y
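
`LabelEncoder` does the same job with an explicit transformer, and additionally gives you `inverse_transform` for decoding at predict time (again just a toy illustration):

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    le = LabelEncoder()
    le.fit_transform([2, 0, 0])                # array([1, 0, 0])
    le.classes_                                # array([0, 2])
    le.inverse_transform(np.array([0, 0, 1]))  # array([0, 0, 2])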

I prefer the second solution; it’s the more recent code.
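
Applied to this bug, a hedged sketch of what adopting that pattern could look like (a hypothetical subclass, not the patch that actually landed; only the public `fit`/`predict` surface is wrapped):

    from sklearn.preprocessing import LabelEncoder
    from fairlearn.reductions import ExponentiatedGradient

    class LabelPreservingEG(ExponentiatedGradient):  # hypothetical
        def fit(self, X, y, **kwargs):
            self._le = LabelEncoder()
            y_encoded = self._le.fit_transform(y)  # encode to 0 ... n_classes - 1
            self.classes_ = self._le.classes_      # expose original labels, sklearn-style
            super().fit(X, y_encoded, **kwargs)
            return self

        def predict(self, X):
            # decode internal 0/1 predictions back to the original labels
            return self._le.inverse_transform(super().predict(X))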