fairlearn: Bug: class labels not preserved

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from fairlearn.reductions import ExponentiatedGradient, DemographicParity
>>> eg = ExponentiatedGradient(constraints=DemographicParity(), estimator=LogisticRegression())
>>> eg.fit(pd.DataFrame([[1], [2], [3]]), [2, 0, 0], sensitive_features=[3, 2, 4])
>>> eg.predict([[1], [2], [3]])
0    0
1    0
2    1
dtype: int32
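
The expected output here would be the original labels `{0, 2}`, not positional indices `{0, 1}`. A minimal sketch of the usual scikit-learn decoding pattern (the `encoded_pred` array is a stand-in for whatever the reduction predicts internally):

    import numpy as np

    y = np.array([2, 0, 0])                  # original training labels
    classes_, y_encoded = np.unique(y, return_inverse=True)
    # classes_ == array([0, 2]); y_encoded == array([1, 0, 0])

    encoded_pred = np.array([0, 0, 1])       # hypothetical internal 0/1 predictions
    print(classes_.take(encoded_pred))       # [0 0 2] -- original labels restored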

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

I’ve been meaning to mention that there’s an issue with how the outputs are handled in the current codebase. As examples, here’s how we handle the output in GradientBoostingClassifier:

https://github.com/scikit-learn/scikit-learn/blob/dcfb3df9a3df5aa2a608248316d537cd6b3643ee/sklearn/ensemble/_gb.py#L1095-L1105

(for some reason GitHub isn’t rendering the snippet, so here’s the code)

    def _validate_y(self, y, sample_weight):
        check_classification_targets(y)
        self.classes_, y = np.unique(y, return_inverse=True)
        n_trim_classes = np.count_nonzero(np.bincount(y, sample_weight))
        if n_trim_classes < 2:
            raise ValueError("y contains %d class after sample_weight "
                             "trimmed classes with zero weights, while a "
                             "minimum of 2 classes are required."
                             % n_trim_classes)
        self.n_classes_ = len(self.classes_)
        return y
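
To make concrete what `np.unique(..., return_inverse=True)` does there (a toy example of mine, not sklearn code): it records the sorted original labels in `classes_` and re-expresses `y` as indices into them, so index-space predictions can be decoded with a single lookup:

    import numpy as np

    y = np.array(["spam", "ham", "ham", "spam"])
    classes_, y_idx = np.unique(y, return_inverse=True)
    # classes_ == array(['ham', 'spam']); y_idx == array([1, 0, 0, 1])
    assert (classes_[y_idx] == y).all()  # round-trip recovers the original labels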

and in HistGradientBoostingClassifier:

https://github.com/scikit-learn/scikit-learn/blob/dcfb3df9a3df5aa2a608248316d537cd6b3643ee/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1129-L1142

(same rendering problem, so here’s the code)

    def _encode_y(self, y):
        # encode classes into 0 ... n_classes - 1 and sets attributes classes_
        # and n_trees_per_iteration_
        check_classification_targets(y)

        label_encoder = LabelEncoder()
        encoded_y = label_encoder.fit_transform(y)
        self.classes_ = label_encoder.classes_
        n_classes = self.classes_.shape[0]
        # only 1 tree for binary classification. For multiclass classification,
        # we build 1 tree per class.
        self.n_trees_per_iteration_ = 1 if n_classes <= 2 else n_classes
        encoded_y = encoded_y.astype(Y_DTYPE, copy=False)
        return encoded_y
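
`LabelEncoder` does the same job with an explicit transformer, and additionally gives you `inverse_transform` for decoding at predict time (again just a toy illustration):

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    le = LabelEncoder()
    le.fit_transform([2, 0, 0])                # array([1, 0, 0])
    le.classes_                                # array([0, 2])
    le.inverse_transform(np.array([0, 0, 1]))  # array([0, 0, 2])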

I prefer the second solution; it’s the more recent code.
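
Applied to this bug, a hedged sketch of what adopting that pattern could look like (a hypothetical subclass, not the patch that actually landed; only the public `fit`/`predict` surface is wrapped):

    from sklearn.preprocessing import LabelEncoder
    from fairlearn.reductions import ExponentiatedGradient

    class LabelPreservingEG(ExponentiatedGradient):  # hypothetical
        def fit(self, X, y, **kwargs):
            self._le = LabelEncoder()
            y_encoded = self._le.fit_transform(y)  # encode to 0 ... n_classes - 1
            self.classes_ = self._le.classes_      # expose original labels, sklearn-style
            super().fit(X, y_encoded, **kwargs)
            return self

        def predict(self, X):
            # decode internal 0/1 predictions back to the original labels
            return self._le.inverse_transform(super().predict(X))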