scikit-learn: Use a safe and short repr in error messages and warnings

We print the value of an offending parameter in many of our error messages and warnings. However, when printing the object fails, or when its representation is too long, the resulting message is not useful.

We should:

  • Use a safe_repr function so that printing never fails (I am pasting an example below, with a test)
  • Only print the first 300 characters (or something like that) of its return value, using a "short_repr" function

These two functions should be added in the utils submodule and used in error messages and warnings in the codebase.

import pydoc


def safe_repr(value):
    """Hopefully pretty robust repr equivalent."""
    # this is pretty horrible but should always return *something*
    try:
        return pydoc.text.repr(value)
    except KeyboardInterrupt:
        raise
    except:
        try:
            return repr(value)
        except KeyboardInterrupt:
            raise
        except:
            try:
                # all still in an except block so we catch
                # getattr raising
                name = getattr(value, '__name__', None)
                if name:
                    # ick, recursion
                    return safe_repr(name)
                klass = getattr(value, '__class__', None)
                if klass:
                    return '%s instance' % safe_repr(klass)
            except KeyboardInterrupt:
                raise
            except:
                pass
            # every attempt above failed or found nothing usable
            return 'UNRECOVERABLE REPR FAILURE'


def short_repr(obj):
    msg = safe_repr(obj)
    if len(msg) > 300:
        # keep only the first 300 characters and mark the truncation
        return '%s...' % msg[:300]
    return msg

# For testing (in a test file, not in the same file)
class Vicious(object):
    def __repr__(self):
        raise ValueError


def test_safe_repr():
    safe_repr(Vicious())
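
As a rough illustration of how such helpers might be used when building an error message, here is a minimal sketch. The safe_repr below is a simplified stand-in built on the built-in repr, and check_positive, the max_len parameter and the placeholder text are hypothetical names introduced only for this example; none of them is existing scikit-learn API.

```python
def safe_repr(value):
    """Simplified stand-in for the safe_repr proposed in this issue."""
    try:
        return repr(value)
    except Exception:
        # Fall back to the class name when the object's own repr raises.
        return '<unprintable %s object>' % type(value).__name__


def short_repr(obj, max_len=300):
    """Safe repr cut to max_len characters, with '...' marking truncation."""
    msg = safe_repr(obj)
    if len(msg) > max_len:
        return '%s...' % msg[:max_len]
    return msg


def check_positive(value, name):
    """Hypothetical validation helper, used only to show the message."""
    if not (isinstance(value, (int, float)) and value > 0):
        # short_repr keeps the message bounded and cannot raise here.
        raise ValueError('%s must be a positive number, got %s'
                         % (name, short_repr(value)))
    return value
```

With this sketch, passing an object whose __repr__ raises still produces a ValueError with a readable message instead of a second, unrelated exception.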


safe_repr is borrowed from joblib, but as it is not exposed in the public API, we shouldn’t import it from our vendored version of joblib (otherwise, the “unvendoring” performed by Debian will break the import).

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 16 (10 by maintainers)

Most upvoted comments

pprint is not robust to failing repr:

In [1]: class FailingRepr():
   ...:     def __repr__(self):
   ...:         raise Exception("I like failing")
   ...: 

In [2]: failing_repr = FailingRepr()

In [3]: import pprint

In [4]: pprint.pprint([failing_repr, failing_repr])
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-4-292281f7e9c6> in <module>
----> 1 pprint.pprint([failing_repr, failing_repr])

/usr/lib/python3.9/pprint.py in pprint(object, stream, indent, width, depth, compact, sort_dicts)
     51         stream=stream, indent=indent, width=width, depth=depth,
     52         compact=compact, sort_dicts=sort_dicts)
---> 53     printer.pprint(object)
     54 
     55 def pformat(object, indent=1, width=80, depth=None, *,

/usr/lib/python3.9/pprint.py in pprint(self, object)
    146 
    147     def pprint(self, object):
--> 148         self._format(object, self._stream, 0, 0, {}, 0)
    149         self._stream.write("\n")
    150 

/usr/lib/python3.9/pprint.py in _format(self, object, stream, indent, allowance, context, level)
    168             self._readable = False
    169             return
--> 170         rep = self._repr(object, context, level)
    171         max_width = self._width - indent - allowance
    172         if len(rep) > max_width:

/usr/lib/python3.9/pprint.py in _repr(self, object, context, level)
    429 
    430     def _repr(self, object, context, level):
--> 431         repr, readable, recursive = self.format(object, context.copy(),
    432                                                 self._depth, level)
    433         if not readable:

/usr/lib/python3.9/pprint.py in format(self, object, context, maxlevels, level)
    442         and whether the object represents a recursive construct.
    443         """
--> 444         return _safe_repr(object, context, maxlevels, level, self._sort_dicts)
    445 
    446     def _pprint_default_dict(self, object, stream, indent, allowance, context, level):

/usr/lib/python3.9/pprint.py in _safe_repr(object, context, maxlevels, level, sort_dicts)
    585         level += 1
    586         for o in object:
--> 587             orepr, oreadable, orecur = _safe_repr(o, context, maxlevels, level, sort_dicts)
    588             append(orepr)
    589             if not oreadable:

/usr/lib/python3.9/pprint.py in _safe_repr(object, context, maxlevels, level, sort_dicts)
    594         return format % ", ".join(components), readable, recursive
    595 
--> 596     rep = repr(object)
    597     return rep, (rep and not rep.startswith('<')), False
    598 

<ipython-input-1-3dc10e8ff2f1> in __repr__(self)
      1 class FailingRepr():
      2     def __repr__(self):
----> 3         raise Exception("I like failing")
      4 

Exception: I like failing

Hence, I do believe that the problem is not addressed.

While the above example may seem contrived, things like this can happen in the repr of estimators in some rare cases.
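
To make this concrete, here is a small sketch of how one misbehaving parameter can make a whole estimator unprintable. ToyEstimator and describe are hypothetical names for this illustration only; ToyEstimator is not scikit-learn's BaseEstimator and merely mimics the idea of a repr built from the parameters.

```python
class FailingRepr:
    def __repr__(self):
        raise Exception("I like failing")


class ToyEstimator:
    """Hypothetical estimator-like class whose repr lists its parameters."""

    def __init__(self, alpha=1.0, callback=None):
        self.alpha = alpha
        self.callback = callback

    def get_params(self):
        return {'alpha': self.alpha, 'callback': self.callback}

    def __repr__(self):
        # Each parameter's repr is computed here, so a single failing
        # parameter makes the repr of the whole object raise.
        params = ', '.join(
            '%s=%r' % (name, value)
            for name, value in sorted(self.get_params().items()))
        return '%s(%s)' % (type(self).__name__, params)


def describe(est):
    """Return repr(est), or a placeholder if computing it raises."""
    try:
        return repr(est)
    except Exception:
        return '<unprintable %s object>' % type(est).__name__
```

Here repr(ToyEstimator(callback=FailingRepr())) raises, while describe returns a placeholder string that can still be embedded in an error message.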

There is probably some work to salvage from pull request #11601.

The length of the error message may be improved now with default print_changed_only=True. We should confirm that our pprint is robust to errors before closing this.
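
One possible way to run that check is sketched below, under the assumption that the formatter under test can be called with a single object and returns a string. is_robust_to_failing_repr and guarded_pformat are hypothetical helpers written for this example; the standard library's pprint.pformat is used as the formatter under test, and any other pretty-printer could be passed in its place.

```python
import pprint


class FailingRepr:
    def __repr__(self):
        raise Exception("I like failing")


def is_robust_to_failing_repr(format_func):
    """True if format_func returns a string for objects whose repr raises."""
    try:
        result = format_func([FailingRepr(), FailingRepr()])
    except Exception:
        return False
    return isinstance(result, str)


def guarded_pformat(obj):
    """Hypothetical wrapper: use a placeholder when pformat raises."""
    try:
        return pprint.pformat(obj)
    except Exception:
        return '<unprintable %s object>' % type(obj).__name__
```

Calling is_robust_to_failing_repr(pprint.pformat) reports the failure shown in the traceback above, whereas the guarded wrapper passes the same check at the cost of losing all detail about the container's contents.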