napari: Unified error console and error reporting

🚀 Feature

As discussed in meetings, napari should have an embedded, unified error console. The idea being that, for the most “bundled” like usage scenarios, a user should not have to look in the console that started the napari process to see errors that occured. It should also allow easy review and reporting of errors that have occured in the session, along the lines of #1024, but for all errors raised in napari. This would probably supersede #794

UPDATED 5/28 with Links to relevant topics:

  • #794 describes a mechanism in which we would override sys.excepthook with our own handler. That handler would have some logic that decides when to pop up an error to the user (for instance, if the exception was a subclass of some NapariError class … or perhaps we would want to be even more permissive about what got shown). This is probably the main thing we need to figure out and get right here.
  • #1024 has some code that shows how to store and show tracebacks from errors that were previously raised. Note: a key idea there was that napari-plugin-engine has a base exception class (PluginError) that stores all instances that are created in the class _record attribute. We could do something similar here that would let the user review past exceptions, and potentially send them to us.
  • see also #1303, that has the good idea to have this sort of “error report window” default to showing the most recent error…
  • #1301 is another request to make certain errors (in that case, when opening an image fails) more visible to the user. That’s a key idea here, but rather than having a lot of QDialog instances scattered through the code, we should define a general API that we use throughout the codebase to show an error message in the GUI. (for instance, as shown in #794)
  • some of this stuff might be relevant for telemetry and bug reporting #621

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 26 (25 by maintainers)

Most upvoted comments

Even if automated error reporting gets implemented one day, you still need to think of how/if you support those users who opt-out of automated bug reporting. I suspect you’d still want to give them an easy way to send an individual bug report?

Not sure if this is a good idea, but for perfmon we swap in our own QApplication which has its own version of notify(): https://github.com/napari/napari/blob/master/napari/_qt/qt_event_timing.py#L60-L66

We do that so we can time every single Qt Event as it gets processed. If we put a try/except around that notify call, that we “protect” basically any code that results from any Qt event. I think this would include vispy events. Basically if the except happened it would mean that processing the event failed.

Maybe there is some downside from having a try/except there. But if we want to experiment with a very high level try/except this might be a place to do it. You could experiment just be setting NAPARI_PERFMON=1 and modifying our QApplicationWithTiming. If we wanted to do this long term we could replace QApplicationWithTiming with just QApplicationNapari or something and always insert it.

just updated the origin post in this thread with links to various important discussions going on about how we catch, store, and show exceptions to users in the GUI. which is going to be essential for a bundled app

Just weighing in to say this discussion has been great so far, I’m learning a lot. I think both - how we handle unexpected errors and how to promote more try/ catch within the codebase and good handling of those errors we intentionally raise seem valuable!!

I apologize, but I’m afraid I’m still not following your vision here. by “provide an example bit of code”, I meant a little more than what you show here. For instance, where are you calling your on_error_ignore function here? one place? in every except clause? in place of a try/except block? what does your proposed on_error_ignore function do? if you think it would be helpful to draft up a PR with some of your ideas, I’d be happy to look at it.

so far though, these would be some of my questions:

  • do you see what you’re proposing as an alternative to a try/except/raise and sys.excepthook-based error handler? or more like “yes, and eventually someday we might also add something like on_error_ignore()”
  • if the former, then what do you see as the advantage of adding a new syntax and pattern that developers should learn and use, rather than just using raise Exception() in the standard way? in other words, what is the standard try/catch/raise pattern lacking for you?
  • for the main task at hand (specifically: making sure users always know that/why an error was raised, and that we have a way to capture/report tracebacks), what limitations did you specifically see in the sys.excepthook/handler pattern, and how does what you’re proposing solve them?
  • when you say “allow user to use our error handler”, can you elaborate?
  • was it completely clear how the code in #794 works? Is it clear that a developer could already “use” our handler simply by doing a standard try/except like this:
    try:
        xyz()
    except:
        raise Exception("something wrong")
    
    and that all exceptions like that go to our handler that overrides sys.excepthook, where we can then alert the user instead of having a message printed to stdout in the terminal (which a GUI user may or may not be looking at) unless of course, we chose to print the traceback to the terminal in, for instance, an interactive session?. and is it clear that we could additionally declare specific exception types (e.g. raise NapariError()) that would have specific implications for how they are handled? … and that any developer could raise our error classes if they wanted to?

it’s not a hard switch… so I don’t think it needs to affect this much at all. The main thing that might change eventually is that we remove the button that says “send the stack trace”, and perhaps remove some of the code that temporarily stores a stack trace for review. But i don’t think it will be a wasted effort regardless of how/whether we end up using sentry.

(This is all assuming for the moment that we’re not doing any telemetry #621 … which of course might change some of our strategy.)

I think we should be preparing for telemetry and designing systems now that could be used in such a way in an opt in system. I do like the post to GitHub too though, it’s sort of less magical and mysterious then telemetry and maybe nice to have too. cc @neuromusic

Yes, that worked! Thanks, @tlambert03 !