build: Improve Error DX

There have been some comments around error boundaries & general feedback to devs when errors occur.

Here are two examples of plugins with broken code: one hits an undefined reference, the other throws explicitly.

module.exports = {
    name: 'netlify-plugin-one',
    onInit: () => {
      console.log(thing.what) // undefined ref
    },
}

module.exports = {
    name: 'netlify-plugin-one',
    onInit: () => {
      throw new Error('http://www.nooooooooooooooo.com/')
    },
}
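For context, here is a minimal sketch of how a runner could catch either failure and attribute it to a plugin and event. `runPluginEvent` and the shape of the report object are hypothetical and are not the actual Netlify Build internals; handlers are assumed to be synchronous for brevity.

```javascript
// Hypothetical helper (not the real Netlify Build API): runs one plugin
// event handler and, on failure, returns a report naming the plugin and event.
function runPluginEvent(plugin, eventName) {
  const handler = plugin[eventName];
  if (typeof handler !== 'function') {
    return { ok: true, skipped: true };
  }
  try {
    handler();
    return { ok: true, skipped: false };
  } catch (error) {
    return {
      ok: false,
      plugin: plugin.name,
      event: eventName,
      errorType: error.name, // "ReferenceError" for an undefined reference
      message: error.message,
    };
  }
}

const undefinedRefPlugin = {
  name: 'netlify-plugin-one',
  onInit: () => {
    console.log(thing.what); // undefined ref
  },
};

const report = runPluginEvent(undefinedRefPlugin, 'onInit');
console.log(
  `Plugin "${report.plugin}" failed during ${report.event}: ` +
    `${report.errorType}: ${report.message}`
);
```

The point of the sketch is only that the plugin name and event are known at the call site, so they can be attached to whatever error surfaces in the log.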

Currently errors materialize like so:

Are there ways we can improve upon this?

About this issue

State: closed
Created 4 years ago
Comments: 20 (18 by maintainers)

Commits related to this issue

chore(main): release 9.0.2 (#711) Co-authored-by: token-generator-app[bot] <82042599+token-generator-app[bot]@users.noreply.github.com> Co-authored-by: ehmicky <ehmicky@users.noreply.github.com> — committed to netlify/build by token-generator-app[bot] 2 years ago

Most upvoted comments

Chiming in to +1 these suggestions! I think it’s imperative we more clearly surface these errors both in the logs and the UI to avoid user confusion and churn, like @jlengstorf said:

new customers may not get that far — they might just contact support and/or churn because “Netlify isn’t stable”

Could we get this work prioritized as part of the #project-build-plugins-ui scope?

Once we know how this is going to work on the backend, it’d be super helpful to get an issue opened in https://github.com/netlify/netlify-react-ui/issues so we can send the UI work through our typical process: design can take these suggestions and formalize a solution, then we can send it into frontend implementation (@drewm will lead that work), Copy Club, etc. 🙂

lesliecdubs on Jan 21, 2020

all makes sense, and I’m on board. A couple wishlist items that I’d really like to see in place to avoid frustration/confusion later on:

1. Make it one-click to remove and rebuild when a plugin fails

messaging makes it clear when plugins cause a failure, and which plugin caused the failure

enable/disable provides a quick way to get your builds going again

this makes sense, and I’ll make a request since it sounds like we’re already building all of this functionality:

could we make the uncaught error message say something like this?

There was an unexpected error in the plugin
`netlify-plugin-dont-touch-my-garbage`. As a
precaution, the build was stopped and nothing
was deployed.

BUTTON: **Disable this plugin and rebuild.**

clicking the button would both turn off the build plugin and restart the deploy

2. Log unexpected failures so we can identify broken plugins/opportunities to coach plugin devs

utils.build.failPlugin allows plugin authors to explicitly signal that the build can continue safely even if their plugin fails unexpectedly - we can also shout about this in the plugin authoring docs to encourage it when possible

can we make sure we’re logging the failures in an analyzable way? this would help us flag and remove plugins that are outright broken, and gives us potential to provide actionable feedback to developers (your plugin has X unexpected failures with the message “Y” — consider using utils.build.failPlugin to avoid these failures)
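As a rough sketch of what "analyzable" could mean here, assume each unexpected failure is logged as a record with `plugin`, `event`, and `message` fields. Those field names, the `summarizeFailures` helper, the second plugin name, and the sample messages are all hypothetical; the grouping just counts identical plugin/message pairs.

```javascript
// Hypothetical record shape: { plugin, event, message }.
// Groups failures by plugin and message and returns the most frequent first.
function summarizeFailures(records) {
  const counts = new Map();
  for (const { plugin, message } of records) {
    const key = JSON.stringify([plugin, message]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, count]) => {
      const [plugin, message] = JSON.parse(key);
      return { plugin, message, count };
    })
    .sort((a, b) => b.count - a.count);
}

// Illustrative data only.
const failureLog = [
  { plugin: 'netlify-plugin-one', event: 'onInit', message: 'thing is not defined' },
  { plugin: 'netlify-plugin-two', event: 'onInit', message: 'timeout' },
  { plugin: 'netlify-plugin-one', event: 'onInit', message: 'thing is not defined' },
];

const summary = summarizeFailures(failureLog);
for (const { plugin, message, count } of summary) {
  console.log(`${plugin} has ${count} unexpected failure(s) with the message "${message}"`);
}
```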

jlengstorf on Apr 21, 2020

stoked to be talking about this!

I have a lot of different things that I think might be worth discussion, so let me know if we need to break these out into sub-discussions

at a high level, the main issues I see are:

- logs are hard to read and full of unhelpful output
- failures look the same no matter where they came from
- there is no indication in the UI that the error wasn’t Netlify’s fault
- there is no way to recover from build plugin errors

logs are hard to read and full of unhelpful output

the logs are made of about 95% junk in terms of useful information — we could leave out almost all of it and I’d have the same idea of what’s going on

because we output so much junk, when a build fails and I go to the log and click the “jump to bottom” arrow, I don’t see the actual error; I see the Build script returned non-zero exit code: 1

someone who is familiar with how Netlify works and has all the context that “oh, no, don’t look at the last error, scroll up and look for the error before that” will figure this out, but new customers may not get that far — they might just contact support and/or churn because “Netlify isn’t stable”

which leads to the next point:

failures look the same no matter where they came from

right now, when you look at the app, a failure just looks like a failure — there’s no way to tell why the thing failed without digging into the logs

this means that a build plugin makes our UI look the same way as a Netlify build error — how would our customers know not to ask support when they have a half dozen failures in a row because some plugin changed in the repo and they weren’t aware?

an idea to improve it

if we use our existing badge markup, we could potentially just add another clarifier to failed builds:

this would require build plugins to capture errors and communicate them out of the build

there is no indication in the UI that the error wasn’t Netlify’s fault

a failed build should add some kind of big blinking marquee at the top of the build log — not just in the log — that something went wrong

right now the visual language at the top of the build log is so similar that I actually thought it was the same output between successful/failed builds until I looked just now

the build should capture errors — especially if those errors come from build plugins — and expose them in a big-ass red box so people know exactly what went wrong:

there is no way to recover from build plugin errors

this may not be feasible, but in a perfect world we’d be running build plugins in a way that if they failed we could just say, “okay, this build plugin’s busted, don’t run any more of its lifecycles and let’s keep rolling with this build”

I’m happy to be convinced otherwise, but by default I would think we’d want build plugins to fail in a recoverable way, meaning we don’t actually fail a build if the plugin errors out — we’d just expose warnings and ship the site without the build plugin

for plugins like build time speedup plugins, this maintains great DX vs. costing even more time (i.e. I can ignore the failed builds until I have room to breathe and dig in vs. needing to drop everything because the plugin broke and our site won’t build and there’s something critical we need to deploy)

if a plugin should definitely fail the build if it errors out — for compliance checks or perf budgets, for example — they would explicitly opt in to fail builds on error with a setting (ideally we have a utility that they call so they can conditionally full bail, kind of like ESLint warnings vs. errors)

in very simplified pseudo-code, the logic might look like this:

const failedPlugins = [];
const pluginErrors = [];
buildPlugins.forEach(plugin => {
  // bail on future lifecycles if the plugin has already thrown an error
  if (failedPlugins.some(p => p.plugin === plugin.name)) return;

  try {
    plugin[currentLifecycle]();
  } catch (error) {
    if (plugin.config.failBuildOnError) {
      // only kill the build if the plugin _explicitly_ says it should die on failure
      // (assumed to stop the build, so nothing below runs in that case)
      noMrBuildIExpectYouToDie({ cause: plugin.name, error: error.message });
    }

    // otherwise we just keep records and share those after the build succeeds
    failedPlugins.push({ plugin: plugin.name, lifecycle: currentLifecycle });
    pluginErrors.push({ plugin: plugin.name, error: error.message });
  }
});

this is A Lot™, so let me know if you want to dig into these separately or get on a call to discuss

jlengstorf on Jan 18, 2020

@jlengstorf you ruined a flawlessly executed swoop-and-poop by saying nice things at the end! 😂

Clarifications

My proposal only addressed the question of whether builds continue when plugins unexpectedly fail; the rest of your error communication recommendations have been converted to tickets for implementation.

Here are the ways plugins can fail, and the expected/proposed result:

1. The plugin itself catches the failure and signals that the build should still proceed
   - Build proceeds, and is marked with something akin to your “recoverable build plugin errors” messaging
   - Deploy page header has whatever custom error message was passed back by the plugin through utils.build.failPlugin()
2. The plugin itself catches the failure and signals that the build should not proceed
   - Build fails, and is marked with something akin to your “unrecoverable build plugin errors” messaging
   - Deploy page header has whatever custom error message was passed back by the plugin through utils.build.failBuild()
3. The plugin fails, but the plugin does not catch the failure
   - Build fails, and is marked with something akin to your “unrecoverable build plugin errors” messaging
   - Deploy page header has default messaging for plugin failure
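The three failure modes above could be told apart roughly as follows. This is a sketch under stated assumptions, not the shipped implementation: it assumes `utils.build.failPlugin()` and `utils.build.failBuild()` signal by throwing distinguishable errors and that handlers receive `{ utils }`. The `PluginFailure`/`BuildFailure` classes, `classifyPluginRun`, the result shape, and two of the plugin names are hypothetical.

```javascript
// Hypothetical signal classes standing in for whatever the real utils throw.
class PluginFailure extends Error {}
class BuildFailure extends Error {}

// Assumed stand-in for the utils passed to plugin handlers.
const utils = {
  build: {
    failPlugin(message) { throw new PluginFailure(message); },
    failBuild(message) { throw new BuildFailure(message); },
  },
};

function classifyPluginRun(plugin, eventName) {
  try {
    plugin[eventName]({ utils });
    return { buildProceeds: true, status: 'success', header: null };
  } catch (error) {
    if (error instanceof PluginFailure) {
      // Case 1: plugin caught the problem and says the build can continue.
      return { buildProceeds: true, status: 'recoverable', header: error.message };
    }
    if (error instanceof BuildFailure) {
      // Case 2: plugin caught the problem and says the build must stop.
      return { buildProceeds: false, status: 'unrecoverable', header: error.message };
    }
    // Case 3: uncaught error, so fall back to default messaging.
    return {
      buildProceeds: false,
      status: 'unrecoverable',
      header: `Plugin "${plugin.name}" failed unexpectedly.`,
    };
  }
}

const outcomes = [
  { name: 'catches-and-continues', onInit: ({ utils }) => utils.build.failPlugin('Cache unavailable, skipping') },
  { name: 'catches-and-stops', onInit: ({ utils }) => utils.build.failBuild('Perf budget exceeded') },
  { name: 'netlify-plugin-one', onInit: () => { throw new Error('http://www.nooooooooooooooo.com/'); } },
].map(plugin => classifyPluginRun(plugin, 'onInit'));

console.log(outcomes);
```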

We should probably provide a stack trace in the deploy header as well, but still finalizing details on that.

Remaining issue: should unexpected plugin failures fail the build?

The problem with allowing builds to continue by default is that a plugin can do anything to the cloned repo, including putting it in an unstable state. If we allow builds to continue after an unexpected plugin failure, we don’t know what incomplete actions took place and what the impact on the resulting build will be. We’re planning on wrapping plugins in try/catch as you would expect.

The only thing we seem to differ on is whether to make build failure opt-in or opt-out when a plugin fails (throws an uncaught error) unexpectedly. I’ll distill the specific reasoning for proposing opt-out to help the conversation stay focused:

- messaging makes it clear when plugins cause a failure, and which plugin caused the failure
- enable/disable provides a quick way to get your builds going again
- utils.build.failPlugin allows plugin authors to explicitly signal that the build can continue safely even if their plugin fails unexpectedly - we can also shout about this in the plugin authoring docs to encourage it when possible

There was also plugin author feedback that they’d prefer failure-by-default for unexpected errors, but I don’t recall who said that. @verythorough may know.

erquhart on Apr 21, 2020

I think as a discussion issue, it makes sense to close this now. If there are remaining smaller tasks within the comments that haven’t been addressed, we should file new, focused issues for them.

verythorough on May 25, 2020

100% agreed on both insights, thank you for this.

erquhart on Apr 22, 2020

This issue has a lot of great feedback; we can use it as a parent for plugin errors in the UI. Raw mechanisms on the build side are in place via #735.

erquhart on Mar 30, 2020

However, if these plugins are mission critical to the application, the entire build should likely fail. Example: a Netlify CMS plugin that injects the /admin contents & routes.

We would need some sort of mechanism to determine if a failure should exit the build or ignore it. With the wide variety of errors that can occur, this might be tricky (or impossible).

agreed — I buried this toward the end of my original comment, but I think build plugins should have to opt in to kill the build

if a plugin should definitely fail the build if it errors out — for compliance checks or perf budgets, for example — they would explicitly opt in to fail builds on error with a setting (ideally we have a utility that they call so they can conditionally full bail, kind of like ESLint warnings vs. errors)

having an explicit way to fail builds also overcomes the next point:

Some plugins might throw errors on purpose to stop the build

instead of something arbitrary that may or may not be build-related, e.g.

if (somethingWentWrong) {
  throw new Error('oh noes!');
}

we would have a more structured error using whatever falls out of #161, e.g.

if (somethingWentWrong) {
  utils.build.failWithError('oh noes!');
}

this would presumably allow us to capture those errors in a way that’s easier to surface in the UI (or at the very least allow us to make that improvement under the hood of the failWithError util later on with no user-facing API changes)

jlengstorf on Jan 23, 2020

thanks, @ehmicky and @lesliecdubs!

If a plugin fails, there’s a chance that the resulting build would be erroneous. Making the build succeed might lead to deploying to production a Site that’s invalid.

in my mind this needs to be solved: if a build plugin fails, we should roll back and continue the build as if the plugin wasn’t installed. that could be brute-forced by restarting the deploy and removing the plugin config, or (ideally; maybe in the future) treating builds as immutable objects or something so that build plugins aren’t mutating the build, but rather creating a modified copy that we can throw away if there are errors
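to make the "modified copy we can throw away" idea concrete, here is a toy sketch. it assumes the build state is a plain in-memory object, which a real build (a cloned repo on disk) is not; `runOnCopy` and the `files` field are hypothetical.

```javascript
// Hypothetical: run a plugin against a copy of the build state and only
// keep the copy if the plugin finishes without throwing.
function runOnCopy(buildState, plugin, eventName) {
  // JSON round-trip is a crude deep copy that only works for plain data.
  const draft = JSON.parse(JSON.stringify(buildState));
  try {
    plugin[eventName](draft);
    return { state: draft, warning: null };
  } catch (error) {
    // Discard the draft; the original state is untouched.
    return { state: buildState, warning: `${plugin.name} failed: ${error.message}` };
  }
}

const initialState = { files: ['index.html'] };

const halfFinishedPlugin = {
  name: 'netlify-plugin-one',
  onInit: (state) => {
    state.files.push('partial-output.tmp'); // mutation happens before the failure
    throw new Error('oh noes!');
  },
};

const result = runOnCopy(initialState, halfFinishedPlugin, 'onInit');
console.log(result.state.files);
console.log(result.warning);
```

the half-finished mutation only ever touches the draft, so the build can carry on from the original state and surface the warning afterwards.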

Making the build fail makes it visible to the users that something is wrong with some of their plugins. Otherwise they might overlook it.

I would argue that between this and upstream failures requiring developers to drop everything and make code changes because deploys stopped working, overlooking failures is the better outcome

the UI should make it apparent that something has gone wrong, and ideally we’d be able to send out an error notification email and/or digest to make sure people see the errors

jlengstorf on Jan 22, 2020