boardgame.io: Improve client error handling

This is a proposal for how we could improve error handling in boardgame.io clients.

Related: #592 #218 #828

Current Situation

Errors happen implicitly and there is no way for a client to handle them or even know they happened.

Errors are logged in two key places:

  1. The reducer logs errors and returns the original state when it encounters them.

  2. When in use, the game master does various validation and logs errors too, bailing when it encounters them.

A lot of this validation logic is actually duplicated across master and reducer, which may not be necessary.

Goals

  • A client should know if an error happened and why.

  • Errors should be handled in local single-player instances as well as server-connected multiplayer scenarios.

  • If possible, the action that triggered an error should be able to receive the error at its call site.

Core Proposal

  1. If the reducer encounters an error, instead of returning the unchanged state it should return { ...state, error }.

  2. The error field should never be persisted — it is a transient side effect of the most recent action specific to a single client.

  3. When a master is being used, if there is an error, the update should not complete by sending an update event with the full state to all clients. Instead, the master should send a new error event type to the requesting client. (error would be a third event type alongside update and sync.)

  4. The error field would have a shape of { type, payload? }, where type is one of a range of boardgame.io error codes (see “Error Types” below) and payload is additional information where relevant.

  5. If possible, client action dispatchers should return a promise that is resolved by the client once the action has been processed. If the action produces an error, the dispatcher would resolve with { error }. See “Errors at call sites” below for more details.

Notes

Error Types

A catalogue of potential error types based on current error logging, with suggested error codes.
type description
update/unauthorized_action The action’s credentials were missing or invalid
update/match_not_found The action’s matchID was not found
update/patch_failed Could not apply patch update
action/stale_state_id The action contained a stale state ID
action/unavailable_move The requested move is unknown or not currently available
action/invalid_move The move declared it was invalid (INVALID_MOVE constant)
action/inactive_player The player making the action is not currently active
action/gameover The game has finished
action/action_disabled The requested action is disabled (e.g. undo/redo, events)

Errors at call sites

Could an action dispatcher offer an async response at the call site?

It would be nice if dispatchers like client.events.endTurn() or client.moves.someMove() could be awaited so that the call site can do something with the result.

Some more feedback would be welcome here, but would it be possible for the action dispatchers to register callbacks? Internally a dispatcher could do something like:

function dispatcher() {
  // ...
  store.dispatch(/* ... */);

  return new Promise(resolve => {
    client.registerActionCallback(resolve);
  });
}

Then when the state updates, the client could check if it was successful or if there was an error, call the callback and remove it.

At the call site users could then do:

client.moves.someMove()
  .then(({ error }) => {
    if (!error) return;
    // handle error
  });

There is something unidiomatic about getting an error in then instead of in catch, but if we avoid throwing errors, users can continue calling dispatchers without worrying about the fact it’s async at all.

What would be the mechanism for identifying which state updates relate to which action callback? Perhaps we need to store _stateID or some other data alongside the Promise resolving callback. Same goes for a timestamp if timeout logic comes into play.

There’s an unanswered question here about optimistic updates. Should the promise resolve on a successful optimistic update or must it wait for server response in a multiplayer set-up? This is tricky, because optimistic updates will keep things responsive, but we wouldn’t then be able to resolve with network errors or an error the remote master produced.

Invalid moves

Supporting arbitrary error data for invalid moves

There has already been some discussion of letting moves report more data about why they were invalid (see #592). I would suggest the following approach:

  • Export an InvalidMove factory to be used instead of the INVALID_MOVE constant. (We could keep support for the constant for now but deprecate it, only documenting the function.)

    InvalidMove would return some predictable object like { __error: INVALID_MOVE, payload }.

  • Users could pass arbitrary data to InvalidMove and receive that in the reducer’s error payload:

    import { InvalidMove } from 'boardgame.io/core';
    const move = (G, ctx) => InvalidMove({ reason: 'no hippos' });
    
  • In the reducer, instead of checking G === INVALID_MOVE, we can add a check for the new data structure.

  • The above example would produce an error in state like:

    {
      error: {
        type: 'action/invalid_move',
        payload: { reason: 'no hippos' },
      }
    }
    

Network errors

Connectivity issues aren’t particularly well managed at the moment because we don’t know for sure when an action times out, errors etc. With the above, the client could implement some timeout/retry/error handling logic.

If an action triggers neither an update nor an error event after some interval, the client can assume the network hung. It could then roll back optimistic updates or retry.

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 5
  • Comments: 15 (14 by maintainers)

Commits related to this issue

Most upvoted comments

Gah. I had some health problems come up and totally dropped the ball on this for like 2 years. I’ll try to get caught up and see if any of this still applies/is useful but no promises on timeline.

I’m looking forward to this feature 😄

Sounds great to me. The intermediate plumbing layer is unaffected by this discussion.

Oh, I’m not saying there should be no call site API. I totally understand that for certain implementations it would be preferable to have that way to associate action with error. I just also think there should be a way for code that didn’t dispatch an action to find out there was an error without the need for additional user-implemented infrastructure such as global error contexts or a message bus, which each call site needs to individually remember to tap into. For a hypothetical ErrorMessage component in React or plain JS there is no difference whether it gets its value from the boardgame.io client or from some user-land infrastructure — an error just shows up — but for the developer the difference is they have to build that infrastructure and I think we have a pretty clear path to providing a simple API they can use when they don’t need the fine-grained control of your examples.

Writing a component that consumes a declarative error object and displays it when it’s defined is a pretty simple concept to grasp, especially for less advanced users. It’s important to remember that not everyone is comfortable wiring up their own carefully calibrated infrastructure that filters and funnels errors — or that people may want a quick and easy way to access error for debugging purposes before building a system to decide what to display, what to log, what to ignore.

My proposal was just to tackle the “simple” API first as a way of getting all the plumbing in place and then add the call site API, but if you want to try this in some other order or all at once, that’s also fine by me 🙂

Thanks for putting this together @shaoster. I think we’re missing a step here before we tackle the move/event call sites.

The fundamental boardgame.io model currently looks something like this (starting bottom left with the move/event action):

Diagram showing the flow of an action processed by client and server.

When a client calls a move/event, that action is created and the client starts processing it. In the simplest case (no local or socket.io multiplayer) this completes synchronously and the client notifies all subscribers by calling their callback with the new value of state. In this synchronous model, returning a result from a dispatcher might be plausible.

However, when multiplayer is in play, things get more complicated:

  • A game can set client: false on a move to indicate that the move must run on the server (usually because it relies on secret state).
  • A plugin can declare that it must be run on the server. For example, if a move uses the randomness API, that will force it to run on the server (also because PRNG state is secret).
  • Some events also run on the server exclusively.

So we actually have two update paths (solid blue and dashed red arrows in the diagram):

  1. The action is allowed to run locally, so the local reducer completes and notifies subscribers synchronously with the updated state. At some point in the future the server update will also arrive, but will be ignored by the client that updated optimistically.
  2. The action is not allowed to run locally, so the local reducer does not update state, calling subscribers synchronously with the unchanged state. When the server update is received, the client will call subscribers again with the now updated state.

The implication is that any call site API must be asynchronous (I’d recommend by returning a promise), because we can’t know ahead of time if we can definitely complete the full update cycle synchronously. For case 1, the optimistic update or error could be used to immediately resolve call-site promises. For case 2, the call-site promises should be resolved only once the server update/error is received.

I would propose before working on the call-site API, we should first focus on the existing basic API, which will in any case be a pre-requisite for call-site promises. It will also get the infrastructure set up needed for transmitting the error from the server to the client. I propose updating the client subscription callbacks so they receive (state, error) instead of just (state), so a plain JS user of the client can do something like:

// state should be defined even when we provide error to preserve backwards compatibility
client.subscribe((state, error) => {
  if (error) console.error(error.type);
});

client.moves.invalidMove();
// logs 'action/invalid_move'

The React client abstracts the plain JS client by spreading state into a board component’s props. I guess we would add the error there directly, so a board might look like:

const Board = ({ moves, error }) => {
  return error
    ? <p>Error: { error.type }</p>
    : <button onClick={() => moves.invalidMove()}>Move</button>;
};
// Clicking the move button causes the paragraph to read “Error: action/invalid_move”

I’m also open to a more descriptive name, like actionError, which is more important in React because it’s a named prop rather than just a function parameter.

According to this model the parts of closing this issue are:

  1. Raise error transients from the reducer (done in #940).
  2. Wire up server, client, and reducer to pass errors to subscribers.
  3. Emit errors from the server for the errors the master raises (not just reducer errors)
  4. Set up a call-site API to allow for action dispatchers to return promises.

What do you think?

I’ll take a stab at this.