apollo-client: Umbrella issue: Cache invalidation & deletion

Is there a nice and easy way to set a condition to invalidate a cached query against? Think about time-to-live (TTL) or custom conditions.

For example (pseudo-code warning):

query(...).invalidateIf(fiveMinutesHavePassed()) or query(...).invalidateIf(state.user.hasNewMessages)

forceFetch serves its purpose, but I think the cache invalidation condition should be able to live close to the query (the cache) itself. That way I don’t need to check manually whether a forceFetch is required when I re-render a container; the query already knows when it’s outdated.
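A userland sketch of what this could look like, assuming nothing about apollo internals — a small wrapper that caches a query result and re-runs it whenever a staleness predicate fires (the fetcher is kept synchronous for brevity; `fetchUsers` is a stand-in for a network call):

```javascript
// Minimal sketch of conditional cache invalidation (not an apollo API).
// A cached query holds its last result plus the time it was fetched;
// `read` re-runs the fetcher whenever the invalidation predicate fires.
function cachedQuery(fetcher, invalidateIf) {
  let entry = null; // { data, fetchedAt }
  return function read() {
    if (entry && !invalidateIf(entry)) return entry.data;
    entry = { data: fetcher(), fetchedAt: Date.now() };
    return entry.data;
  };
}

// Example predicate: a five-minute TTL.
const fiveMinutesHavePassed = ({ fetchedAt }) =>
  Date.now() - fetchedAt > 5 * 60 * 1000;

// Usage: the second read within five minutes hits the cache.
let calls = 0;
const fetchUsers = () => { calls += 1; return ['alice', 'bob']; };
const readUsers = cachedQuery(fetchUsers, fiveMinutesHavePassed);
```

The predicate could just as well close over app state (`state.user.hasNewMessages`), which is the appeal of keeping the condition next to the query.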

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 44
  • Comments: 72 (20 by maintainers)

Most upvoted comments

Popping in on this… I’d like to add that I think this is very badly needed. I’m actually surprised it’s not baked into the core of the framework. Without the ability to invalidate parts of the cache, it’s almost impossible to cache a paginated list.

Thinking about this some more: What I’d like to see is something very much like the API for updateQueries:

client.mutate({
  mutation: SomeMutation,
  variables: { some_vars },
  invalidateQueries: {
    queryName: (previousQueryResult, { mutationResult }) => true,
  },
});

The invalidateQueries hook is called with exactly the same arguments as updateQueries, however the return result is a boolean, where true means it should remove the query result from the cache and false means the cache should be unchanged.

This strategy is very flexible, in that it allows the mutation code to decide on a case-by-case basis which queries should be invalidated based on both the previous query result and the result of the mutation.

(Alternatively, you could overload the existing updateQueries to handle this by returning a special sentinel value instead of an updated query result.)

The effect of invalidating a query removes the query results from the cache but does not immediately cause a refetch. Instead, a refetch will occur the next time the query is run.
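The semantics could be modeled over a plain map of cached query results (names and shapes here are illustrative, not apollo internals):

```javascript
// Toy model of the proposed invalidateQueries hook (illustrative names,
// not an apollo API). `cache` maps query name -> last cached result; each
// hook returns true when that cached result should be dropped, so the
// next run of the query triggers a refetch.
function runInvalidateQueries(cache, hooks, mutationResult) {
  for (const [queryName, shouldInvalidate] of Object.entries(hooks)) {
    if (!(queryName in cache)) continue;
    if (shouldInvalidate(cache[queryName], { mutationResult })) {
      delete cache[queryName];
    }
  }
  return cache;
}

// Usage: drop the cached widget list after a successful create, leaving
// unrelated cached queries untouched.
const cache = { widgetList: { items: [1, 2] }, userProfile: { name: 'ann' } };
runInvalidateQueries(
  cache,
  { widgetList: (prev, { mutationResult }) => Boolean(mutationResult.data) },
  { data: { createWidget: { id: 3 } } }
);
```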

I’ve read tons of workarounds, but that’s not solving the problem.

You have to cherry-pick a bunch of workarounds (or build them yourself) for basic usage of this library. It’s like it is built to work with the “to-do” app and nothing more complex than that.

We need an official implementation or at least documentation guiding users on how to deal with all of these edge cases (if you can call Cache#remove an edge case).

I’m more than willing to help on anything, but I’m afraid if I just fork and try to implement this, it will be forgotten like this issue or the 38 currently open PRs…

Right now I’m mentally cursing who chose this library for a production system 😕

@jvbianchi we are nearing the launch of a new store API and network stack which should allow for fine grained cache control!

I’d like to delete some specific nodes in my cache store for my application’s logout, for security, and also force them to refresh from the network the next time they are accessed via queries.

Would it make sense to have invalidation and deletion methods symmetrical to the other imperative store API methods?

For example:

client.deleteQuery({
  query,
  variables,
})

client.deleteFragment({
  fragment,
  id,
})

Nodes selected by the above would be deleted from the cache, and would be refreshed from network when accessed by a query with a cache-first, cache-and-network, or network-only fetch policy. A query with a cache-only policy would not find those nodes.

client.invalidateQuery({
  query,
  variables,
})

client.invalidateFragment({
  fragment,
  id,
})

Nodes selected by the above would be marked as stale in the cache, and would be refreshed from network when accessed by a query with a cache-first, cache-and-network, or network-only fetch policy. A query with a cache-only policy would see those stale nodes.
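The delete/invalidate distinction above can be modeled with a toy normalized cache, where deletion removes the node and invalidation only marks it stale (a sketch under those assumptions, not an apollo API):

```javascript
// Toy normalized cache illustrating the proposed delete vs invalidate
// semantics. Deletion removes the node outright; invalidation keeps it
// but marks it stale.
const cache = new Map([['User:1', { data: { name: 'jane' }, stale: false }]]);

function deleteNode(id) { cache.delete(id); }

function invalidateNode(id) {
  const entry = cache.get(id);
  if (entry) entry.stale = true;
}

// cache-only reads still see stale nodes but never deleted ones; any
// network-capable policy treats a stale or missing node as needing a fetch.
function read(id, fetchPolicy) {
  const entry = cache.get(id);
  if (!entry) return { data: null, needsNetwork: fetchPolicy !== 'cache-only' };
  if (entry.stale && fetchPolicy !== 'cache-only') {
    return { data: entry.data, needsNetwork: true };
  }
  return { data: entry.data, needsNetwork: false };
}
```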

Is cache invalidation (evict) ready to use on 2.0? Any docs on that?

Can anyone explain how this issue will be affected by the 2.0? Will it be significantly easier with 2.0 to refetch/invalidate queries?

Thanks!

I too am pretty disappointed that we still must manually specify queries to update. As @pleunv said, it’s true that

it causes issues when new queries get added and you forget to reference them in the necessary mutations

this is a huge maintenance nightmare and having to keep track of this interconnected web only gets nastier as your application grows/changes.

I’m hopeful for an apollo future where we will not have to worry about this specific problem quite as much 🙏

Would very much appreciate a cache expire time option or a TTL feature.

@swernerx yes, it’s still on the radar! We’re trying to simplify the core and API before we add more features, though, so this won’t be in 1.0, but it could very well be in 1.1!

There is a need to automate garbage collection inside the cache. The cache presents very limited API to the world, mostly allowing reads/writes through queries and fragments. Let’s look at the queries for a moment.

The query has a concept of being active. Once a query is activated and results are fetched, it denormalises the response into the cache. It cannot delete it, because other queries might end up using the same results. What a query can’t do is reference other queries, so there is no way to create cycles. This makes a reference-counter-based GC viable.

Let’s suppose that the underlying cache object holds a reference counter. Once a result is written to or materialised from the cache, the query can collect all referenced objects, hold them in a private Set, and increase the reference counter. Every materialisation would fully clear and repopulate that Set while adjusting refcounts accordingly.

To prune a specific query’s data and enable potential garbage collection, you adjust the refcount for all associated objects and clear that Set.

Once in a while the cache could simply filter out all keys that have refcount 0. That eviction could be triggered by a few strategies:

  • a query could have a timer set for when its data becomes stale
  • a query could be told programmatically to evict the cache
  • query deactivation, potentially combined with an additional timer
  • a data refetch would automatically prune all no-longer-used objects
  • etc. (any ideas?)

The readFragment would have to be further constrained: it may fail if no query holds the requested object, simply because the data might already have been evicted from the cache.

The remaining issue is writeFragment when there is no backing query holding its value, as there is no guarantee the data will actually be persisted for any length of time. I’m not sure there is any use case other than activating a query immediately after some fragments were written, and we can easily make that scenario work.
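The reference-counting scheme described above can be sketched like this (simplified; string ids stand in for normalized cache keys):

```javascript
// Sketch of the refcount-based GC proposed above (not apollo code).
// `store` maps normalized id -> { value, refs }.
const store = new Map();

function write(id, value) {
  const entry = store.get(id) || { value: null, refs: 0 };
  entry.value = value;
  store.set(id, entry);
}

// Each active query keeps a private Set of the ids it references.
// Re-materialising clears and repopulates that Set, adjusting refcounts.
function retain(querySet, ids) {
  for (const id of querySet) store.get(id).refs -= 1;
  querySet.clear();
  for (const id of ids) {
    querySet.add(id);
    store.get(id).refs += 1;
  }
}

// Periodic sweep: evict every key with refcount 0.
function sweep() {
  for (const [id, entry] of store) {
    if (entry.refs === 0) store.delete(id);
  }
}
```

Query deactivation then reduces to `retain(querySet, [])` followed by a sweep.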

@TSMMark exactly

maintenance nightmare

We have list queries that can be paginated, sorted, and filtered by more than one param. To do optimistic updates on a list item, we need to keep track of the set of variables corresponding to every single list where the item appears — all combinations of sorting, pagination, and filtering params (!).

What are we doing wrong?

I’m feeling the need to have a field based invalidation strategy. As all resources are available in a normalized form under state.apollo.data (based on dataId), and going further with the proposal from @viridia, I believe we could reach a field based invalidation system much like the following:

Given this schema:

type Person {
  id: Id!
  name: String!
  relative: Person
}

type Query {
  people: [Person]
}

type Mutation {
  setRelative (person: String!, relative: String!): Person
}

With a data id resolver such as:

const dataIdFromObject = ({ __typename, id }) => __typename + id

And with Person resources with ids 1, 2, and 3 in the apollo store, person 2 currently being the relative of person 1:

{
  apollo: {
    ...
    data: {
      Person1: {
        id: 1,
        name: 'Lucas',
        relative: {
          type: "id",
          id: "Person2",
          generated: false
        }
      },
      Person2: {
        id: 2,
        name: 'John',
        relative: null
      },
      Person3: {
        id: 3,
        name: 'John',
        relative: null
      },
    }
  }
}

And having a query somewhere in system such as:

query Everyone {
  people {
    id
    name
    relative {
      id
      name
    }
  }
}

I could then perform a mutation to change the relative of person 1 to be person 3 and force invalidation as follows:

client.mutate({
  mutation: setRelative,
  variables: { person: 1, relative: 3 },
  invalidateFields: (previousQueryResult, { mutationResult }) => {
    return {
      'Person1': {
        'relative': true
      }
    }
  }
})

Edit: I do understand that updateQueries is a valid method for this use case, but updateQueries only fits mutations whose result brings all the data you need to update the queries, which is almost never the case.

I think that we “just” need a way to update the cache by __typename:id instead of a specific query + variables 😉 So even with infinitely many variable combinations (for example a filter param), it’s no longer a problem.

Something like this:

client.readCache({key: 'Post:1'}); // {data: {...}}
client.writeCache({
  key: 'Post:1',
  data: {
    title: 'oh yeah',
  },
});
client.deleteCache({key: 'Post:1'});

Note that with this solution, you can also update every results cached (and this is exactly what I wanted!)

@helfer Would adding a timestamp to each cache result break something in the library?

If a client were able to grab the timestamp of a query result, they could use fetchPolicy and decide on an application basis whether the data was stale.

For example, if an app regards data older than n seconds to be out of date, you could fetch @ cache-only, check the timestamp and then issue a fetch @ network-only.

That way the semantics of cache invalidation can be pushed up to the app.
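That app-level policy is easy to express once results carry a fetch timestamp (a sketch; `maxAgeMs` and the `{ data, fetchedAt }` entry shape are assumptions, not an apollo feature):

```javascript
// Sketch: push staleness decisions up to the app by timestamping cache
// entries. Given a cached entry and a maximum age, pick the fetch policy
// to use for the next read.
function choosePolicy(entry, maxAgeMs, now = Date.now()) {
  if (!entry) return 'network-only'; // nothing cached yet
  return now - entry.fetchedAt > maxAgeMs ? 'network-only' : 'cache-only';
}
```

An app that considers data older than n seconds out of date would first read at cache-only, check the timestamp, and re-issue the fetch at network-only when `choosePolicy` says so.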

I have a very simple scenario:

  • list some data (with pagination)
  • add some data
  • invalidate the list so the list component will query the network again

I’ve read several issues around here but still can’t find a good way of doing this except manually deleting the data from the store on the update method for a mutation.

Since cache is stored with variables, I don’t know which list the data should be added to, so it’s just better to invalidate all loaded lists from that field. However there doesn’t seem to be an API for this.

I can use writeQuery to ADD data, but how do I remove a whole field from the cache? This issue is from 2016 and we still don’t have an API to remove things from the cache… what can I do to change that?

Ok, I’ve worked on this issue and ended up with a project to solve field based cache invalidation: apollo-cache-invalidation.

This is how you would use it, following the example on my previous comment:

import { invalidateFields } from 'apollo-cache-invalidation'

client.mutate({
  mutation: setRelative,
  variables: { person: 1, relative: 3 },
  update: invalidateFields(() => [
    ['Person1', 'relative']
  ])
})

As you can see, the invalidateFields method is just a higher-order function that creates a valid update option for client.mutate. It receives a function, which is called with the same arguments update receives. Rather than the structured object from my previous comment, it must return an array of field paths, since the keys can be dynamic — each key in a path can be a string, a regex, or a function.

Further documentation can be found in the project’s page.

Keep in mind this can be used for dynamic cache-key invalidation at any level, so to invalidate the relative field on every person, one could simply add an invalidating path such as:

invalidateFields(() => [[/^Person/, 'relative']])

If you find this useful, please share some feedback.
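The string/regex/function path matching can be sketched like this — a simplified matcher over a flat `{ dataId: { fieldName: value } }` store, not the library’s actual implementation:

```javascript
// Sketch of field-path invalidation where each key in a path may be a
// string, a RegExp, or a predicate function.
function keyMatches(matcher, key) {
  if (typeof matcher === 'string') return matcher === key;
  if (matcher instanceof RegExp) return matcher.test(key);
  if (typeof matcher === 'function') return matcher(key);
  return false;
}

function invalidateFieldPaths(store, paths) {
  for (const [idMatcher, fieldMatcher] of paths) {
    for (const id of Object.keys(store)) {
      if (!keyMatches(idMatcher, id)) continue;
      for (const field of Object.keys(store[id])) {
        if (keyMatches(fieldMatcher, field)) delete store[id][field];
      }
    }
  }
  return store;
}
```

With a path like `[/^Person/, 'relative']`, every Person node loses its cached relative field while other fields survive.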

I’ve spent a whole day trying to figure out how to delete something from my cache/store. Is there a solution for this? I have finished 90% of my app with Apollo and this hit me right in the face. There really is no way to delete something?

Wouldn’t changing

proxy.writeData({ id: `MyType:${id}`, data: null });

to delete the object instead of having no effect be sufficient here? For my case at least, it would be a very elegant, easy and intuitive solution.

I think the best way to solve delete & cache issue is to add GraphQL directive:

mutation DeleteAsset($input: DeleteAssetInput!) {
  deleteAsset(input: $input) {
    id @delete
    __typename # Asset
  }
}

Execution of this query should delete Asset:{id} from cache.

Also wondering if there has been any change on this subject after 2.0. Starting to implement some optimistic updates here and there and currently not sure how to deal with data removal. I would prefer my mutations not to have knowledge of which queries need to be updated (sometimes it involves multiple, and they can be rather complex + it causes issues when new queries get added and you forget to reference them in the necessary mutations) so I would like to avoid the withQuery route where possible and instead work directly on fragments. Is there any possibility (or plans) to directly remove a specific fragment from the cache?

I don’t think this one should be closed yet 🙂

Let me try making a proposal for a solution - if the maintainers are OK with it, I or someone else could work on a PR to implement it in apollo-cache-inmemory.

Originally, I wanted to start with the existing evict function, but I don’t think it’ll work without breaking changes, so I may as well call it something different.

Let’s call it deleteQuery and deleteFragment, to mirror the existing read/writeQuery/Fragment functions. I’ll just start with deleteQuery and assume deleteFragment works mostly the same way:

public deleteQuery<TVariables = any>(
  options: DataProxy.Query<TVariables>,
): { success: boolean }

You could use it like this, after adding a widget, for example:

const CACHE_CLEAR_QUERY = gql`
  query ClearWidgets($gizmo: ID!, $page: Int!, $limit: Int, $search: String) {
    gizmoById(id: $gizmo) {
      widgets(page: $page, search: $search, limit: $limit)
    }
  }
`;

proxy.deleteQuery({
  query: CACHE_CLEAR_QUERY,
  variables: {
    page: () => true, // clear all pages
    // only clear search queries that could match the widget we just added
    search: value => !value || newWidget.name.indexOf(value) !== -1,
    gizmo: newWidget.gizmo.id,
  },
});

A couple of important notes here:

  • This is not a complete GraphQL query; widgets returns a WidgetList, but we haven’t provided any subfields. This tells Apollo to wipe out the entire entry of Gizmo1.widgets rather than just a specific subfield
  • variable values can now be functions of type (input: any) => boolean (this only works for deleteQuery/deleteFragment, of course). The best way to walk through how this works is an example: if Apollo goes into the cache and sees a value cached as Gizmo1.widgets({"page":0,"search":"hello"}), it will call the functions as page(0) and search("hello"). Variables can also be provided as normal literals; gizmo: "15" is equivalent to gizmo: value => value === "15". If all variables match the field in the cache, the field is matched and removed.

After items have been removed from the cache in this way, any currently active queries that are displaying this data will automatically and immediately refetch.

The part of this I’m least certain about is the ability for a query to be “incomplete” and not specify subfields - some feature needs to exist so that you can clear an entire array instead of, say, just the id and name fields of every entry in the array, but this particular solution would break a lot of tooling that tries to parse GraphQL queries against the schema.

I’m using this workaround, but it is pretty bad: I need to refetch all my queries again because my client store is empty. At least this way my app is always consistent.

export const mutateAndResetStore = async (client, fn) => {
  await fn();
  // TODO fixing problem with cache, but we need to have a better way
  client.resetStore();
};

mutateAndResetStore(client, () =>
    // my mutation call
    saveGroup({
...
...

we need a real solution ASAP.

The function below invalidates the cache for a given query by deleting all instances from the store. For example, if there were a query called widgetById that accepted an integer id parameter, the following call would clear the cache of all related queries: this.deleteStoreQuery('widgetById');

deleteStoreQuery = (name) => {
  let rootQuery = this.props.client.store.getState().apollo.data.ROOT_QUERY;
  Object.keys(rootQuery).filter(query => query.indexOf(name) === 0).forEach((query) => {
    delete rootQuery[query];
  });
}
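The same idea can be written as a pure function over the ROOT_QUERY object, which makes it easier to test (in the normalized store, field keys look like `name` or `name({...serialized vars})`):

```javascript
// Sketch: drop every cached field on ROOT_QUERY whose name matches,
// regardless of the serialized variables appended to the key.
function deleteRootField(rootQuery, name) {
  for (const key of Object.keys(rootQuery)) {
    // normalized keys look like 'widgetById' or 'widgetById({"id":1})'
    if (key === name || key.startsWith(name + '(')) {
      delete rootQuery[key];
    }
  }
  return rootQuery;
}
```

Note the `startsWith(name + '(')` check: the `indexOf(name) === 0` version above would also delete an unrelated field like `widgetByIdAndName`.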

I’d really like to see this functionality built in as well.

@dallonf you could experiment with the more sophisticated mutation option update. It gives you direct access to the cache, so you can accomplish almost everything you would need.

Sorry to promote my project here again, but this is exactly the kind of situation I built it for: apollo-cache-invalidation. Basically, following the schema you presented, you could invalidate all your paginated results (because they truly are invalid now) at once with:

import { invalidateFields, ROOT } from 'apollo-cache-invalidation'
import gql from 'graphql-tag'

import { client } from './client' // Apollo Client instance.

const mutation = gql`
  mutation NewWidget($name: String!) {
    createWidget(name: $name) {
      id
    }
  }
`

const update = invalidateFields((proxy, result) => [
  [ROOT, /^widgets.+/]
])

client.mutate({ mutation, update, variables: { name: 'New widget name' } })

But - and here goes a big but - this currently invalidates the cache only for non-instantiated queries, meaning that if the widgets query is currently active on the page, it will not refetch. I have a pull request working on this issue.

Hope it helps.

I like the idea of differentiating between invalidate and delete.

After using https://github.com/lucasconstantino/apollo-cache-invalidation, though, I’m not convinced that an API perfectly parallel to writeQuery/writeFragment is sufficient, since it only targets fields with one particular set of arguments… here’s an example of why that’s important.

Let’s say I have a widgets(page: Int = 0): [Widget] field in my root Query type. When I query this, I’ll get ROOT_QUERY.widgets({"page": 0}) added to the cache, as well as for "page": 1 and page 2 and so on.

Now let’s say a mutation comes along and adds or deletes a Widget somewhere in the middle of that list. There’s no sane way to simulate that client-side, so I need to invalidate the entire widgets field so it can be re-fetched. With an invalidateQuery API, the best I could do is invalidate one page of it, which would leave the cache in an inconsistent state.

I’m not sure, though, that regexes (as used in apollo-cache-invalidation) are the right approach either. Ideally I’d be able to pass a function that takes in a field’s args and returns whether it should be removed from the cache? I have no idea what that might look like, though.
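That args-predicate idea could be sketched over the normalized store’s serialized keys (the `field({json})` key format mirrors how the store serializes field arguments; everything else here is illustrative):

```javascript
// Sketch: evict every cached instance of a field whose parsed arguments
// satisfy a predicate, e.g. all cached pages of widgets(page: Int).
function evictFieldWhere(rootQuery, fieldName, predicate) {
  for (const key of Object.keys(rootQuery)) {
    const match = key.match(/^([^(]+)\((.*)\)$/); // e.g. 'widgets({"page":0})'
    if (!match || match[1] !== fieldName) continue;
    const args = JSON.parse(match[2]);
    if (predicate(args)) delete rootQuery[key];
  }
  return rootQuery;
}
```

Passing `args => true` degenerates to “clear every instance of this field,” which covers the paginated-list case without enumerating variable combinations.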

I’m not sure how much this really adds to the conversation, but I spent a whole lot of time typing this out in a dupe issue, so I may as well add it here. 😃 Here is a use case my team commonly runs into that is not well covered by the existing cache control methods and would greatly benefit from field-based cache invalidation:

Let’s say I have a paginated list field in my schema:

type Query {
  widgets(limit: Int = 15, offset: Int = 0): WidgetList
}

type Widget {
  id: ID!
  name: String
}

type WidgetList {
  totalCount: Int!
  count: Int!
  limit: Int!
  offset: Int!
  data: [Widget]!
}

There is a table in the app powered by this Query.widgets field. The user can customize the page size (aka limit) as well as paginate through the list (offset), so there is an essentially unbounded number of possible permutations of this query. Let’s also say, for the sake of argument, that the sorting logic of this field is complex and cannot be simulated client-side. (Even in simple sorting cases, I’m not convinced it’s reasonable to do this client-side; pagination really is a difficult caching problem.)

So let’s throw a simple mutation into this mix…

type Mutation {
  createWidget(name: String!): Widget
}

When I fire this mutation, there is really no telling where it will be inserted into the list, given the aforementioned complex sorting logic. The only logical update I can make to the state of the store is to flag either the entire field of widgets as invalid and needing a refetch, or to invalidate every instance of the query, regardless of what its variables are.

Unless I’m missing something, there doesn’t seem to be any way to handle this case in Apollo Client. refetchQueries, as well as the new imperative store manipulation functions, require you to specify variables, and updateQueries of course only works on active queries — and even in react-apollo, where all queries are kept active, only one instance of the query (that is, with one set of variables) will be active at a time.

@helfer I’ve renamed the project to apollo-cache-invalidation. I’ll look at you other considerations in the morning 😉

Thanks for the productive feedback!

@lucasconstantino An additional caveat of updateQueries that’s worth ~harping on~ pointing out is that it only works with queries that are currently active; it won’t help you if the affected data will be fetched by a future query.

This really needs to get resolved. It’s not even an edge case or rare use scenario. Every single CRUD app will need to deal with this issue.

Why is it so hard to write a deleteFragment function and broadcast updates to all queries subscribed to it? I have a chat app where a user wants to delete a single message — a really common scenario. I don’t want to refetch all messages or update the query; I just want to find the message fragment by id and delete it.

Here’s my workaround. I would love something like this built in, of course done better so that I don’t need to set manual IDs: https://gist.github.com/riccoski/224bfaa911fb8854bb19e0a609748e34

The function stores a reference ID in the cache along with a timestamp, then checks against it to determine the fetch policy.

I’ve talked with @stubailo about being able to write undefined to invalidate queries/data — he says it should work, but it doesn’t, so this might be something that could be added as a good interim solution.

@yopcop Sure, my solution only works for updating and deleting part of a query, but it’s better than the current situation (and I’m also aware that this is a hard problem with no easy solution). Sometimes it’s definitely easier to invalidate previous queries.

For these kinds of complex queries, I think a powerful solution would be the ability to query the queries.

Example to illustrate:

// my apollo store
posts(fromDate: "2018-01-01", toDate: "2018-03-31")
posts(fromDate: "2018-01-01", toDate: "2018-04-30")
posts(fromDate: "2018-01-01", toDate: "2018-05-31")

// how to update this
client.readQueries({
  query: gql`query Posts($from: Date!, $to: Date!) { posts(fromDate: $from, toDate: $to) { ... } }`,
  variables: (from, to) => isAfter(from, "2018-01-01") && isBefore(to, "2018-04-30")
});

So basically it’s like a filter that returns an array of queries, so you can .map over them and use the classic client.writeQuery().

(I’ve never put my hands on the Apollo code base, so I really don’t know if it’s possible; just sharing ideas 😉)

@fabien0102 With this solution, I don’t think you can add a result to a query though. If I made a query like posts(fromDate: "2018-01-01", toDate: "2018-03-31"), and I create a new post with date="2018-03-20", I would like to invalidate the query. I can add it manually to the query results, but if the filters get complicated, it can be a lot of work. Invalidating the query would be much easier if I don’t mind the extra requests made to refresh them.

@anton-kabysh interesting. I think this might create confusion on what exactly is being removed, though; the cached information, or the entity itself.

@dr-nafanya I think there is a method available for this: client.cache.reset()

@Draiken the function in my comment above deletes the whole field from the cache regardless of variables. I agree it’s frustrating that there still isn’t a proper solution.

I have the same problem — any date for the launch of the new store API?

@lucasconstantino Aha, thanks! That does solve my use case for now. (I had tried your module, but didn’t realize the regex was necessary to capture field arguments).

@nosovsh apollo-cache-invalidation will basically purge current data from the cache. In the current state of things, it will work as expected for new queries the user eventually performs on that removed data, but if there are any active observable queries (queries currently watching for updates) related to the removed data, those queries will not refetch; they will serve the same old data they had prior to the cache clearing. To solve this problem, I’m working on a pull request to apollo-client to let the user decide when new data should be refetched in case something goes stale: https://github.com/apollographql/apollo-client/pull/1461. It is still a work in progress, and I’m not sure how long it will take for something similar to land in core.

@helfer I’ve looked into your second observation, and I think I found a dead end here.

Studying the QueryManager class, and specifically the queryListenerForObserver method, I’ve realized the way I’m doing the cache cleaning doesn’t really work for current observers. This is quite odd, for I thought I had it working on a project using react-apollo; I’ll look into that later, though. About the stale data being returned: I don’t really understand why a refetch isn’t triggered when stale data is found. In which scenario could some data be missing after having been fulfilled before, and why would the user want that old data rather than fresh data in that case? I’m talking about lines 412-418 of the QueryManager, to contextualize.

Testing it locally, I was able to fire a refetch from that exact spot, using storedQuery.observableQuery.refetch(), which did solve the problem for apollo-cache-invalidation approach.

The big problem here, I think, is that field-based invalidation isn’t really compatible with the approaches Apollo Client currently has for cache clearing or altering. Both refetchQueries and updateQueries rely on the code doing the mutation knowing exactly which queries (and even variables) are to be updated, meaning the code needs to know quite a lot to perform cache clearing. The idea behind a field-based invalidation system is to make the mutation code aware of the structure of the store, but not of other code performing queries on that same store. I would like to make all ObservableQueries understand that some of their data is now invalid, but I don’t see how when I only have a path in the store — nothing really related to the queries used to build the observables. Basically, I’m fighting the current design when trying to push ahead with this approach.

Well, as far as I could tell, apollo-cache-invalidation cannot, on its own, fix the stale-data problem, meaning I would have to pull-request Apollo Client on at least some minor routines. What do you think? Should I proceed, or am I missing something here?

By the way: I guess being informed that refetchQueries now does trigger refetches on non-active queries (supposedly) makes the apollo-cache-invalidation project a bit more specific to certain use cases.