pants: Prototype switching to the `notify` crate for file invalidation

Watchman no longer gives us useful events when the children of directories are created/deleted, which is a large and surprising change in behaviour that means we need to over-invalidate (or add additional syscalls to differentiate creates/updates/deletes).

We should explore whether using something like the notify crate would result in fewer assumptions… we already know that it would result in one fewer daemon and less memory usage (as we have no need to track anything that isn’t warm in the graph).

The primary uncertainty is whether consuming a queue of events in process would require filtering to decide which events to ignore, and which require additional syscalls to determine their actual meaning: for example:

could we easily filter out accesstime updates?
could we watch modification/creation time of directories (which sidesteps the need to invalidate parents based on children)?

Also, even if this looks like a good approach, it should likely be enabled behind a pants global option, so that both watchman and the notify crate can be used at once (EDIT at @illicitonion’s suggestion) for a while.

About this issue

Original URL
State: closed
Created 7 years ago
Comments: 17 (17 by maintainers)

Commits related to this issue

Invalidate parent directories (#5000) ### Problem Watchman `4.9.0` does not send events for a parent when a child is created or deleted, which causes issues like #4662. ### Solution Revert t... — committed to pantsbuild/pants by deleted user 7 years ago
Disable watchman by default. (#9714) ### Problem Using the notify crate for file invalidation is now feature complete, and so we are using two file watching implementations simultaneously under `p... — committed to pantsbuild/pants by deleted user 4 years ago

Most upvoted comments

I kicked off a small branch for this over the holiday: https://github.com/pantsbuild/pants/compare/master...twitter:stuhood/notify-crate … it does not yet work, but it compiles.

It has a number of important TODOs before it is landable:

The InvalidationWatcher::watch method is blocking, and (likely) makes syscalls in order to accomplish what it is doing. We call it in the branch from an async context here: https://github.com/pantsbuild/pants/compare/master...twitter:stuhood/notify-crate#diff-11db197014e1e80c9551e9b61aba6aa0R1224 … that is almost certainly making things slower than they should be. We should probably give the InvalidationWatcher a reference to the Core’s Executor in here (by just cloning it: it is internally reference counted), and then having the fn watch method return an impl Future that we would consume in Node::run instead.
It’s not clear that we’re actually receiving all of the watch events that we’re expecting: should continue to add logging (and confirm that logging is actually working from the background thread that the InvalidationWatcher starts). Should run with something like MODE=debug ./pants --enable-pantsd list 3rdparty:: and watch .pants.d/pantsd/pantsd.log to see invalidation log messages: the goal would be to receive similar log messages for this codepath as we do in the existing watchman invalidation path here.

The first thing is purely a performance improvement, but I’d suggest looking at it first because it will make debugging the second thing easier not to have to wait long periods for pantsd to start up.

Fixing those two things would leave us in a place where we are using two systems to invalidate, but hopefully in a way that has no noticeable impact for end users.

Then, in a second PR, we should allow for watchman to be disabled, most likely by making edits to the python-side FSEventService

We need to add code there that would be able to cleanly shutdown if file watching failed, likely by polling the InvalidationWatcher::running method.
We should add an option to allow for disabling watchman, and skip launching it in/around the FSEventService codepaths.

stuhood on Dec 12, 2019

that does indeed sound difficult and error-prone to implement

Error prone yes, just because watchman is already error-prone, and we would have to maintain more code. Difficult, not really, currently in #9318 the default is to use both simultaneously. They both hook in to the same code path in the engine, but we get redundant invalidation.

hrfuller on Mar 24, 2020