go: os: spurious TestDirFS failures due to directory mtime skew on Windows

https://storage.googleapis.com/go-build-log/f31194a9/windows-386-2008_061172ea.log:

--- FAIL: TestDirFS (0.35s)
    os_test.go:2690: TestFS found errors:
        testdata/simple: mismatch:
        	entry.Info() = simple IsDir=true Mode=drwxrwxrwx Size=0 ModTime=2020-11-16 02:32:02.0111083 +0000 GMT
        	file.Stat() = simple IsDir=true Mode=drwxrwxrwx Size=0 ModTime=2020-11-16 02:32:02.012085 +0000 GMT
        
FAIL
FAIL	os	2.635s

Marking as release-blocker for Go 1.16 because DirFS is a new API.

CC @rsc @robpike @alexbrainman @networkimprov @zx2c4

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 50 (36 by maintainers)

Commits related to this issue

Most upvoted comments

As far as I can tell, this is not happening on the builders at all. I can’t find even one occurrence. It is only happening on the trybots. Why would that be?

It looks like there are two paths in the builder (CC @dmitshur to confirm/deny) – one that writes a “snapshot” and one that writes the actual Go source. It looks like these might be coming from very different places. In that case, I wonder if there’s an effect on this snippet of the untarring code:

                        if modTime.After(t0) {
                                // Clamp modtimes at system time. See
                                // golang.org/issue/19062 when clock on
                                // buildlet was behind the gitmirror server
                                // doing the git-archive.
                                modTime = t0
                        }
                        if !modTime.IsZero() {
                                if err := os.Chtimes(abs, modTime, modTime); err != nil && !loggedChtimesError {
                                        // benign error. Gerrit doesn't even set the
                                        // modtime in these, and we don't end up relying
                                        // on it anywhere (the gomote push command relies
                                        // on digests only), so this is a little pointless
                                        // for now.
                                        log.Printf("error changing modtime: %v (further Chtimes errors suppressed)", err)
                                        loggedChtimesError = true // once is enough
                                }
                        }

Maybe the mtime is 0 for the tarballs sent by the trybots, but not the other ones, and os.Chtimes is helping sync back the timestamp to the MFT a bit sooner.

Change https://golang.org/cl/285720 mentions this issue: os: force mock mtime before running fstest on directory on Windows

Alright, here’s an analogous example reproducer for directories:

#include <windows.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
	WIN32_FIND_DATAA find_data;
	BY_HANDLE_FILE_INFORMATION file_info;
	DWORD written;
	HANDLE file, dir;

	for (int i = 0;; ++i) {
		DeleteFileA("dir\\a.txt");
		RemoveDirectoryA("dir");
		CreateDirectoryA("dir", NULL);

		file = CreateFileA("dir\\a.txt", GENERIC_WRITE | GENERIC_READ, FILE_SHARE_WRITE | FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
		WriteFile(file, "hello go!\r\n", 11, &written, NULL); // Updates the mtime of parent directory
		CloseHandle(file);

		dir = CreateFileA("dir", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
		GetFileInformationByHandle(dir, &file_info); // Reads up-to-date mtime directly from file metadata.

		FindClose(FindFirstFileA("dir", &find_data)); // Sometimes returns old cached mtime from the MFT.

		CloseHandle(dir); // Closing the handle initiates a flush of the mtime to MGT. We put this _after_ FindFirstFileA to trigger the bug.
	
		if (memcmp(&file_info.ftLastWriteTime, &find_data.ftLastWriteTime, sizeof(file_info.ftLastWriteTime))) {
			printf("%d - %llu != %llu, diff = %llu\n", i, *(unsigned long long *)&file_info.ftLastWriteTime, *(unsigned long long *)&find_data.ftLastWriteTime, *(unsigned long long *)&file_info.ftLastWriteTime - *(unsigned long long *)&find_data.ftLastWriteTime);
			exit(1);
		}
	}

	return 0;
}

And here’s the bug being triggered:

trigger

@egonelbre, ExampleWriteFile has no // Output comment, so it should not actually be executed — search for “Example functions without output comments” in https://tip.golang.org/pkg/testing/#hdr-Examples.

(If it were actually run, that would need to be fixed separately; see #28387 and #30316.)

Change https://golang.org/cl/284142 mentions this issue: os: call runtime.GC() before using FindFirstFile on Windows

This didn’t work at all. It looks like nothing else writes to those files anyway, in order to cause the problem.

I wonder if git is being shim’d by Defender that then keeps the file handle open somehow after the process has terminated?

Oddly, the syscall.win32finddata1 struct being passed to the FindNextFileW system call seems to be three fields short: it is missing the dwFileType, dwCreatorType, and wFinderFlags fields from the WIN32_FIND_DATAW structure.

Is that a secondary bug?

Looking at the implementation of the test, it’s either a problem with the test itself, with the new testing/fstest package, or both. Right now, I’m leaning toward this being a problem with os.TestDirFS proper.

It appears that fstest.TestFS assumes that no file or directory in the given fs.FS is changed in any way while the test is running, but then os.TestDirFS is running TestFS on the directory containing the os/signal package. That causes the test for the os package to fail if there are any concurrent changes in the os/signal package.

I don’t know why the directory’s ModTime was being changed at all in the CL at which this failure was observed; that may be a peripherally-related bug in go run.

However, I could quite easily imagine someone running go test os while concurrently editing an otherwise-unrelated source file in os/signal (such as a test function for that package), and in that situation I could easily envision os.TestDirFS spuriously failing.

If os.TestDirFS requires the property that the directory it is traversing does not change at all while the test is running, it should either have its own isolated subdirectory of GOROOT/src/os/testdata, or it should construct its own temporary directory for the duration of the test and run fstest.TestFS on that — not on the signal subdirectory.