go: path/filepath: Walk is slow compared to 'find' due to extra Stat calls
On my Mac laptop (SSD, warm caches),
bradfitz@laptop ~$ time (find src/ | wc -l)
42408
real 0m0.177s
user 0m0.035s
sys 0m0.144s
bradfitz@laptop ~$ time (find src/ | wc -l)
42408
real 0m0.178s
user 0m0.036s
sys 0m0.144s
bradfitz@laptop ~$ time (find src/ | wc -l)
42408
real 0m0.177s
user 0m0.035s
sys 0m0.144s
But with a basic use of filepath.Walk instead of find:
$ cat walk.go
package main
import (
"fmt"
"log"
"os"
"path/filepath"
)
func main() {
err := filepath.Walk("src", func(path string, fi os.FileInfo, err error) error {
if err != nil {
return err
}
fmt.Println(path)
return nil
})
if err != nil {
log.Fatal(err)
}
}
It’s much slower:
bradfitz@laptop ~$ time (./walk | wc -l)
42408
real 0m0.447s
user 0m0.123s
sys 0m0.406s
bradfitz@laptop ~$ time (./walk | wc -l)
42408
real 0m0.434s
user 0m0.120s
sys 0m0.399s
bradfitz@laptop ~$ time (./walk | wc -l)
42408
real 0m0.435s
user 0m0.119s
sys 0m0.398s
This is the bulk of the goimports execution time. goimports actually does a slightly parallelized walk with goroutines (which helps on NFS filesystems), but it doesn’t seem to matter. I’m just trying to get any Go program to be closer in performance to find.
Any clues?
Speed tracking bug for goimports is #16367
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 6
- Comments: 50 (34 by maintainers)
Commits related to this issue
- cmd/goimports, imports: optimize directory scanning and other things This brings goimports from 160ms to 100ms on my laptop, and under 50ms on my Linux machine. Using cmd/trace, I noticed that filep... — committed to golang/tools by bradfitz 8 years ago
- strings: add special cases for Join of 2 and 3 strings We already had special cases for 0 and 1. Add 2 and 3 for now too. To be removed if the compiler is improved later (#6714). This halves the num... — committed to golang/go by bradfitz 8 years ago
- imports: wait for fastWalk workers to finish before returning In some cases walkFn is being called after the fastWalk function has returned. This often happens when an error was encountered early on ... — committed to golang/tools by jstemmer 7 years ago
- Revert "imports: wait for fastWalk workers to finish before returning" This reverts commit 4436e5475416d77a9352558d118d0b585b962ef1. Reason for revert: Breaks goimports. See: https://github.com/gola... — committed to golang/tools by bradfitz 7 years ago
- imports: wait for fastWalk workers to finish before returning (take 2) This is Joël Stemmer's https://golang.org/cl/40092 again, but with a fix to prevent workers from deadlocking on send if the call... — committed to golang/tools by bradfitz 7 years ago
- Added aggregation logic for decoders duplicating metrics; speeded up cgroup decoder. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> — committed to bwplotka/ebpf_exporter by bwplotka 3 years ago
For people looking for reusing the code into their own projecst, there is now this third-party library that is basically a clone of
fastwalk: https://github.com/karrick/godirwalkI think this issue is fixed by the new
filepath.WalkDirfunction in Go 1.16. It uses the newos.DirEntry(fs.DirEntry) type to avoid a stat for every file. On my machine (with a large source code directory). Summary (and raw results below):So
WalkDirisn’t quite as fast asfind, but significantly closer now, and a lot faster thanWalk.@Tim15, there can’t be any update on this due to the fundamental API design. See comment https://github.com/golang/go/issues/16399#issuecomment-233190667 above.
I’ll label this bug Go 2. Maybe we can change the walk interface to be more efficient later.
@vinceprignano, yes, that’s what I’m doing now.
Actually, I realize now that the
filepath.Walk(at least with our definition ofos.FileInfo) is inherently slow, as its API guarantees you get a completeFileInfowith each call, which requires us to do astatper file, even if the user doesn’t want it.I think I’ll move away from using
filepath.Walkforgoimports.I think this can be closed now that we have the new
WalkDiras mentioned aboveWalk sorts the file entries, but find doesn’t. Not sure how much that affects the profiles though.
@saracen, Size. See https://github.com/golang/go/issues/16399#issuecomment-236624924 above.
@johansglock, thanks, and sorry. I’ve reverted the change in https://go-review.googlesource.com/c/40296/
That’s interesting, I hadn’t considered that. It think this happens when a worker received more input but then gets stuck on sending the result to
w.reschere.Another possible way of dealing with this that doesn’t involve draining could be:
I wonder if there’s a better way of dealing with this.
In my observations there were 2 problems:
I wrote an article (in russian, sorry for that) that describes how to achieve performance that is higher than find(1) has: https://habrahabr.ru/post/281382/ (google translate: https://translate.google.ru/translate?sl=ru&tl=en&js=y&prev=_t&hl=ru&ie=UTF-8&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F281382%2F&edit-text=&act=url)