fd: Discussion: show Git-ignored files by default?

Since fd was first published, the feature to hide Git-ignored files by default has always been controversial. It’s the number one pitfall for new users, as witnessed by the numerous issues that have been opened over time (even though this is the first point in the Troubleshooting section). Even experienced users will likely run into this from time to time.

We have had past discussions about this (see #179, #220, #18), but I’m not so sure anymore if this default is the best possible option for the “average user”.

I thought it might make sense to discuss this again and see what others think. Whatever we choose as the default, it will always be easy for users to select a different default via an alias.

Pro current behavior (do not show .gitignored entries by default):

  • Most searches are faster if we take .gitignore files into account. .gitignored directories tend to contain huge amounts of automatically generated build artifacts or downloaded dependency files. Pruning these directories from the search tree typically results in a faster search overall. There are counterexamples to this where the parsing of long .gitignore files takes longer than actually traversing these directories.
  • Most of the time, .gitignored results are not “interesting” to the user (however, see counterpart below).
  • When running fd without any arguments, I typically don’t want to see .gitignored files.
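The performance argument in the first bullet can be illustrated with a minimal sketch. This is not how fd works internally (fd uses the Rust ignore crate and real .gitignore patterns); it only assumes a fixed, hypothetical set of ignored directory names and shows that pruning a directory skips its entire subtree. The helper names `make_demo_tree` and `walk` are made up for this example.

```python
import os
import tempfile


def make_demo_tree(root):
    """Create a tiny project: one source file plus build output under target/."""
    for rel in ("src/main.rs", "target/debug/a.o", "target/debug/deps/b.o"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8"):
            pass  # an empty file is enough for the demo


def walk(root, ignored_dirs=frozenset()):
    """Return (relative file paths, number of directories visited)."""
    found, visited = [], 0
    for dirpath, dirnames, filenames in os.walk(root):
        visited += 1
        # Assigning to dirnames[:] edits the list in place, so os.walk never
        # descends into the directories dropped here.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored_dirs)
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return found, visited


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        print(walk(root))              # every file, every directory visited
        print(walk(root, {"target"}))  # target/ and its subtree are skipped
```

In this toy tree the pruned walk visits two directories instead of five. The counterexample from the same bullet (very long ignore files that cost more to parse than the traversal they save) is not modelled here.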

Cons:

  • It can be very confusing to (new) users. If 10% of users go so far as to create a ticket on GitHub to ask about their problem, there must be hundreds of users that ran into this problem at some point.
  • Even if you know about the default, it can be annoying to repeat the search because you forgot to add -I or -u. There are a lot of valid use cases where users are - in fact - interested in results from ignored directories or files. Personally, I would estimate that I use -uu or -HI in roughly 20% of my searches, which is quite high.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 25
  • Comments: 63 (5 by maintainers)

Most upvoted comments

I want to add that the nature of the files in .gitignore depends a lot on the nature of our projects. In my case for example most of the time the ignored files are files with sensitive information (not crap) that I want to be able to search with fd.

But I understand that for other people often the files in .gitignore have to be ignored by fd as well.

For this reason the desired default behavior is not the same for everyone.

In my opinion, the default behavior should be “search all”, because it’s easier to figure out why there are too many results than it is to figure out why there are missing results.

But such a change in behavior will not be backward compatible, which is never good. To overcome this, people must be allowed to easily return to the old default behavior. Hence the need to be able to configure the default behavior of fd (#362).

Just to throw my opinion into the ring. I’m in favor of changing the default.

When I use fd with no options/arguments other than a pattern, 99% of the time I’m just using it to quickly narrow down the list of files I need to look at to find what I want. In that case I’m okay if I get some things that I don’t care about in the search, but I’m much more annoyed if I don’t find something that’s actually there because I forgot to specify that I wanted to search .gitignored files as well. @sharkdp said that he adds -H or -u around 20% of the time, meaning those flags mattered 20% of the time. But I’m willing to bet that if those flags were enabled by default then they would have to be disabled much less than 20% of the time.

Also, from a scripting/reducing noise perspective, normally when I’m doing something more precise than just quickly narrowing down a list of files, I’m more willing to add flags and check the documentation in order to narrow down the search results to be only what I care about.

And concerning adding an fdg binary (or symlink), I don’t see how that’s better than just adding an optional flag. It feels like it would complicate CI and packaging a lot for something that essentially just flips a flag on by default.

I think the point of having a tool like this is that it’s opinionated. If I have to start adding flags to reach the default behavior/length of find, I might as well use find.

Like rg and the rest of the modern tools, what makes them great is their defaults. If the only advantage is a very minor speed bump, people would just use the preinstalled tools they already know.

The fact that it ignores hidden and gitignored files is in the main bullet point feature list. If one doesn’t bother to read that…

The dilemma

I think that for this issue and for #362 the question is:

Should fd behave as a “general” or a “git-style” tool?

What I mean by this is summarized in the following table

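|               | general tool | git style tool                      |
|---------------|--------------|-------------------------------------|
| examples      | find, grep   | git, rg                             |
| ignore files  | no           | yes                                 |
| configuration | flags only   | flags, files, environment variables |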

Currently fd sits between the two worlds. And respecting the ignore files without the configuration feature is bad, IMO. So fd should choose the red pill or the blue pill 😉

My proposal

Or maybe we don’t have to choose and we can have both.

I think that:

  • fd should be a general tool (by default)
  • there should be a flag --git that makes it act in a “git style”
  • fdg should be another compiled version from the same code but with a different default behavior, equivalent to fd --git and having a --no-git flag to switch it back to “general tool”.

My arguments for creating the additional fdg are the following:

  • it should be easy to have the build automation produce two versions, one with the --no-git option as default, the other with --git as default.
  • the alias solution already discussed in #362 is very “shell specific”:
    • there is no general alias that can work in all shells (powershell and cmd included)
    • every time we download fd we should setup the aliases, so the advantage of “no install, just download and use it” is not valid any more
    • aliases do not work when you want to write a general script (let’s say in python) using fd
  • this solves the backward compatibility with ignore files, as any user can simply decide to switch to fdg or even rename fdg to fd and continue using it as before (respecting the ignore files by default)
  • this makes it possible to solve not only this issue but also #362 and all other issues that ask to change the default behavior of fd (this would be done using the configuration file in fdg)
  • having both versions will make the supporters on both sides (“pro-general” and the “pro-git”) happy

I would add my vote to search all files except hidden files by default.

Just adding my experience here that this caught me off guard multiple times. Most users install fd as a replacement for find, so it can be surprising when it doesn’t show certain files by default.

In case it helps, I thought fd was broken while searching for something I knew was in my node_modules directory due to this.

Overall the question comes down to which is the worst case:

  1. A user can’t find a file they expected to because it is captured by .gitignore

  2. A user gets additional output

I think 1 is obviously a significantly worse situation. There’s an adage in experimental sciences that applies: it is better to record too much information than too little. This is because too little information generally has far higher costs than too much: it requires doing everything over because you didn’t know everything you’d need beforehand.

I think this is a really good point and I am seriously considering a switch of the default behavior in fd version 9. This would be a major breaking change. I know for a fact that people are using fd in scripts and pipelines. They will have to adapt (check) their code when upgrading to fd 9.

One practical problem is that we have a set of (short) command-line options that are designed to work with the current default, like -I/--no-ignore. We have a (somewhat hidden) --ignore counterpart, but no suitable short option. We would also have to figure out what to do with --no-ignore-vcs, --no-ignore-parent, unrestricted, etc.

If this default changes, I would humbly request that an inverse CLI flag allows us to override previous CLI flags.

For example…

# Case-insensitive
fd -s -i

# Case-sensitive
fd -i -s

# Will ignore 
fd --no-ignore --ignore

# Will not ignore
fd --ignore --no-ignore

This way folks can easily choose their default via an interactive shell alias, but still have the option to override it on the command line, e.g.

alias fd="fd --ignore"
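The “last flag wins” behaviour requested above can be sketched as a left-to-right fold over the arguments. This is only an illustration, not fd's actual argument parser (fd uses clap); the function name `resolve` and the default values are assumptions made for the example.

```python
def resolve(args, respect_ignore=True, case_sensitive=True):
    """Resolve paired flags left to right; the last occurrence wins.

    The keyword defaults stand in for whatever the tool's built-in
    defaults are; they are placeholders for this sketch.
    """
    settings = {"respect_ignore": respect_ignore, "case_sensitive": case_sensitive}
    # Each flag maps to (setting name, value it forces).
    effects = {
        "--ignore": ("respect_ignore", True),
        "--no-ignore": ("respect_ignore", False),
        "-I": ("respect_ignore", False),
        "-s": ("case_sensitive", True),
        "-i": ("case_sensitive", False),
    }
    for arg in args:
        if arg in effects:
            key, value = effects[arg]
            settings[key] = value  # later flags overwrite earlier ones
    return settings


if __name__ == "__main__":
    alias_args = ["--ignore"]             # what alias fd="fd --ignore" prepends
    print(resolve(alias_args))            # alias alone: ignore files respected
    print(resolve(alias_args + ["-I"]))   # a later -I overrides the alias
```

Because the alias only prepends arguments, anything typed interactively comes later in the list and therefore takes precedence.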

FWIW, ripgrep (which I imagine a lot of fd users also use) also respects gitignore by default.

Hello.

I’d like to make a point that fd is a general-purpose file-searching utility that is not git specific, so having it take into account .gitignore files lying around in the filesystem does not feel right. In fact, I stumbled upon this issue the very first time I tried fd: I tried to find something, starting from a non-git directory, in subdirectories which happened to be git repos, and found nothing, although I knew it was there. After that, the very next thing I did was patch fd locally so it wouldn’t read .gitignore files by default.

My suggestion would be to not change the default, so we don’t break anyone’s workflow. Instead, how about something like this:

$ fd -e o
(9 ignored files skipped, 2 hidden files skipped; see fd -h for details)

With #595 implemented, users could make fd an alias to fd --no-hidden --ignore to keep the current behaviour and suppress the warning, or to fd -HI to show everything by default.

I’d be okay with always printing that warning, even if there are matches. But especially if there are no matches it might be handy.

IMO, all of this discussion about what are the best defaults points to the following conclusions:

  • There are no perfect defaults.
  • There should be a way for everyone to set their own defaults, independent of the way they call fd (aliases, wrappers, and environment variables do not qualify, since they are shell dependent), so having a config file is IMO mandatory to solve this issue (see #362).
  • To make the fd runs reproducible, a --ignore-config flag should be added if the config file is introduced (see #362).

My personal opinion is that the “best” defaults should be discussed from a newbie’s point of view. More experienced users will know how to tweak the tool to their own needs.

I also like @tavianator’s proposal, possibly with the following caveats:

  • fd should print a warning only if outputting to stdout in an interactive shell
  • The warning should not include counts for performance reasons but simply state:
(Rerun with `-u` to also search ignored files, `-uu` to search all files)
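A minimal sketch of the two caveats above, under stated assumptions: the hint is emitted only when stdout is an interactive terminal, and it carries no counts. Writing the hint to stderr is my own assumption (so it never mixes with results); the function name `maybe_hint` is hypothetical and not part of fd.

```python
import io
import sys

HINT = "(Rerun with `-u` to also search ignored files, `-uu` to search all files)"


def maybe_hint(skipped_anything, out=None, err=None):
    """Print the hint to err if something was skipped and out is a terminal.

    Returns True when the hint was printed.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if skipped_anything and out.isatty():
        print(HINT, file=err)
        return True
    return False


if __name__ == "__main__":
    # Piped or redirected output (modelled by StringIO) stays silent.
    print(maybe_hint(True, out=io.StringIO()))
    # On a real terminal this prints the hint to stderr.
    maybe_hint(True)
```

Only a boolean “did we skip anything” is needed here, which matches the later remark in the thread that neither the skipped paths nor their number have to be tracked.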

Thought dump:

git is primarily concerned about the contents of files, their state, but not their presence. This means that .gitignore files are also about the state of the files, but not their presence.

ripgrep, just like git, is also primarily concerned about the contents of files, and this shared concern makes its choice to consider .gitignore files understandable, although it could also be opt-in.

fd, on the other hand, is not concerned about the state of the files, but about their presence, which differs from the concerns of git and ripgrep and makes its consideration of .gitignore files slightly less fitting.

Well 😄. I’m not so sure about this. We currently use the ignore crate for parallel directory traversal + gitignore handling. I don’t think it (currently) has a mode where it gives us all files, but marks the ones that would be ignored (or similar). But I might be wrong.

Yeah, might require a patch to ignore. We don’t need to know what paths they were, or even how many, just whether it ignored anything.

If we want to show this warning, it would mean that we will always have to parse gitignore files, even in the presence of -I/--no-ignore.

I don’t think we need to show the warning when -I is passed, at least about ignored files. We could still warn about hidden files unless -H is passed.

Adding this to the “fd 9” milestone because I would like to settle this discussion and introduce the (possible) breaking change in that version (see #613).

If you do make this change, please consider using separate flags for .gitignore, .fdignore, etc. I have run into valid use cases for (observe .fdignore, ignore .gitignore) and vice versa.

Examples:

-I/--ignore -- Ignore file patterns in .gitignore and .fdignore
-Ig/--ignore-gitignore -- Ignore file patterns in .gitignore
-If/--ignore-fdignore -- Ignore file patterns in .fdignore
-N/--no-ignore -- do Not ignore file patterns in .gitignore and .fdignore
-Ng/--no-ignore-gitignore -- do Not ignore file patterns in .gitignore
-Nf/--no-ignore-fdignore -- do Not ignore file patterns in .fdignore

Unfortunately these all use double-negatives, and there is a potential confusion about the double-meaning of “ignore .gitignore” (ignoring the .gitignore file and ignoring the files within it have opposite meaning). Other terms that may be less confusing:

  • E/N - --Exclude-from (exclude files listed in .{}ignore), do Not exclude
  • U/N - Use, do Not use

There is precedent for fine-grained ignore params in ripgrep: (--no-ignore-dot, --no-ignore-global, --no-ignore-vcs, etc.)

[ If the above comment about supporting non-git repositories is implemented, then Ig might become Iv (for vcs) ]

I have an additional suggestion, which could complement the proposal or make it unnecessary: add a corresponding entry to tldr. The tldr page for rg/ripgrep has an entry rg -uu pattern, which is the second result and can thus be looked up in about a second.

Typing the flag in 20% of searches would then still cost less time overall. A bonus is that -uu could become established as “use hidden/git-ignored stuff” or “do more work”.

One client for tldr is tealdeer.

We already have ~/.fdignore. Maybe this file could somehow ‘include’ gitignore (via something like @~/.gitignore or some other character/directive)? With this approach, showing git-ignored files could be enabled by default, while still allowing users to pull in the git-ignored entries that are already in ~/.gitignore (or ./.git/ignore when inside a repository) in an easy way.

Personally, I wasn’t even aware that git-ignored files are omitted: https://imgur.com/a/UlLD8ED For now my .fdignore contains nearly 100% of .gitignore + other patterns. It would be great to be able to ‘include’ the file as a whole, instead of copying its content manually.
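To make the ‘include’ idea concrete, here is a rough sketch of how such a directive could be expanded. Everything about it is hypothetical: fd has no `@path` syntax today, the function name `load_patterns` is invented, and the line handling is simplified compared with real gitignore parsing (for example, escaped `#` and significant trailing spaces are not handled).

```python
import os
import tempfile


def load_patterns(path, _seen=None):
    """Read an ignore file, expanding hypothetical '@other-file' include lines."""
    seen = set() if _seen is None else _seen
    real = os.path.realpath(path)
    if real in seen:
        return []  # already expanded; also breaks include cycles
    seen.add(real)
    patterns = []
    with open(real, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("@"):
                target = os.path.expanduser(line[1:])
                if not os.path.isabs(target):
                    # Relative includes are resolved next to the including file.
                    target = os.path.join(os.path.dirname(real), target)
                patterns.extend(load_patterns(target, seen))
            else:
                patterns.append(line)
    return patterns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "gitignore"), "w", encoding="utf-8") as f:
            f.write("node_modules/\n*.log\n")
        with open(os.path.join(tmp, "fdignore"), "w", encoding="utf-8") as f:
            f.write("@gitignore\n*.bak\n")
        print(load_patterns(os.path.join(tmp, "fdignore")))
```

A missing include target raises `FileNotFoundError` in this sketch; a real implementation would have to decide whether that should be an error or a silent skip.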

I like that idea, with one caveat. Instead of having a separately compiled version of fd, fdg should just be a symlink to fd, and fd should check the name it is called with: if it is “fd”, use the general behavior, and if it is “fdg”, use the “git” behavior. Or alternatively, distribute OS-specific wrapper scripts for fdg (for example, one that does something like exec fd --ignore, or whatever the Windows equivalent is).
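The symlink variant amounts to dispatching on the program name. A small sketch, assuming the hypothetical names from this thread (“fd” for the general behavior, “fdg” for the git-style one); the function `default_respects_gitignore` is made up for illustration and is not part of fd.

```python
import os
import sys


def default_respects_gitignore(argv0):
    """Pick the default from the name the program was invoked as."""
    name = os.path.basename(argv0)
    stem, _ext = os.path.splitext(name)  # tolerate names like "fdg.exe"
    return stem == "fdg"


if __name__ == "__main__":
    # With a symlink fdg -> fd, sys.argv[0] carries the name that was used.
    print(default_respects_gitignore(sys.argv[0]))
```

This is the same mechanism that lets a single binary answer to several names, at the cost of behavior that silently changes when the file is renamed.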

I’m not entirely opposed to a change in the default, as long as there is an easy way for users to keep the current behavior if they want. Which could be as simple as being able to do alias fd="fd --ignore-vcs" (or --ignore-git), as long as I can still use --no-ignore-vcs, -u, -I, etc. to turn off the previous --ignore-vcs (which is how it currently works).

One practical problem is that we have a set of (short) command-line options that are designed to work with the current default, like -I/--no-ignore.

That brings up the question, should the new default be the equivalent of current fd --no-ignore, or fd --no-ignore-vcs?

Personally, I think it would be a little surprising if fd doesn’t respect .fdignore files by default. .ignore is more questionable. OTOH, in the case that you don’t use any ignore files, bypassing the ignore machinery could improve performance.

If we changed the default to fd --no-ignore-vcs, then the -I, --no-ignore option would still be meaningful, since it excludes the .fdignore and .ignore files. Although perhaps not needed quite as often.

We have a (somewhat hidden) --ignore counterpart, but no suitable short option.

For the long option, I think we would probably reverse the importance in the documentation (although maybe make the --no-ignore more prominent than it currently is).

As for the short option, that depends on what direction we went with for --no-ignore vs --no-ignore-vcs as the default.

If we went with --no-ignore as the default, -i would be a good choice as an alias for --ignore, except that it is already taken for “case-insensitive”, but maybe we could change that, although that increases the potential breakage. Or we could invert the meaning of -I, which also would increase the scope of the breakage, and would be inconvenient for anyone who aliases fd to fd --ignore, since there isn’t a short option to re-disable it, but maybe we could add a new short option for that as well. Or we could do something like -I means --ignore and +I means --no-ignore, but I don’t think clap supports that convention, and it isn’t a terribly common convention for CLIs.

If we went with --no-ignore-vcs as the default, there isn’t currently a short option for --no-ignore-vcs, but it might be worth adding a short --ignore-vcs, perhaps -G for git? Although if we ever added support for additional VCSs that would make less sense. -v is currently available, but I worry about that being confused for “version” or “verbose” (and possibly we would want to use that for an alias to --verbose at some point?).

We would also have to figure out what to do with --no-ignore-vcs, --no-ignore-parent, unrestricted, etc.

I think those could probably stay the same as they are. Although make --ignore-vcs the main option documented instead of --no-ignore-vcs.

Maybe a minor update can be pushed to give a deprecation warning before any changes are made?

Where/when would we show this deprecation warning? Every time fd ran without a --no-ignore(-vcs) flag? That would be incredibly annoying IMO.

For the --no-ignore* flags, I think they should stay for a bit but just be no-ops (later you can leave them in but remove them from the man page, since they are just legacy support, and add a deprecation note saying they are no longer needed).

No, we should keep them. Because I think we should support the use case of using an alias (or wrapper script) that passes --ignore-*, but allow negating it by --no-ignore-* later in the arg list. Just as we currently allow passing --ignore to undo a previous --no-ignore.

I don’t think short flags are always necessary (I actually encourage people to use the long flags for aliases or scripts because they are better documentation), but I definitely get the desire.

For scripts or aliases, I absolutely agree. However, for interactive use, I think that having short names for commonly used options is very valuable. And I think that turning the ignore functionality back on would be a pretty common usage, at least for me.

rg is not part of the standard command set and isn’t really relevant to this conversation.

fd uses the same code for determining which files to ignore as rg. Some of fd’s options were designed specifically to match options in rg. I generally view fd as being to find what rg is to grep. And I strongly suspect that there is a large overlap between users of rg and users of fd. I do think it is relevant to the conversation. Maybe for searching for files based on their names, respecting .gitignore is less important than it is for ripgrep. But if so, I think it is worth asking why that is.

rg is not part of the standard command set and isn’t really relevant to this conversation.

My argument isn’t that I (Steven) have this particular use case vs you (tmccombs or matu3ba) have a particular use case. I just gave those as an example.

My argument is “which default yields the lowest entropy”

The reasoning behind my argument is “follow the same set of defaults as the standard system.”

Personally, I’m just throwing alias fd="fd --no-ignore" into my rc and calling it a day. From a design perspective, I strongly believe more confusion is created by a default that excludes files that one would get from standard commands such as find, ls, grep, locate, and so on. We’re talking about default options. If you’re using rg I assume you can be like me and throw alias fd="fd --git" into your rc and call it a day too. The question is not “what do you find useful” but “what behavior is most expected from a new user”. Let’s just make sure we get the framing right and let’s also not forget that alias exists. I mean we all have dotfiles, right?

The frequent issues are explicit evidence that such default behavior does create huge surprisal to users. So is the fact that it’s in the first line of the documentation and in the feature list. When it doesn’t surprise them, those users probably read the documentation closely. If a user reads the documentation closely they can easily throw an alias into their rc file (because that’s what those files are for, personalization) and go on about their day, and the github issues will go away. We can even think about this from another perspective if the several I have given aren’t enough. Which would be a larger breaking change: if the default is to ignore the *foo glob and you remove that default filter, or if the default is to have no filter and you introduce a *foo glob? Obviously the latter results in higher surprisal to the user.

The arguments for the default filter are arguments of personal use case, which is why I said the desired behavior depends on what type of developer you are. Providing customization options is fantastic and I’m super happy fd has these. That still doesn’t change the issue that the current default creates higher surprisal. If you want to convince the --no-ignore crowd that we’re wrong you have to convince us that this default creates less surprisal.

You’re not going to go over to exa or lsd and find tons of issues “command outputs files that pattern match gitignore, this is unexpected behavior.” It would be silly to think so and that’s why it feels weird to even be having this discussion. I am surprised that you are surprised that people are surprised that the “better find” tool filters out files that aren’t hidden.

Just let the user know if and how many ignored or hidden files have been skipped.

I would also be in favour of this. Also inform the user how they can make sure not to skip these files.

Windows 7 user here. I have a dedicated folder with CLI tools added to PATH variable. For example, there is ripgrep with .ripgreprc next to it, which contains settings I need with every launch like --smart-case or coloring preferences. I like the portability of this approach instead of polluting %UserProfile%, registry or creating more env variables. So it would be nice to have similar configuration here. I would use it to make -H permanent, because I always forget to add it (find shows all).

Some thoughts on a couple of points brought up in this thread, though nothing terribly new.

1. Conflating _git_ ignore with general ignore

I’m also in favour of changing the default, because IMO paths being present in .gitignore mean literally what it says on the tin - “not interesting for the purposes of version control” and I wish that tools like fd as well as ripgrep did not overload this definition to mean roughly “not interesting to search/scan in by default”. I, like many others here, have had false negatives due to this. It’s (subjectively!) a bit sad that in order to be confident in a negative search result one has to either provide extra flags, or rerun the search using a “legacy” tool.

2. Special treatment of `.gitignore` with regards to other similar files

Did you ever run grep on repos with huge binary files (>5 GB) or a large number of files ignored by .gitignore? Binary data in particular (without newlines) takes linear time to scan, and that is why ripgrep has a different default than grep. The same argument can be made for using fd when many, many files are matched by .gitignore, e.g. after compiling the Linux kernel.

fd, or to be more precise the ignore crate that it uses, appears to only support git’s ignore files, which means fd’s behaviour for say Mercurial users will be different. Firefox, arguably the poster child for Rust, uses Mercurial for example.

An argument from authority is not a technical argument about usage. And you can’t make everyone happy with the tool. Here is a short catalogue for decision making:

  1. Usage consistency: What kind of consistency with other tools (arguments + effects) should be aimed for? (For me that is Rust and ripgrep, if possible.)
  2. Usage purpose: Ignoring build files serves dev environments, where you frequently want to search for (relative) file paths in a complex tree. (The effect is a clear speed win, as fewer paths and files need to be traversed.)
  3. Oriented user base: If the author chooses to support such a thing, when should a version control tool be supported? (Market share, user base?)
  4. Clarification: How should it be documented? (manual, cheat sheet, tealdeer, tldr)
  5. Technical feasibility: How many build files can be “hosted” (as result of codegen) on Mercurial to justify ignoring them?
  6. Usage feasibility: How many build files are “hosted” (as result of codegen) on Mercurial to justify ignoring them?

3. Prior art re: aliases/separate binaries

This applies to grep, and is likely Linux (or perhaps even Debian) specific:

root@9fb4e89aea1b:/# man grep | head -4
GREP(1)                                                                User Commands                                                               GREP(1)

NAME
       grep, egrep, fgrep, rgrep - print lines that match patterns

So, using aliases for commonly-used flags is at the very least nothing new.

How should this be maintained and name clashes prevented? cfdisk, df, efi, rfkill are already used. Do you have specific names in mind?