ripgrep: rg 0.8.1: regression in glob pattern matching

What version of ripgrep are you using?

$ rg --version
ripgrep 0.8.1 (rev c8e9f25b85)
+SIMD -AVX

What operating system are you using ripgrep on?

$ uname -a
Darwin Jupiter.local 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64

Describe your question, feature request, or bug.

I’m using ripgrep to search for files excluding certain directories. For example:

rg --files --no-ignore --hidden --follow -g '!{.git,node_modules}/*'

In ripgrep 0.7.1, this excluded the .git and node_modules directories in the current directory and in all subdirectories. ripgrep 0.8.1, however, lists the contents of .git and node_modules directories that sit in subdirectories of the current directory.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

Well . . . fine, but: a) even with that in the man, I’m not going to think to check some other tool’s man page to understand how something works, b) it’s an extra thing to run if my first hunch is “this is a problem with ripgrep”, c) I don’t know that it’s common knowledge that gitignore HAS a man page . . . I didn’t know that, d) the man for gitignore is dense enough that a tl;dr would be nice, e) most people assume they understand gitignore syntax already, so even with that note they won’t think to check man gitignore, f) even if, barring all this, you actually look up the man for gitignore, you still have to understand what it means, and phrases like “Otherwise, Git treats the pattern as a shell glob suitable for consumption by fnmatch(3) with the FNM_PATHNAME flag” make that virtually untenable. I guess what I’m asking is, like, maybe one example of negating a file path would be nice. It could be as simple as:

# Exclude node_modules from a search
rg -g '!node_modules' banana

Per your last, I don’t think there’s anything specifically about fzf here. I mean, you’re correct that the things I was reading online were about integrating ripgrep with fzf, and that definitely muddied the situation for me in terms of debugging, but I couldn’t get glob exclusion working even with a .ripgreprc just in my shell.

Is this documented anywhere?

Yes, in man gitignore. The man page for ripgrep says, “Globbing rules match .gitignore globs.” in the documentation for the -g/--glob flag.

TL;DR - Use -g '!exclude' instead of -g '!exclude/*'.

OK, so I fear this change may have actually been intended. Specifically, this is a result of fixing #761, which was done in #762. Basically, the idea here is that if a glob contains a /, then a * in the glob must not match a /. In your case, you have a glob exclude/* (which contains a slash), which means it is specifically limited to matching things like exclude/foo, but specifically not a/exclude/foo or a/b/exclude/foo because that would imply an implicit * prefix that matches through a / character. As #762 notes, the man gitignore documentation specifically calls out exactly this example.
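
To make the difference concrete, here is a small sketch that needs only grep, not ripgrep. It assumes the two regexes shown in the --debug output in this issue are a fair stand-in for how each version treats the glob exclude/*. They are copied from that output with the (?-u) flag dropped and (?:...) rewritten as a plain group so that grep -E accepts them. The three sample paths are the ones named in the explanation above.

```shell
#!/bin/sh
# Candidate paths, relative to the search root.
paths='exclude/foo
a/exclude/foo
a/b/exclude/foo'

# 0.8.1: the glob 'exclude/*' is anchored at the root, with no implicit prefix.
new_re='^exclude/[^/]*$'
# 0.7.1: the glob was rewritten to '**/exclude/*', so any leading directories match.
old_re='^(/?|.*/)exclude/[^/]*$'

new_matches=$(printf '%s\n' "$paths" | grep -E "$new_re")
old_matches=$(printf '%s\n' "$paths" | grep -E "$old_re")

echo "Matched by the 0.8.1 regex:"
printf '%s\n' "$new_matches"
echo "Matched by the 0.7.1 regex:"
printf '%s\n' "$old_matches"
```

Only the root-level path matches the newer regex, which is why a negated glob like '!exclude/*' no longer filters out the nested directories.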

Given that the -g/--glob flag is documented to follow gitignore semantics and given that I believe this behavior is consistent with gitignore behavior, I think that this actually isn’t a bug, and you were instead previously relying on the behavior of a bug. You can trivially work around this by using -g '!exclude' instead of -g '!exclude/*', which I believe accomplishes the same thing.
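
As a rough way to try the workaround, the sketch below builds a throwaway tree similar to the test script in this issue and lists files with -g '!exclude'. The temporary directory and the extra file named keep are hypothetical additions for illustration, and the script skips the search if rg is not on the PATH.

```shell
#!/bin/sh
# Build a throwaway tree with 'exclude' directories at two depths.
root=$(mktemp -d)
mkdir -p "$root/a/exclude" "$root/a/b/exclude"
touch "$root/a/exclude/file1" "$root/a/b/exclude/file2"
touch "$root/a/b/keep"   # hypothetical file that should survive the filter

listing=""
if command -v rg >/dev/null 2>&1; then
    # The glob has no slash, so under gitignore rules it applies at any depth.
    listing=$(cd "$root" && rg --files --no-ignore --hidden --follow -g '!exclude')
    printf '%s\n' "$listing"
else
    echo "rg not found on PATH; skipping the search" >&2
fi

rm -rf "$root"
```

If the glob behaves as described above, nothing under either exclude directory should appear in the listing.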

It is plausible that we might want to use semantics other than gitignore for the -g/--glob flags, but that is itself a much larger change.

The following script will create a test directory structure:

#!/bin/sh

mkdir -p test/a/b/c/
mkdir -p test/a/exclude
mkdir -p test/a/b/exclude
touch test/a/exclude/file1
touch test/a/b/exclude/file2
cd test

With ripgrep version 0.8.1:

$ rg --debug --files --no-ignore --hidden --follow -g '!exclude/*'
DEBUG/globset/globset/src/lib.rs:396: glob converted to regex: Glob { glob: "exclude/*", re: "(?-u)^exclude/[^/]*$", opts: GlobOptions { case_insensitive: false, literal_separator: true }, tokens: Tokens([Literal('e'), Literal('x'), Literal('c'), Literal('l'), Literal('u'), Literal('d'), Literal('e'), Literal('/'), ZeroOrMore]) }
DEBUG/globset/globset/src/lib.rs:401: built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 1 regexes
DEBUG/grep::search/grep/src/search.rs:195: regex ast:
Repeat {
    e: Literal {
        chars: [
            'z'
        ],
        casei: false
    },
    r: Range {
        min: 0,
        max: Some(
            0
        )
    },
    greedy: true
}
a/exclude/file1
a/b/exclude/file2

And with ripgrep version 0.7.1:

$ rg --debug --files --no-ignore --hidden --follow -g '!exclude/*'
DEBUG:globset: glob converted to regex: Glob { glob: "**/exclude/*", re: "(?-u)^(?:/?|.*/)exclude/[^/]*$", opts: GlobOptions { case_insensitive: false, literal_separator: true }, tokens: Tokens([RecursivePrefix, Literal('e'), Literal('x'), Literal('c'), Literal('l'), Literal('u'), Literal('d'), Literal('e'), Literal('/'), ZeroOrMore]) }
DEBUG:globset: built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 1 regexes
DEBUG:grep::search: regex ast:
Repeat {
    e: Literal {
        chars: [
            'z'
        ],
        casei: false
    },
    r: Range {
        min: 0,
        max: Some(
            0
        )
    },
    greedy: true
}
DEBUG:ignore::walk: ignoring ./a/exclude/file1: Ignore(IgnoreMatch(Override(Glob(Matched(Glob { from: None, original: "!exclude/*", actual: "**/exclude/*", is_whitelist: true, is_only_dir: false })))))
DEBUG:ignore::walk: ignoring ./a/b/exclude/file2: Ignore(IgnoreMatch(Override(Glob(Matched(Glob { from: None, original: "!exclude/*", actual: "**/exclude/*", is_whitelist: true, is_only_dir: false })))))