regex: Does not support (?!...) negative lookahead assertion?

The Python re module supports the (?!...) syntax; see https://docs.python.org/2/library/re.html#regular-expression-syntax

The code below compiles but panics at runtime:

extern crate regex;

fn main() {
    let re = regex::Regex::new(r"Isaac (?!Asimov)").unwrap();
    println!("{}", re.is_match("Isaac "));
    println!("{}", re.is_match("Isaac Asimov"));
}

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value:
Syntax(Error { pos: 8, surround: "ac (?!Asim", kind: UnrecognizedFlag('!') })', 
../src/libcore/result.rs:738

The pattern works fine in Python:

import re

pt = re.compile(r"Isaac (?!Asimov)")
print(pt.match("Isaac "))        # a match object
print(pt.match("Isaac Asimov"))  # None

About this issue

State: closed
Created 9 years ago
Reactions: 15
Comments: 21 (10 by maintainers)

Links to this issue

[Question] Full Regex support or workarounds?

Most upvoted comments

Hmm… Maybe it’d make sense to implement them, then? To achieve compatibility with other regex engines and with people’s expectations.

+74

obskyr on Jun 21, 2017

> there’s already someone who maintains the regex package

Yes. That’s me. 😃

> And as for API design work, that’s not really significant at all, is it?

I included a lot more than API design work in my previous comment.

> (I’d even dare to say they’re expected) for performance is the right choice.

I disagree. So do a lot of people. Predictable performance is important.

I don’t think we’re going to get very far with this. Here are the facts on the ground:

- There are no plans to add lookaround or any other features that lead to hard-to-predict performance. Even if there were, the time scale for that happening would be “years.”
- If you want fancier features, you might elect to just use PCRE2, or, if you want a pure Rust solution, you could use the fancy-regex crate (which is built on this one).

+34

BurntSushi on Jun 21, 2017

Nope. Just use fancy_regex.

+23

BurntSushi on Jul 14, 2022

You might have more luck at a general help forum. I’m generally the only one answering questions here, and I don’t really have time to convert look-arounds to non-look-arounds. Maybe reddit.com/r/regex?

But if you’re right, then Discord’s regex filtering sounds pretty limited. You might not be able to make it far if you require more sophisticated filtering.

On top of that, Discord likely has limits on the size of regexes they allow, so you can’t just go as big as you want.

BurntSushi on Nov 12, 2022

And in the future, it would be helpful if you post a Discussion question instead of bumping an old issue that is only tangentially related to the problem you’re trying to solve.

BurntSushi on Nov 12, 2022

Thanks for the tip. Usually you can refactor to kind of fake the positive lookarounds just using capture groups, but the negative ones are more difficult. For example:

# Test strings
foobar foobaz fuubar not foo but foo

# PCRE expression ->   workaround
foo(?=bar)        ->   (foo)bar
foo(?!bar)        ->   (foo)(?:[b][a][^z]|[^b][^a][^z]|$). # Not a "true" workaround
(?<=foo)bar       ->   foo(bar)
(?<!not )foo      ->   no "true" workaround, could do similar to ?!
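As a rough check of the rows above, the sketch below compares each lookaround pattern with a lookaround-free rewrite on the same test string. It uses Python's re module (the one from the original report) only because it accepts both spellings, so they can be run side by side; the rewrites use nothing beyond groups, character classes, alternation and $. The starts helper and the NOT_BAR pattern are hypothetical, written for this sketch and not taken from the thread, and NOT_BAR is just one way to spell out what may follow foo.

```python
import re

text = "foobar foobaz fuubar not foo but foo"

def starts(pattern, group=0):
    """Start offsets of `group` for every non-overlapping match in `text`."""
    return [m.start(group) for m in re.finditer(pattern, text)]

# Positive lookahead vs. capture group.
print(starts(r"foo(?=bar)"), starts(r"(foo)bar", group=1))      # [0] [0]

# Positive lookbehind vs. capture group.
print(starts(r"(?<=foo)bar"), starts(r"foo(bar)", group=1))     # [3] [3]

# Negative lookahead vs. an explicit list of what may follow "foo":
# end of input, a non-"b", "b" + non-"a", or "ba" + non-"r".
# (Hypothetical pattern written for this sketch.)
NOT_BAR = r"(foo)(?:$|[^b]|b(?:$|[^a])|ba(?:$|[^r]))"
print(starts(r"foo(?!bar)"), starts(NOT_BAR, group=1))          # [7, 25, 33] [7, 25, 33]
```

On this string both spellings agree in every row. The negative rewrite consumes the characters it inspects after foo, so on other inputs (for example foofoo) it can miss a match that the lookahead version finds.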

There just really isn’t a solid concept of “not” outside of full PCRE, so these negatives are limitations without good refactoring, where you kind of have to know all possibilities you might get. Consumption is the other issue - i.e., the first (foo)bar will consume the entire foobar, making bar unavailable for further matching (bad example here, this matters more when your start and end delimiters are the same).
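The consumption problem is easier to see when the start and end delimiters are the same character. The sketch below is only an illustration: the |a|b|c| input is made up, and Python's re module is used because it accepts both the lookaround and the capture-group spelling for comparison.

```python
import re

# Hypothetical input where the start and end delimiter are the same character.
line = "|a|b|c|"

# Lookarounds check the delimiters without consuming them,
# so each "|" can close one field and open the next.
with_lookaround = re.findall(r"(?<=\|)[^|]+(?=\|)", line)

# The capture-group rewrite consumes both delimiters, so the "|" that
# closes "a" is no longer available to open "b".
with_groups = re.findall(r"\|([^|]+)\|", line)

print(with_lookaround)  # ['a', 'b', 'c']
print(with_groups)      # ['a', 'c']
```

The capture-group version silently skips every other field, which is the kind of gap the workaround table cannot paper over.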

tgross35 on Aug 16, 2022

In some cases using this could work:

[x[^xyz]]     Nested/grouping character class (matching any character except y and z)

See: https://docs.rs/regex/latest/regex/#character-classes

finalclass on Aug 15, 2022

Ah, right, regexes supplied by users. You’re right, that’s a consideration. One possibility is to provide an option for turning off features that may increase run time significantly, or a feature to detect whether they’re present beforehand.

As for “maintain it”… there’s already someone who maintains the regex package, I hope. And as for API design work, that’s not really significant at all, is it? Sure, deciding whether and if so how to handle shutting off the “slow” features might take a bit of consideration, but beyond that there’s no extra API design to do at all. The main (only?) real point is that you have to find someone to write the code. Which of course isn’t trivial at all, but I don’t think entirely foregoing very common features (I’d even dare to say they’re expected) for performance is the right choice.

obskyr on Jun 21, 2017

I don’t know because I don’t know what problem you’re trying to solve. Any of the following things might work depending on your situation:

- Reformulate your regex into one that does not require arbitrary look-ahead in the input.
- Post process your matches to exclude matches that would have been excluded by the negative assertion.
- Use rust-pcre (I’m not sure if that’s being maintained or if it works).
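The post-processing option could look roughly like the sketch below for the pattern from the original report. It is written in Python's re module so that the result can be compared against the real negative lookahead; the function name isaac_not_asimov is hypothetical, and the approach assumes an engine that reports match offsets.

```python
import re

def isaac_not_asimov(haystack):
    """Return start offsets of "Isaac " not immediately followed by "Asimov"."""
    kept = []
    # Step 1: a lookaround-free pattern finds every candidate.
    for m in re.finditer(r"Isaac ", haystack):
        # Step 2: drop candidates the negative assertion would have rejected.
        if not haystack.startswith("Asimov", m.end()):
            kept.append(m.start())
    return kept

print(isaac_not_asimov("Isaac "))                         # [0]
print(isaac_not_asimov("Isaac Asimov"))                   # []
print(isaac_not_asimov("Isaac Newton and Isaac Asimov"))  # [0]
```

For a fixed literal prefix like this one, the filter gives the same offsets as Isaac (?!Asimov). When the part before the assertion contains repetition or alternation, a backtracking engine may retry with a different match at the same position, so the two are not guaranteed to agree in general.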

BurntSushi on Nov 7, 2015