godot: Regex capture stops at first match - doesn't work as expected in godot (extracting data from <> tags)
Hi, I am trying to separate text from XML tags. The first step is to extract the tags.
This works everywhere else but in Godot, where I get only the first match 😦
Here is example code:
func _ready():
    print("TAGS:", extractXmlTags("Hello, <silence 1.0>my name is Jonn, I am a <speed 0.2> blah blah blah blah blah"))
    ## should return [silence 1.0,speed 0.2], but returns [silence 1.0]

func extractXmlTags(text):
    var NameRegEx = RegEx.new()
    NameRegEx.compile('<(.*?)>') ## also <(.*?)> ## <([^<]+)>
    NameRegEx.find(text)
    var result = NameRegEx.get_captures()
    return result
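For comparison, here is a minimal Python sketch of the result the post expects. Python's `re` module is only an assumed stand-in for "everywhere else"; the post does not name the other implementations it was tested against.

```python
import re

# Same input string and pattern as in the GDScript example above.
text = "Hello, <silence 1.0>my name is Jonn, I am a <speed 0.2> blah blah blah blah blah"

# findall() scans the whole string and returns group 1 of every match,
# rather than stopping at the first one.
tags = re.findall(r"<(.*?)>", text)
print("TAGS:", tags)
```

Here `tags` holds both `silence 1.0` and `speed 0.2`, which is the behavior the question is asking Godot to reproduce.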
About this issue
- State: closed
- Created 7 years ago
- Comments: 25 (11 by maintainers)
Ah, right. The problem with `([^<]*)(<[^<]+>)?` is that it's valid with zero-length strings, in any implementation, so once it gets to the end of your text it just loops infinitely (because a zero-length match is a valid match). And then from there, the array just grows until it runs out of memory and crashes.
And the problem with crashing isn't regex specific, because it essentially boils down to:
Anyways, I’ve written RegEx.search_all() in #12915 that prevents infinite loops such as this by detecting when a result doesn’t move.
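To illustrate the zero-length-match loop and the kind of guard described above, here is a minimal Python sketch. It is not the Godot implementation from #12915: the helper name `search_all_guarded` is hypothetical, and stopping on a non-advancing match (rather than skipping ahead one character) is an assumption made for illustration.

```python
import re

def search_all_guarded(pattern, text):
    """Collect successive matches, stopping when a match does not advance."""
    results = []
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        if m.end() <= pos:
            # Zero-length match at the current position: without this check
            # the loop would find the same empty match forever and the
            # results list would grow until memory runs out.
            break
        results.append(m)
        pos = m.end()
    return results

pattern = re.compile(r"([^<]*)(<[^<]+>)?")
pairs = [(m.group(1), m.group(2))
         for m in search_all_guarded(pattern, "Hello <a> world <b>")]
print(pairs)
```

At the end of the text the pattern still matches an empty string, so the guard is what ends the loop.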
Ah, wait, the example I gave is with the 3.0 branch. Here’s the solution for the 2.1 branch:
EDIT: Fixed the typo `>=0`
It feels like you’re piling a few problems together, so let me tackle each problem one-by-one.
You mean like `RegExMatch.get_strings()`?
I’m not sure what you mean by that. `get_string(0)` (or alternatively `get_string()` with no parameters) is the naive I-dont-care-about-the-structure-of-the-regex match result.
While there are some extra boilerplate lines necessary, I fail to understand how it’s more complicated. Here’s the first example re-written in gdscript:
The only difference is that it’s two lines extra. And those two extra lines are because:
a) Regex is an optional module. Not everyone uses it. Having `String.match()` creates a hard dependency in the core type.
b) In native modules, `Object.new()` cannot accept any parameters. It’s a limitation of the engine.
And here’s the second example re-written in gdscript:
Eh, to be honest, it wasn’t that bad. Just that I had a stressful day that day and it just added to it. No worries.
Anyways, now that it’s been merged, does that solve your regex issues?
Yeah, I could do something like that. Perhaps something like:
Should be easy enough. I’ll get that done when I’m free.
Ah, sorry, I was just following your pythex link as reference. The following function:
Should give you the output:
Hopefully that’s more useful for you. Just replace the `print` with the actual functions you want.
Pythex of the RegEx code used
EDIT: Changing `([^<]+)` into `([^<]*)` should deal with the case of text starting with a tag.
The design behind `RegEx.find()` was kinda inspired by the C++ string find. You do subsequent searches by specifying the start point, which you can do via `RegExMatch.get_end(0)`.
I really need to get a more intuitive API for this, but I’ve been pretty busy lately.
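The "search again from where the last match ended" pattern described above can be sketched in Python as follows. This is an analogue, not Godot code: `pattern.search(text, pos)` stands in for a find call with a start offset, `m.end(0)` stands in for `RegExMatch.get_end(0)`, and the function name `extract_tags` is hypothetical.

```python
import re

def extract_tags(text):
    """Return the contents of every <...> tag by repeated offset searches."""
    pattern = re.compile(r"<(.*?)>")
    tags = []
    pos = 0
    while True:
        # Search only from the current offset onward.
        m = pattern.search(text, pos)
        if m is None:
            break
        tags.append(m.group(1))
        # Continue after the end of the whole match (group 0).
        pos = m.end(0)
    return tags

print(extract_tags("Hello, <silence 1.0>my name is Jonn, I am a <speed 0.2> blah"))
```

Because this pattern always consumes at least the two characters `<` and `>`, each match moves the offset forward, so the loop needs no extra guard against zero-length matches.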