puppet-lint: puppet-lint is too slow when parsing manifests with lots of entries
I’m running the most recent version of puppet-lint (2.2.1). I recently had to re-factor a (fairly large) manifest that declared many (~5k) virtual resources because puppet-lint was simply taking an absurd amount of time to run - many minutes, in some cases many hours.
For example:
0 % for i in `ls *.pp | sort -n`; echo -n "$i - " && time puppet-lint $i
1.pp - puppet-lint $i 0.07s user 0.02s system 98% cpu 0.094 total
2.pp - puppet-lint $i 0.08s user 0.02s system 98% cpu 0.095 total
4.pp - puppet-lint $i 0.09s user 0.01s system 98% cpu 0.101 total
8.pp - puppet-lint $i 0.09s user 0.02s system 98% cpu 0.114 total
16.pp - puppet-lint $i 0.12s user 0.02s system 99% cpu 0.143 total
32.pp - puppet-lint $i 0.18s user 0.04s system 99% cpu 0.218 total
64.pp - puppet-lint $i 0.35s user 0.04s system 99% cpu 0.391 total
128.pp - puppet-lint $i 0.80s user 0.08s system 99% cpu 0.876 total
256.pp - puppet-lint $i 2.34s user 0.13s system 99% cpu 2.482 total
512.pp - puppet-lint $i 8.00s user 0.30s system 99% cpu 8.300 total
1024.pp - puppet-lint $i 29.66s user 0.48s system 99% cpu 30.151 total
2048.pp - puppet-lint $i 116.53s user 0.98s system 99% cpu 1:57.53 total
4096.pp - puppet-lint $i 437.12s user 1.93s system 99% cpu 7:19.14 total
8192.pp - (still going as I write this)
It sure looks like exponential growth here. I haven’t looked at the code to figure out where this is happening.
I generated these manifests with a simple script:
#!/bin/bash
COUNT=13
for i in `seq 0 $COUNT`; do
NUMBER=$((2**$i))
FILE=$NUMBER.pp
echo "#" > $FILE
echo "class test {" >> $FILE
for j in `seq 1 $NUMBER`; do
echo " file { 'foo${j}': ensure => absent }" >> $FILE
done
echo "}" >> $FILE
done
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 1
- Comments: 17 (14 by maintainers)
Commits related to this issue
- Use lookahead assertion for matching function name For some reason using a lookahead assertion for the opening parenthesis after a function name is much faster for large manifests. The old version sh... — committed to usev6/puppet-lint by usev6 5 years ago
- Speed up calculation of resource indexes By ignoring the already analysed tokens the index operations on the tokens array become substantially faster for large manifests. This should solve issue #717... — committed to usev6/puppet-lint by usev6 5 years ago
- Merge pull request #893 from usev6/gh-717 Speed up calculation of resource indexes — committed to rodjek/puppet-lint by rodjek 5 years ago
It’s out! So, new numbers:
1.pp - puppet-lint $i 0.15s user 0.03s system 88% cpu 0.207 total 2.pp - puppet-lint $i 0.16s user 0.03s system 99% cpu 0.195 total 4.pp - puppet-lint $i 0.10s user 0.02s system 99% cpu 0.128 total 8.pp - puppet-lint $i 0.14s user 0.03s system 99% cpu 0.173 total 16.pp - puppet-lint $i 0.13s user 0.02s system 99% cpu 0.149 total 32.pp - puppet-lint $i 0.19s user 0.03s system 99% cpu 0.223 total 64.pp - puppet-lint $i 0.18s user 0.02s system 99% cpu 0.205 total 128.pp - puppet-lint $i 0.35s user 0.04s system 99% cpu 0.386 total 256.pp - puppet-lint $i 0.41s user 0.02s system 99% cpu 0.426 total 512.pp - puppet-lint $i 0.84s user 0.02s system 99% cpu 0.867 total 1024.pp - puppet-lint $i 1.97s user 0.04s system 99% cpu 2.012 total 2048.pp - puppet-lint $i 5.78s user 0.06s system 99% cpu 5.842 total 4096.pp - puppet-lint $i 21.43s user 0.14s system 99% cpu 21.574 total 8192.pp - puppet-lint $i 88.84s user 0.27s system 99% cpu 1:29.15 total
I think that some of the core issue is still there, but this is certainly a huge improvement.
Parsing large manifests should be much faster on HEAD, since https://github.com/rodjek/puppet-lint/pull/854 has been merged. This issue could probably be closed after the next release.
I think I’ve discovered what caused the slowdown for large manifests: The regex that is used to match a function name (cmp. PR #854) becomes very slow for large inputs. If I’m not mistaken the pathologic behaviour can be golfed down to something like this:
If I run this I get:
The faster match for string2 with re1 is probably related to some warmup/caching, but it’s obvious that the match with re2 becomes much slower for the longer string.
OOC I tried the same code with perl
… and got a nearly identical result: