puppet-lint: puppet-lint is too slow when parsing manifests with lots of entries

I’m running the most recent version of puppet-lint (2.2.1). I recently had to re-factor a (fairly large) manifest that declared many (~5k) virtual resources because puppet-lint was simply taking an absurd amount of time to run - many minutes, in some cases many hours.

For example:

0 % for i in `ls *.pp | sort -n`; echo -n "$i - " && time puppet-lint $i
1.pp - puppet-lint $i  0.07s user 0.02s system 98% cpu 0.094 total
2.pp - puppet-lint $i  0.08s user 0.02s system 98% cpu 0.095 total
4.pp - puppet-lint $i  0.09s user 0.01s system 98% cpu 0.101 total
8.pp - puppet-lint $i  0.09s user 0.02s system 98% cpu 0.114 total
16.pp - puppet-lint $i  0.12s user 0.02s system 99% cpu 0.143 total
32.pp - puppet-lint $i  0.18s user 0.04s system 99% cpu 0.218 total
64.pp - puppet-lint $i  0.35s user 0.04s system 99% cpu 0.391 total
128.pp - puppet-lint $i  0.80s user 0.08s system 99% cpu 0.876 total
256.pp - puppet-lint $i  2.34s user 0.13s system 99% cpu 2.482 total
512.pp - puppet-lint $i  8.00s user 0.30s system 99% cpu 8.300 total
1024.pp - puppet-lint $i  29.66s user 0.48s system 99% cpu 30.151 total
2048.pp - puppet-lint $i  116.53s user 0.98s system 99% cpu 1:57.53 total
4096.pp - puppet-lint $i  437.12s user 1.93s system 99% cpu 7:19.14 total
8192.pp - (still going as I write this)

It sure looks like exponential growth here. I haven’t looked at the code to figure out where this is happening.

I generated these manifests with a simple script:

#!/bin/bash

COUNT=13

for i in `seq 0 $COUNT`; do
    NUMBER=$((2**$i))
    FILE=$NUMBER.pp
    echo "#" > $FILE
    echo "class test {" >> $FILE
    for j in `seq 1 $NUMBER`; do
        echo "  file { 'foo${j}': ensure => absent }" >> $FILE
    done
    echo "}" >> $FILE
done

About this issue

Original URL
State: open
Created 7 years ago
Reactions: 1
Comments: 17 (14 by maintainers)

Commits related to this issue

Use lookahead assertion for matching function name For some reason using a lookahead assertion for the opening parenthesis after a function name is much faster for large manifests. The old version sh... — committed to usev6/puppet-lint by usev6 5 years ago
Speed up calculation of resource indexes By ignoring the already analysed tokens the index operations on the tokens array become substantially faster for large manifests. This should solve issue #717... — committed to usev6/puppet-lint by usev6 5 years ago
Merge pull request #893 from usev6/gh-717 Speed up calculation of resource indexes — committed to rodjek/puppet-lint by rodjek 5 years ago

Most upvoted comments

It’s out! So, new numbers:

1.pp - puppet-lint $i 0.15s user 0.03s system 88% cpu 0.207 total 2.pp - puppet-lint $i 0.16s user 0.03s system 99% cpu 0.195 total 4.pp - puppet-lint $i 0.10s user 0.02s system 99% cpu 0.128 total 8.pp - puppet-lint $i 0.14s user 0.03s system 99% cpu 0.173 total 16.pp - puppet-lint $i 0.13s user 0.02s system 99% cpu 0.149 total 32.pp - puppet-lint $i 0.19s user 0.03s system 99% cpu 0.223 total 64.pp - puppet-lint $i 0.18s user 0.02s system 99% cpu 0.205 total 128.pp - puppet-lint $i 0.35s user 0.04s system 99% cpu 0.386 total 256.pp - puppet-lint $i 0.41s user 0.02s system 99% cpu 0.426 total 512.pp - puppet-lint $i 0.84s user 0.02s system 99% cpu 0.867 total 1024.pp - puppet-lint $i 1.97s user 0.04s system 99% cpu 2.012 total 2048.pp - puppet-lint $i 5.78s user 0.06s system 99% cpu 5.842 total 4096.pp - puppet-lint $i 21.43s user 0.14s system 99% cpu 21.574 total 8192.pp - puppet-lint $i 88.84s user 0.27s system 99% cpu 1:29.15 total

I think that some of the core issue is still there, but this is certainly a huge improvement.

tskirvin on Oct 8, 2019

Parsing large manifests should be much faster on HEAD, since https://github.com/rodjek/puppet-lint/pull/854 has been merged. This issue could probably be closed after the next release.

usev6 on Sep 23, 2019

I think I’ve discovered what caused the slowdown for large manifests: The regex that is used to match a function name (cmp. PR #854) becomes very slow for large inputs. If I’m not mistaken the pathologic behaviour can be golfed down to something like this:

#!/usr/bin/env ruby

string1 = "some line that doesn't match\n" * 1024 * 16
string2 = "some line that doesn't match\n" * 1024 * 1024

## needs some non-trivial character class with a + quantifier to show the difference
re1 = %r{\A([a-zA-Z]+)(?=\()}
re2 = %r{\A([a-zA-Z]+)\(}

print 'Matching string1 with re1 takes: '
start = Time.now
chunk = string1[re1, 1]
puts 10000 * (Time.now - start)

print 'Matching string1 with re2 takes: '
start = Time.now
chunk = string1[re2, 1]
puts 10000 * (Time.now - start)

print 'Matching string2 with re1 takes: '
start = Time.now
chunk = string2[re1, 1]
puts 10000 * (Time.now - start)

print 'Matching string2 with re2 takes: '
start = Time.now
chunk = string2[re2, 1]
puts 10000 * (Time.now - start)

If I run this I get:

Matching string1 with re1 takes: 0.17579
Matching string1 with re2 takes: 3.08462
Matching string2 with re1 takes: 0.046959999999999995
Matching string2 with re2 takes: 194.09687

The faster match for string2 with re1 is probably related to some warmup/caching, but it’s obvious that the match with re2 becomes much slower for the longer string.

OOC I tried the same code with perl

#!/usr/bin/env perl

use strict;
use warnings;
use Time::HiRes;

my $string1 = "some line that doesn't match\n" x (1024 * 16);
my $string2 = "some line that doesn't match\n" x (1024 * 1024);

my $re1 = qr/\A([a-zA-Z]+)(?=\()/;
my $re2 = qr/\A([a-zA-Z]+)\(/;

print 'Matching string1 with re1 takes: ';
my $start = Time::HiRes::gettimeofday();
$string1 =~ $re1;
my $end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

print 'Matching string1 with re2 takes: ';
$start = Time::HiRes::gettimeofday();
$string1 =~ $re2;
$end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

print 'Matching string2 with re1 takes: ';
$start = Time::HiRes::gettimeofday();
$string2 =~ $re1;
$end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

print 'Matching string2 with re2 takes: ';
$start = Time::HiRes::gettimeofday();
$string2 =~ $re2;
$end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

… and got a nearly identical result:

Matching string1 with re1 takes: 0.140666961669922
Matching string1 with re2 takes: 3.06844711303711
Matching string2 with re1 takes: 0.03814697265625
Matching string2 with re2 takes: 193.150043487549

usev6 on Feb 15, 2019