puppet-lint: puppet-lint is too slow when parsing manifests with lots of entries

I’m running the most recent version of puppet-lint (2.2.1). I recently had to re-factor a (fairly large) manifest that declared many (~5k) virtual resources because puppet-lint was simply taking an absurd amount of time to run - many minutes, in some cases many hours.

For example:

0 % for i in `ls *.pp | sort -n`; echo -n "$i - " && time puppet-lint $i
1.pp - puppet-lint $i  0.07s user 0.02s system 98% cpu 0.094 total
2.pp - puppet-lint $i  0.08s user 0.02s system 98% cpu 0.095 total
4.pp - puppet-lint $i  0.09s user 0.01s system 98% cpu 0.101 total
8.pp - puppet-lint $i  0.09s user 0.02s system 98% cpu 0.114 total
16.pp - puppet-lint $i  0.12s user 0.02s system 99% cpu 0.143 total
32.pp - puppet-lint $i  0.18s user 0.04s system 99% cpu 0.218 total
64.pp - puppet-lint $i  0.35s user 0.04s system 99% cpu 0.391 total
128.pp - puppet-lint $i  0.80s user 0.08s system 99% cpu 0.876 total
256.pp - puppet-lint $i  2.34s user 0.13s system 99% cpu 2.482 total
512.pp - puppet-lint $i  8.00s user 0.30s system 99% cpu 8.300 total
1024.pp - puppet-lint $i  29.66s user 0.48s system 99% cpu 30.151 total
2048.pp - puppet-lint $i  116.53s user 0.98s system 99% cpu 1:57.53 total
4096.pp - puppet-lint $i  437.12s user 1.93s system 99% cpu 7:19.14 total
8192.pp - (still going as I write this)

It sure looks like exponential growth here. I haven’t looked at the code to figure out where this is happening.

I generated these manifests with a simple script:

#!/bin/bash

COUNT=13

for i in `seq 0 $COUNT`; do
    NUMBER=$((2**$i))
    FILE=$NUMBER.pp
    echo "#" > $FILE
    echo "class test {" >> $FILE
    for j in `seq 1 $NUMBER`; do
        echo "  file { 'foo${j}': ensure => absent }" >> $FILE
    done
    echo "}" >> $FILE
done

About this issue

  • Original URL
  • State: open
  • Created 7 years ago
  • Reactions: 1
  • Comments: 17 (14 by maintainers)

Commits related to this issue

Most upvoted comments

It’s out! So, new numbers:

1.pp - puppet-lint $i 0.15s user 0.03s system 88% cpu 0.207 total 2.pp - puppet-lint $i 0.16s user 0.03s system 99% cpu 0.195 total 4.pp - puppet-lint $i 0.10s user 0.02s system 99% cpu 0.128 total 8.pp - puppet-lint $i 0.14s user 0.03s system 99% cpu 0.173 total 16.pp - puppet-lint $i 0.13s user 0.02s system 99% cpu 0.149 total 32.pp - puppet-lint $i 0.19s user 0.03s system 99% cpu 0.223 total 64.pp - puppet-lint $i 0.18s user 0.02s system 99% cpu 0.205 total 128.pp - puppet-lint $i 0.35s user 0.04s system 99% cpu 0.386 total 256.pp - puppet-lint $i 0.41s user 0.02s system 99% cpu 0.426 total 512.pp - puppet-lint $i 0.84s user 0.02s system 99% cpu 0.867 total 1024.pp - puppet-lint $i 1.97s user 0.04s system 99% cpu 2.012 total 2048.pp - puppet-lint $i 5.78s user 0.06s system 99% cpu 5.842 total 4096.pp - puppet-lint $i 21.43s user 0.14s system 99% cpu 21.574 total 8192.pp - puppet-lint $i 88.84s user 0.27s system 99% cpu 1:29.15 total

I think that some of the core issue is still there, but this is certainly a huge improvement.

Parsing large manifests should be much faster on HEAD, since https://github.com/rodjek/puppet-lint/pull/854 has been merged. This issue could probably be closed after the next release.

I think I’ve discovered what caused the slowdown for large manifests: The regex that is used to match a function name (cmp. PR #854) becomes very slow for large inputs. If I’m not mistaken the pathologic behaviour can be golfed down to something like this:

#!/usr/bin/env ruby

string1 = "some line that doesn't match\n" * 1024 * 16
string2 = "some line that doesn't match\n" * 1024 * 1024

## needs some non-trivial character class with a + quantifier to show the difference
re1 = %r{\A([a-zA-Z]+)(?=\()}
re2 = %r{\A([a-zA-Z]+)\(}

print 'Matching string1 with re1 takes: '
start = Time.now
chunk = string1[re1, 1]
puts 10000 * (Time.now - start)

print 'Matching string1 with re2 takes: '
start = Time.now
chunk = string1[re2, 1]
puts 10000 * (Time.now - start)

print 'Matching string2 with re1 takes: '
start = Time.now
chunk = string2[re1, 1]
puts 10000 * (Time.now - start)

print 'Matching string2 with re2 takes: '
start = Time.now
chunk = string2[re2, 1]
puts 10000 * (Time.now - start)

If I run this I get:

Matching string1 with re1 takes: 0.17579
Matching string1 with re2 takes: 3.08462
Matching string2 with re1 takes: 0.046959999999999995
Matching string2 with re2 takes: 194.09687

The faster match for string2 with re1 is probably related to some warmup/caching, but it’s obvious that the match with re2 becomes much slower for the longer string.

OOC I tried the same code with perl

#!/usr/bin/env perl

use strict;
use warnings;
use Time::HiRes;

my $string1 = "some line that doesn't match\n" x (1024 * 16);
my $string2 = "some line that doesn't match\n" x (1024 * 1024);

my $re1 = qr/\A([a-zA-Z]+)(?=\()/;
my $re2 = qr/\A([a-zA-Z]+)\(/;

print 'Matching string1 with re1 takes: ';
my $start = Time::HiRes::gettimeofday();
$string1 =~ $re1;
my $end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

print 'Matching string1 with re2 takes: ';
$start = Time::HiRes::gettimeofday();
$string1 =~ $re2;
$end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

print 'Matching string2 with re1 takes: ';
$start = Time::HiRes::gettimeofday();
$string2 =~ $re1;
$end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

print 'Matching string2 with re2 takes: ';
$start = Time::HiRes::gettimeofday();
$string2 =~ $re2;
$end = Time::HiRes::gettimeofday();
print 10000 * ($end - $start), "\n";

… and got a nearly identical result:

Matching string1 with re1 takes: 0.140666961669922
Matching string1 with re2 takes: 3.06844711303711
Matching string2 with re1 takes: 0.03814697265625
Matching string2 with re2 takes: 193.150043487549