framework: [10.x] The validator is very slow with large datasets when the key contains asterisks
Laravel Version
10.37.3
PHP Version
8.3.0
Database Driver & Version
No response
Description
As reported on previous Laravel versions, the Laravel Validator is very slow with large datasets:
- https://github.com/laravel/ideas/issues/2212
- https://github.com/laravel/framework/pull/32532
- https://stackoverflow.com/questions/71939857/laravel-validation-large-json-too-slow
- https://stackoverflow.com/questions/67316079/laravel-validation-timesout-for-huge-data
- https://laracasts.com/discuss/channels/requests/request-rules-validation-is-very-slow
I have tried to understand and review the validation process, but it is too complex for me to be able to send a PR. I could only find that it iterates thousands of times over all values while resolving the validation rules. The problem seems to appear only when using asterisks in the fields to validate.
Here is a tweet with some tests: https://twitter.com/lito_ordes/status/1735082009666494594
And the repo to test: https://github.com/eusonlito/laravel-validator-slow
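For illustration only, here is a minimal sketch of the two rule styles compared in the benchmark repo; the field names (items.*.name, items.*.price) are hypothetical and just show the shape of the rules, not the repo's exact code:

```php
use Illuminate\Support\Facades\Validator;

// Wildcard rules: the validator expands the asterisk against every row,
// which is where the slowdown shows up on large datasets.
$validator = Validator::make($data, [
    'items.*.name'  => ['required', 'string'],
    'items.*.price' => ['required', 'numeric'],
]);

// "Fixed" rules: the same constraints, but the keys are expanded up front
// so the validator receives one explicit rule per row.
$rules = [];
foreach ($data['items'] as $index => $item) {
    $rules["items.{$index}.name"]  = ['required', 'string'];
    $rules["items.{$index}.price"] = ['required', 'numeric'];
}

$validator = Validator::make($data, $rules);
```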
Steps To Reproduce
- git clone https://github.com/eusonlito/laravel-validator-slow.git
- cd laravel-validator-slow
- composer install
- php artisan app:slow
Test Results
"Read Data Time: 0.0120"
"Items: 11351"
"validatorAsterisk Time: 61.9842"
"validatorFixed Prepare: 0.0048"
"validatorFixed Time: 0.1179"
"validatorChunks1000 Time: 2.1746"
"validatorChunks5000 Time: 10.9196"
"validatorChunks10000 Time: 46.9144"
Added memory usage and more chunk sizes:
"Read Data: 0.0120 seconds / 32.00 memory"
"Items: 11351"
"validatorAsterisk: 60.9496 seconds / 68.50 memory"
"rulesFixed: 0.0049 seconds / 60.50 memory"
"validatorFixed: 0.1193 seconds / 74.00 memory"
"validatorChunks100: 0.3551 seconds / 64.00 memory"
"validatorChunks1000: 2.1089 seconds / 66.00 memory"
"validatorChunks5000: 10.5173 seconds / 70.00 memory"
"validatorChunks10000: 46.5916 seconds / 77.50 memory"
Thank you very much for your reply @driesvints.
In any case, with all due respect to you and your work, saying that the validator of the best PHP framework does not work with a large amount of data (and I don't know if 10k rows can even be considered a large amount of data) is to disregard a problem that clearly exists in the code.
The chunk examples show that there really is an n^n problem that, in my opinion, should be fixed.
As realistic examples: I'm using the validator to validate CSV and Excel sheets, to accept product listings in ecommerce import systems, bulk pricing updates, log analysis, etc. Right now I am moving that validation to one of my own, but it does exactly the same job that the framework system should do.
Again, thank you very much for your great work, and I think it would be a good idea to reconsider the decision to close the issue, or at least take a look at it to see what can be done.
Best regards!
Thanks, but I just don't think these are realistic examples, sorry. Sending that much data at once through the validator will slow things down. There are other ways to work with data and validate it. Of course, if you could ever find ways to improve validation speed, we'd be open to a PR.
Look everyone. For performance improvements we really just appreciate any PRs you could send. It seems @bastien-phi did just that and already sent in a PR which seems to help. If there are any additional speed improvements to be made, we'd appreciate more PRs with proper benchmarks.
@tpetry I just don’t feel this is a realistic example sorry. 7K rows for an API response is just too much and should be paginated.
Same for 10K row excel, CSV tables, etc. These should be batched and the validations of these rows should be aggregated in a result set.
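A rough sketch of this batch-and-aggregate approach (not from the thread; the row fields sku and price are made up for illustration):

```php
use Illuminate\Support\Facades\Validator;

$errors = [];

// Validate each row on its own and aggregate the failures in a result set,
// instead of pushing one huge wildcard ruleset through the validator.
foreach ($rows as $line => $row) {
    $validator = Validator::make($row, [
        'sku'   => ['required', 'string'],
        'price' => ['required', 'numeric'],
    ]);

    if ($validator->fails()) {
        $errors[$line] = $validator->errors()->all();
    }
}

// $errors now maps the original row index to its validation messages.
```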
I know everyone in this thread feels I’m a bit quick to judge here but it seems we don’t see eye to eye here and just have a difference of opinion. That’s fine and the best thing you can do in this situation is either attempt a PR to improve performance if there’s any bottlenecks or try and find a different way to solve the problem for your use case. We don’t go into performance improvements ourselves as we have other priorities/work of our own. In these situations we appreciate an effort from the community.
I also realise most of you think I'm wrong in this situation, but I'd appreciate it if you also respected my opinion even if it doesn't align with your own (mostly referring to people who downvoted me). I appreciate all of you who replied with thoughtful responses here in this thread.
The performance problems aren't just because of a big file. I have a JSON object with only 50 items in it, but a lot of fields to be validated, and I can see the same behaviour.
It would be really awesome if you could reconsider this and take a look at whether it can be improved.
Encountered the same problem in the past with an API response that had just 7k rows and needed to be validated to ensure it is valid against the spec. The performance was sometimes so slow that execution was stopped because of max_execution_time.
@driesvints Are these realistic enough examples to reopen it and let the community think of a solution? It triggers many people, and I guess each one wasted hours researching the issue (in production) and many more hours trying to track down the problem, and in the end abandoned Laravel's validation functionality for this part.
Having the same issue when validating CSV files with a few hundred lines. I feel like there is some room for performance improvements in the validator. I agree with @eusonlito: please reconsider this issue, as it is a realistic example.
ArgumentCountError: array_merge_recursive() does not accept unknown named parameters
Hej Folks,
I'm the guy who figured out the issue with Arr::dot(). I'm the one who created the poor PR #49380 with a solution. I spent a little time these days solving the second bottleneck and I was able to boil it down to a few lines of code within the ValidationRuleParser: https://github.com/laravel/framework/blob/10.x/src/Illuminate/Validation/ValidationRuleParser.php#L145-L174
Mainly $rule->compile(...) causes a significant bottleneck. I've created a simple benchmark within ValidationForEachTest.php. If one of you can figure out how to improve this code, I'll do my best to contribute all my knowledge. Maybe it is the countless calls to array_merge_recursive(). I honestly don't know yet.
If we feed $precompiled = (new ValidationRuleParser($data))->explode($rules); with our ruleset and the data, the ValidationRuleParser will return the compiled array of rules, which the issue opener @eusonlito named rulesFixed. If we feed the validator the precompiled ruleset, new Validator($trans, $data, $compiledRules->rules), it is fast as hell. I learned this trick from the amazing @calebporzio, who did it in Livewire: https://github.com/livewire/livewire/blob/2.x/src/ComponentConcerns/ValidatesInput.php#L217-L238. Maybe we can improve the Laravel ValidationRuleParser with his idea.
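A minimal sketch of the precompilation trick described above, using the ValidationRuleParser::explode() call quoted in the comment but wired through the Validator facade instead of the Validator constructor; the wildcard rules are hypothetical and the exact behaviour may vary between Laravel versions:

```php
use Illuminate\Support\Facades\Validator;
use Illuminate\Validation\ValidationRuleParser;

$rules = [
    'items.*.name'  => ['required', 'string'],
    'items.*.price' => ['required', 'numeric'],
];

// Expand the wildcard rules against the data once, up front.
$compiled = (new ValidationRuleParser($data))->explode($rules);

// Hand the validator the already-expanded ruleset instead of the wildcards.
$validator = Validator::make($data, $compiled->rules);
```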
@driesvints I’m sorry if you feel like downvoting you was disrespectful, it was not the intention. I apologize.
When I downvoted your feedback, it was only to show that I indeed disagree, based on the information we have (that it seemed to be exponential complexity). However, I don't expect you to change your mind based on downvotes and say "We'll look at it and fix it"; I understand the core team has other priorities.
🙏
To close the circle, @eusonlito: with the improved \Arr::dot() from the linked PR, validation is still slow and can be improved. But those are the results (referencing PR #49386 here).
@333sec can you share your optimizeValidateRule()? Otherwise it's not exactly very useful advice.
Thank you for your help, I couldn't try it from my vacation, but I am really looking forward to a performance improvement in this, as I always have problems with the validator.
EDIT: I took a look at your test and it seems I can't quite understand what you are trying to test here. Maybe there is a performance bottleneck when parsing rules, but I think the performance bottleneck that everyone is complaining about might be in another place.
This is what I have done and performance seems to be a lot better, even when I am using the asterisk, but I am not using the Rule::forEach() method.
I am wondering if, instead of calling array_merge_recursive in the loop, an array union would help. Could the array union become a problem?
I think array_merge_recursive is indeed slow and should be avoided inside of loops.
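Not the actual framework code, but a generic sketch of why array_merge_recursive inside a loop hurts and of the two alternatives floated above; whether either is a safe drop-in replacement inside ValidationRuleParser is exactly what would need benchmarking:

```php
// Merging inside the loop re-copies the accumulated result on every
// iteration, which degrades quickly as the number of rule sets grows.
$merged = [];
foreach ($ruleSets as $set) {
    $merged = array_merge_recursive($merged, $set);
}

// Alternative 1: collect everything and merge once. array_values() avoids
// the "does not accept unknown named parameters" error quoted earlier when
// string-keyed arrays are spread as arguments on PHP 8.1+.
$merged = array_merge_recursive(...array_values($ruleSets));

// Alternative 2: the array union operator, which is cheap but NOT
// equivalent: for duplicate keys it keeps the left-hand value instead of
// merging, which may be the problem hinted at in the question above.
$merged = [];
foreach ($ruleSets as $set) {
    $merged += $set;
}
```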