framework: [10.x] The validator is very slow with large datasets when the key contains asterisks
Laravel Version
10.37.3
PHP Version
8.3.0
Database Driver & Version
No response
Description
As reported on previous Laravel versions, the Laravel Validator is very slow with large datasets:
- https://github.com/laravel/ideas/issues/2212
- https://github.com/laravel/framework/pull/32532
- https://stackoverflow.com/questions/71939857/laravel-validation-large-json-too-slow
- https://stackoverflow.com/questions/67316079/laravel-validation-timesout-for-huge-data
- https://laracasts.com/discuss/channels/requests/request-rules-validation-is-very-slow
I have tried to understand and review the validation process, but it is too complex for me to be able to send a PR. I could only find that it iterates thousands of times over all values while resolving the validation rules. The problem seems to appear only when using asterisks in the fields to validate.
Here is a tweet with some tests: https://twitter.com/lito_ordes/status/1735082009666494594
And the repo to test: https://github.com/eusonlito/laravel-validator-slow
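For illustration only, here is a minimal sketch of the two rule styles compared in the benchmark repo; the field names (items.*.name, items.*.price) are hypothetical and just show the shape of the rules, not the repo's exact code:

```php
use Illuminate\Support\Facades\Validator;

// Wildcard rules: the validator expands the asterisk against every row,
// which is where the slowdown shows up on large datasets.
$validator = Validator::make($data, [
    'items.*.name'  => ['required', 'string'],
    'items.*.price' => ['required', 'numeric'],
]);

// "Fixed" rules: the same constraints, but the keys are expanded up front
// so the validator receives one explicit rule per row.
$rules = [];
foreach ($data['items'] as $index => $item) {
    $rules["items.{$index}.name"]  = ['required', 'string'];
    $rules["items.{$index}.price"] = ['required', 'numeric'];
}

$validator = Validator::make($data, $rules);
```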
Steps To Reproduce
- git clone https://github.com/eusonlito/laravel-validator-slow.git
- cd laravel-validator-slow
- composer install
- php artisan app:slow
Test Results
"Read Data Time: 0.0120"
"Items: 11351"
"validatorAsterisk Time: 61.9842"
"validatorFixed Prepare: 0.0048"
"validatorFixed Time: 0.1179"
"validatorChunks1000 Time: 2.1746"
"validatorChunks5000 Time: 10.9196"
"validatorChunks10000 Time: 46.9144"
Added memory usage and more chunk sizes:
"Read Data: 0.0120 seconds / 32.00 memory"
"Items: 11351"
"validatorAsterisk: 60.9496 seconds / 68.50 memory"
"rulesFixed: 0.0049 seconds / 60.50 memory"
"validatorFixed: 0.1193 seconds / 74.00 memory"
"validatorChunks100: 0.3551 seconds / 64.00 memory"
"validatorChunks1000: 2.1089 seconds / 66.00 memory"
"validatorChunks5000: 10.5173 seconds / 70.00 memory"
"validatorChunks10000: 46.5916 seconds / 77.50 memory"
Thank you very much for your reply @driesvints.
In any case, with all due respect to you and your work, saying that the validator of the best PHP framework does not work with a large amount of data (and I don't know if 10k rows can even be considered a large amount of data) is to disregard a problem that clearly exists in the code.
The chunk examples show that there really is an n^n problem that, in my opinion, should be fixed.
As realistic examples: I'm using the validator to validate CSV and Excel sheets, to accept product listings in ecommerce import systems, bulk pricing updates, log analysis, etc. Right now I am moving that validation to one of my own, but it does exactly the same job that the framework system should do.
Again, thank you very much for your great work, and I think it would be a good idea to reconsider the decision to close the issue, or at least take a look at it to see what can be done.
Best regards!
Thanks, but I just don't think these are realistic examples, sorry. Sending that much data at once through the validator will slow things down. There are other ways to work with data and validate it. Of course, if you could ever find ways to improve validation speed, we'd be open to a PR.
Look everyone. For performance improvements we really just appreciate any PRs you could send. It seems @bastien-phi did just that and already sent in a PR which seems to help. If there are any additional speed improvements to be made, we'd appreciate more PRs with proper benchmarks.
@tpetry I just don’t feel this is a realistic example sorry. 7K rows for an API response is just too much and should be paginated.
Same for 10K row excel, CSV tables, etc. These should be batched and the validations of these rows should be aggregated in a result set.
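A rough sketch of this batch-and-aggregate approach (not from the thread; the row fields sku and price are made up for illustration):

```php
use Illuminate\Support\Facades\Validator;

$errors = [];

// Validate each row on its own and aggregate the failures in a result set,
// instead of pushing one huge wildcard ruleset through the validator.
foreach ($rows as $line => $row) {
    $validator = Validator::make($row, [
        'sku'   => ['required', 'string'],
        'price' => ['required', 'numeric'],
    ]);

    if ($validator->fails()) {
        $errors[$line] = $validator->errors()->all();
    }
}

// $errors now maps the original row index to its validation messages.
```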
I know everyone in this thread feels I’m a bit quick to judge here but it seems we don’t see eye to eye here and just have a difference of opinion. That’s fine and the best thing you can do in this situation is either attempt a PR to improve performance if there’s any bottlenecks or try and find a different way to solve the problem for your use case. We don’t go into performance improvements ourselves as we have other priorities/work of our own. In these situations we appreciate an effort from the community.
I also realise most of you think I'm wrong in this situation, but I'd appreciate it if you also respected my opinion even if it doesn't align with your own (mostly referring to people who downvoted me). I appreciate all of you who replied with thoughtful responses here in this thread.
The performance problems aren't just because of a big file. I have a JSON object with only 50 items in it, but a lot of fields to be validated, and I can see the same behaviour.
It would be really awesome if you could reconsider this and take a look at whether it can be improved.
Encountered the same problem in the past with an API response that had just 7k rows and needed to be validated to ensure it is valid against the spec. The performance was sometimes so slow that execution was stopped because of max_execution_time.
@driesvints Are these realistic enough examples to reopen it and let the community think of a solution? It triggers many people, and I guess each one wasted hours researching the issue (in production) and many more hours trying to track down the problem, and in the end abandoned Laravel's validation functionality for this part.
Having the same issue when validating CSV files with a few hundred lines. I feel like there is some room for performance improvements in the validator. I agree with @eusonlito: please reconsider this issue, as it is a realistic example.
ArgumentCountError: array_merge_recursive() does not accept unknown named parameters
Hej Folks,
I'm the guy who figured out the issue with Arr::dot(). I'm the one who created the poor PR #49380 with a solution. I spent a little time these days solving the second bottleneck and I was able to boil it down to a few lines of code within the ValidationRuleParser: https://github.com/laravel/framework/blob/10.x/src/Illuminate/Validation/ValidationRuleParser.php#L145-L174
Mainly $rule->compile(...) causes a significant bottleneck. I've created a simple benchmark within ValidationForEachTest.php. If one of you can figure out how to improve this code, I'll do my best to contribute all my knowledge. Maybe it is the countless calls to array_merge_recursive(). I honestly don't know yet.
If we feed $precompiled = (new ValidationRuleParser($data))->explode($rules); with our ruleset and the data, the ValidationRuleParser will return the compiled array of rules, which the issue opener @eusonlito named rulesFixed. If we feed the validator the precompiled ruleset, new Validator($trans, $data, $compiledRules->rules), it is fast as hell. I learned this trick from the amazing @calebporzio, who did it in Livewire: https://github.com/livewire/livewire/blob/2.x/src/ComponentConcerns/ValidatesInput.php#L217-L238. Maybe we can improve the Laravel ValidationRuleParser with his idea.
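A minimal sketch of the precompilation trick described above, using the ValidationRuleParser::explode() call quoted in the comment but wired through the Validator facade instead of the Validator constructor; the wildcard rules are hypothetical and the exact behaviour may vary between Laravel versions:

```php
use Illuminate\Support\Facades\Validator;
use Illuminate\Validation\ValidationRuleParser;

$rules = [
    'items.*.name'  => ['required', 'string'],
    'items.*.price' => ['required', 'numeric'],
];

// Expand the wildcard rules against the data once, up front.
$compiled = (new ValidationRuleParser($data))->explode($rules);

// Hand the validator the already-expanded ruleset instead of the wildcards.
$validator = Validator::make($data, $compiled->rules);
```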
@driesvints I’m sorry if you feel like downvoting you was disrespectful, it was not the intention. I apologize.
When I downvoted your feedback, it was only to show that I indeed disagree, based on the information we have (that it seemed to be exponential complexity). However, I don't expect you to change your mind based on downvotes and say "We'll look at it and fix it"; I understand the core team has other priorities.
🙏
To close the circle, @eusonlito: with the improved \Arr::dot() from the linked PR, validation is still slow and can be improved. But those are the results (referencing PR #49386 here).
@333sec can you share your optimizeValidateRule()? Otherwise it's not exactly very useful advice.
Thank you for your help, I couldn't try it from my vacation, but I am really looking forward to a performance improvement in this, as I always have problems with the validator.
EDIT: I took a look at your test and it seems I can't quite understand what you are trying to test here. Maybe there is a performance bottleneck when parsing rules, but I think the performance bottleneck that everyone is complaining about might be in another place.
This is what I have done and performance seems to be a lot better, even when I am using the asterisk, but I am not using the Rule::forEach() method.
I am wondering if, instead of calling array_merge_recursive in the loop, an array union would help. Could the array union become a problem?
I think array_merge_recursive is indeed slow and should be avoided inside of loops.
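Not the actual framework code, but a generic sketch of why array_merge_recursive inside a loop hurts and of the two alternatives floated above; whether either is a safe drop-in replacement inside ValidationRuleParser is exactly what would need benchmarking:

```php
// Merging inside the loop re-copies the accumulated result on every
// iteration, which degrades quickly as the number of rule sets grows.
$merged = [];
foreach ($ruleSets as $set) {
    $merged = array_merge_recursive($merged, $set);
}

// Alternative 1: collect everything and merge once. array_values() avoids
// the "does not accept unknown named parameters" error quoted earlier when
// string-keyed arrays are spread as arguments on PHP 8.1+.
$merged = array_merge_recursive(...array_values($ruleSets));

// Alternative 2: the array union operator, which is cheap but NOT
// equivalent: for duplicate keys it keeps the left-hand value instead of
// merging, which may be the problem hinted at in the question above.
$merged = [];
foreach ($ruleSets as $set) {
    $merged += $set;
}
```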