pydantic: Exponential blowup in errors with recursive discriminated union
Initial Checks
- I have searched GitHub for a duplicate issue and I’m sure this is something new
- I have searched Google & StackOverflow for a solution and couldn’t find anything
- I have read and followed the docs and still think this is a bug
- I am confident that the issue is with pydantic (not my code, or another library in the ecosystem like FastAPI or mypy)
Description
Suppose we have a mutually recursive discriminated union with two options `A` and `B`. They are discriminated by the field `type`, which is `"A"` on `A` and `"B"` on `B`. Both `A` and `B` have an optional `child` which is a discriminated union of `A` and `B`. If there is a validation error (say a missing field) nested n layers deep and n > 2, then pydantic generates 2^{n-1} - 2 errors. It also takes exponentially long to fail.
The problem is that pydantic considers the discriminator to be refutable, so it backtracks exponentially and complains many times over that the `type` field is wrong; the reported error count roughly doubles with each additional layer of nesting. I would want some way to indicate that the `type` field is irrefutable, so that if there is a problem pydantic does not try to match the value against a different case in the `Union`, it just fails.
Example Code
The following code prints:
[1, 2, 6, 14, 30, 62, 126, 254, 510, 1022, 2046, 4094, 8190, 16382, 32766]
from pydantic import BaseModel, Field, ValidationError
from typing import Literal, Annotated, Union

Child = Annotated[Union["A", "B"], Field(discriminator="type")]


class B(BaseModel):
    type: Literal["B"]
    missing_field: str
    child: Child | None


class A(BaseModel):
    type: Literal["A"]
    child: Child | None


for X in [A, B]:
    X.update_forward_refs()


def parse_depth(depth):
    # missing missing_field ==> parse failure
    root = {"type": "B"}
    for _ in range(depth):
        root = {"type": "A", "child": root}
    A.parse_obj(root)


def count_errors(depth):
    try:
        parse_depth(depth)
    except ValidationError as e:
        return len(e.errors())
    else:
        assert False


if __name__ == "__main__":
    print([count_errors(n) for n in range(15)])
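
For reference, here is a rough workaround sketch. This is not a pydantic feature; the `MODELS` table and `parse_node` helper are made up for illustration. The idea is to dispatch on the `type` tag by hand, so each node is validated against exactly one model and a nested failure is reported once rather than fanning out exponentially. The trade-off is that error locations no longer include the full nested path.

from typing import Any, Dict, Literal, Optional, Type
from pydantic import BaseModel


class B(BaseModel):
    type: Literal["B"]
    missing_field: str
    child: Optional[Any] = None  # filled in by parse_node below, not by pydantic


class A(BaseModel):
    type: Literal["A"]
    child: Optional[Any] = None


# hypothetical lookup table mapping discriminator values to models
MODELS: Dict[str, Type[BaseModel]] = {"A": A, "B": B}


def parse_node(data: Optional[dict]):
    if data is None:
        return None
    model = MODELS[data["type"]]                 # trust the tag; never backtrack
    node = model.parse_obj({**data, "child": None})
    node.child = parse_node(data.get("child"))   # recurse exactly once per level
    return node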
Python, Pydantic & OS Version
pydantic version: 1.10.4
pydantic compiled: True
install path: /home/hood/Documents/programming/sphinx-js/.tox/py310/lib/python3.10/site-packages/pydantic
python version: 3.10.10 (main, Feb 8 2023, 14:50:01) [GCC 9.4.0]
platform: Linux-5.17.5-76051705-generic-x86_64-with-glibc2.31
optional deps. installed: ['typing-extensions']
Affected Components
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 18 (8 by maintainers)
In pydantic v2, if I remove the `.update_forward_refs()` and change `parse_obj` to `model_validate`, the original issue code prints `[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]`, so I think this has been addressed in v2. Given that in principle someone might depend on this behavior in v1, I'm going to close the issue.

@hoodmane if this is still an issue for you I'd be happy to look into this further, but given its age and the fact that pydantic v2 is released, I'm assuming this is no longer a priority.
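
For concreteness, a sketch of the adaptation described above (assuming pydantic v2 is installed; the only changes from the original repro are dropping `update_forward_refs()` and switching to `model_validate`, with the two helper functions condensed into one):

from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, ValidationError

Child = Annotated[Union["A", "B"], Field(discriminator="type")]


class B(BaseModel):
    type: Literal["B"]
    missing_field: str
    child: Child | None


class A(BaseModel):
    type: Literal["A"]
    child: Child | None


# no update_forward_refs() needed in v2


def count_errors(depth):
    # missing missing_field ==> validation failure
    root = {"type": "B"}
    for _ in range(depth):
        root = {"type": "A", "child": root}
    try:
        A.model_validate(root)  # parse_obj -> model_validate
    except ValidationError as e:
        return len(e.errors())
    else:
        assert False


print([count_errors(n) for n in range(15)])  # per the comment above: [2, 2, 2, ...]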
Aha, that is why getting rid of `| None` made such a dramatic difference. I was worried it would be mad that `None` has no field `type` if we do it this way. Thanks so much @dmontagu!
I think if we “consider discriminators infallible”, it will also massively help with the confusing error messages that come from types with lots of union cases, even when they aren’t recursive 😅 .
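
As a point of comparison, here is a small sketch of that non-recursive case (the `Cat`/`Dog` models are hypothetical, and a plain non-discriminated union is used just to show the fan-out): every union member is attempted, so a single bad payload yields a separate batch of errors per member.

from typing import Literal, Union
from pydantic import BaseModel, ValidationError


class Cat(BaseModel):
    kind: Literal["cat"]
    meow: str


class Dog(BaseModel):
    kind: Literal["dog"]
    bark: str


class Pet(BaseModel):
    pet: Union[Cat, Dog]


try:
    Pet.parse_obj({"pet": {"kind": "cat"}})  # missing "meow"
except ValidationError as e:
    # one "field required" error from the Cat attempt, plus errors from the Dog attempt
    print(len(e.errors()))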
@hoodmane I can replicate the issue and understand where it is coming from. I imagine you would still have the performance overhead of generating all the errors even if you then filtered them out.
For what it’s worth, I think we should do exactly what you have suggested (i.e., consider discriminators “infallible”) in v2, and plan to implement things that way. (Assuming no objections from @samuelcolvin.)