pydantic: Skip invalid items of collections instead of raising ValidationError

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn’t find anything
  • I have read and followed the docs and still think this feature/change is needed
  • After submitting this, I commit to one of:
    • Looked through open issues and helped at least one other person
    • Hit the “watch” button on this repo to receive notifications and I commit to help at least 2 people that ask questions in the future
    • Implement a Pull Request for a confirmed bug

Feature Request

When parsing data, you may encounter some invalid items in a list. In some cases, it’s not worth throwing away all of the data because of a single invalid item.

What I need is to skip the invalid items of a list and keep only the valid ones.

Sure, this can be done with @validator(pre=True), but I need this to be the default behaviour of my models, and to get that I would have to write a custom validator for every single list by hand, which is not very handy.

Related issues

#800

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.6.1
            pydantic compiled: True
                 install path: /Users/rocky/.local/share/virtualenvs/.../lib/python3.6/site-packages/pydantic
               python version: 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
                     platform: Darwin-20.1.0-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

Here’s how I expect it to work:

from pydantic import BaseModel
from pydantic.types import OptionalList

class Model(BaseModel):
    a: OptionalList[int]

m = Model(a=[1, '2', '?'])
assert m.a == [1, 2]
assert m.__skipped__ == [{'loc': ('a', 2), 'msg': 'value is not a valid integer', 'type': 'type_error.integer'}]
# skipped list should be returned by `validate_model()` just like errors
...
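Outside pydantic, the requested semantics can be sketched in plain Python: validate each item, keep the successes, and record each failure using the same loc/msg/type keys as the expected `__skipped__` output above. This is only an illustration, assuming hypothetical names (`ItemError`, `parse_int`, `skip_invalid`) and a rough `int()`-based stand-in for pydantic's integer validator; it is not pydantic's API.

```python
from typing import Any, Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


class ItemError(ValueError):
    """Hypothetical error carrying a pydantic-style message and error type."""

    def __init__(self, msg: str, type_: str) -> None:
        super().__init__(msg)
        self.msg = msg
        self.type = type_


def parse_int(value: Any) -> int:
    """Rough stand-in for pydantic v1's integer validator (not its exact rules)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ItemError("value is not a valid integer", "type_error.integer") from None


def skip_invalid(
    field_name: str, items: List[Any], parse: Callable[[Any], T]
) -> Tuple[List[T], List[Dict[str, Any]]]:
    """Return (valid items, records of skipped items) instead of failing on the first bad item."""
    valid: List[T] = []
    skipped: List[Dict[str, Any]] = []
    for index, item in enumerate(items):
        try:
            valid.append(parse(item))
        except ItemError as exc:
            # Same keys pydantic uses for errors: loc is (field name, item index).
            skipped.append({"loc": (field_name, index), "msg": exc.msg, "type": exc.type})
    return valid, skipped
```

Calling `skip_invalid("a", [1, "2", "?"], parse_int)` should give `[1, 2]` plus one skipped record for index 2, matching the expectation in the example above.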

Things to think of:

  • Support for conlist, conset, etc., and how skipping interacts with their constraints
  • Substituting default values for invalid items instead of skipping them
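Both open questions can be illustrated with a plain-Python sketch of a per-item error policy. `'omit'` matches the `on_error='omit'` mentioned at the end of this thread; `'raise'` and `'default'` are assumed counterparts, and `validate_items` with its `min_items` parameter is hypothetical. One design choice worth noting: the length constraint is checked after invalid items are dropped, so a list can survive item validation and still fail a conlist-style minimum.

```python
from typing import Any, Callable, List, Literal, Optional, TypeVar

T = TypeVar("T")
OnError = Literal["raise", "omit", "default"]


def validate_items(
    items: List[Any],
    parse: Callable[[Any], T],
    on_error: OnError = "raise",
    default: Optional[T] = None,
    min_items: int = 0,
) -> List[Optional[T]]:
    """Validate each item, applying the chosen policy to items that fail."""
    result: List[Optional[T]] = []
    for item in items:
        try:
            result.append(parse(item))
        except (TypeError, ValueError):
            if on_error == "raise":
                raise
            if on_error == "default":
                result.append(default)  # keep the position, substitute a default
            # on_error == "omit": drop the item entirely
    # Hypothetical conlist-style constraint, checked on the filtered result.
    if len(result) < min_items:
        raise ValueError(f"ensure this value has at least {min_items} items")
    return result
```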

Another possible syntax, either per field:

a: List[int] = Field(skip_invalid=True)

or for the whole model:

class Config:
    skip_invalid_collections_items = True
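The two proposed spellings differ in scope: one is per field, the other is a model-wide default. One plausible resolution rule, sketched below in plain Python, is that an explicit field setting wins and the config value applies otherwise. `ModelConfig`, `FieldSpec` and `resolve_skip_invalid` are hypothetical names, and neither `skip_invalid` nor `skip_invalid_collections_items` is an existing pydantic option.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Hypothetical model-wide default, mirroring the proposed Config attribute.
    skip_invalid_collections_items: bool = False


@dataclass
class FieldSpec:
    # None means "not set on the field"; True/False is an explicit override.
    skip_invalid: Optional[bool] = None


def resolve_skip_invalid(field: FieldSpec, config: ModelConfig) -> bool:
    """An explicit field-level setting takes precedence over the model config."""
    if field.skip_invalid is not None:
        return field.skip_invalid
    return config.skip_invalid_collections_items
```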

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 22
  • Comments: 19 (11 by maintainers)

Most upvoted comments

Here’s my temporary implementation. It works as expected with GenericModel and generates a JSON schema identical to that of typing.List.

from typing import TYPE_CHECKING, Any, List, Optional, TypeVar, cast

from pydantic import BaseModel
from pydantic.fields import ModelField, Undefined
from pydantic.validators import list_validator

if TYPE_CHECKING:
    from pydantic.typing import DictStrAny

ItemT = TypeVar('ItemT')


class LenientList(List[ItemT]):
    """
    LenientList[T] is the same as List[T], but skips invalid items instead of raising an error

    At some point this behaviour might be implemented in pydantic:
    https://github.com/samuelcolvin/pydantic/issues/2274
    """

    _item_field: ModelField

    def __class_getitem__(cls, type_):
        """
        The returned type must be a subclass of LenientList for validation to work,
        but it also needs to smell like typing.List[T] for pydantic magic to work properly
        """
        if isinstance(type_, tuple):
            (type_,) = type_
        elif hasattr(cls, '_item_field'):
            raise TypeError(f'{cls.__name__} is already concrete')

        if isinstance(type_, TypeVar):
            pydantic_type = type(f'LenientList[{type_}]', (cls,), {})
        else:
            item_field = ModelField.infer(
                name='item',
                value=Undefined,
                annotation=type_,
                class_validators=None,
                config=BaseModel.__config__,
            )
            t_name = getattr(type_, "__name__", None) or type_.__class__.__name__
            pydantic_type = type(f'LenientList[{t_name}]', (cls,), {'_item_field': item_field})

        generic_alias = super().__class_getitem__(type_)
        # this will enable support of GenericModel and json schema
        pydantic_type.__origin__ = generic_alias.__origin__
        pydantic_type.__parameters__ = generic_alias.__parameters__
        pydantic_type.__args__ = generic_alias.__args__
        return pydantic_type

    @classmethod
    def __get_validators__(cls):
        yield cls._list_validator

    @classmethod
    def _list_validator(cls, raw_value: Any, values: 'DictStrAny', field: ModelField) -> Optional[List[ItemT]]:
        if raw_value is None and not field.required:
            return None
        list_value: List[Any] = list_validator(raw_value)
        parsed: List[ItemT] = []
        for item in list_value:
            value, error = cls._item_field.validate(item, values, loc=())
            if error is None:
                parsed.append(cast(ItemT, value))
        return parsed


class Root(BaseModel):
    items: LenientList[int]


print(Root(items=[1, 'noo', 3]))  # items=[1, 3]
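The key trick in LenientList above is that subscripting the class returns a new subclass that remembers how to validate a single item, so one classmethod can later validate whole lists. The sketch below reproduces only that pattern in plain Python, with a plain callable standing in for pydantic's ModelField. `LenientListSketch`, `_parse_item` and `validate` are hypothetical names, and the sketch does not attempt the GenericModel or JSON schema support.

```python
from typing import Any, Callable, ClassVar


class LenientListSketch(list):
    """list subclass whose subscripted forms skip items the item parser rejects."""

    # Set only on subscripted subclasses, e.g. LenientListSketch[int].
    _parse_item: ClassVar[Callable[[Any], Any]]

    def __class_getitem__(cls, item_type: Callable[[Any], Any]) -> type:
        # Build a concrete subclass that stores the item parser, as LenientList stores _item_field.
        name = f"{cls.__name__}[{getattr(item_type, '__name__', item_type)}]"
        return type(name, (cls,), {"_parse_item": staticmethod(item_type)})

    @classmethod
    def validate(cls, raw_value: Any) -> "LenientListSketch":
        parsed = cls()
        for item in list(raw_value):
            try:
                parsed.append(cls._parse_item(item))
            except (TypeError, ValueError):
                continue  # skip invalid items instead of raising
        return parsed
```

With `IntList = LenientListSketch[int]`, `IntList.validate([1, 'noo', 3])` should compare equal to `[1, 3]`, mirroring the `Root` example.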

Is there an example anywhere of using on_error='omit' in V2?

In case anyone is waiting for feedback on this, LenientList as implemented here by Bobronium works for me in a complex schema, with pydantic 1.10.2. I don’t know the internals of pydantic enough to give a proper code review though.

If it helps, here are a handful of basic pytest tests I’ve written for my own peace of mind (seeing as LenientList relies on pydantic internals, I want to make sure it doesn’t break at the next pydantic upgrade).

import pytest

from pydantic import BaseModel

from ..lenient_list import LenientList


class InnerElementModel(BaseModel):
    value: int


class ElementModel(BaseModel):
    value: int
    the_list: LenientList[InnerElementModel]


class ContainerModel(BaseModel):
    the_list: LenientList[ElementModel]


def test_LenientList_filters_invalid_elements_and_keeps_valid_ones():
    data = ElementModel(
        value=1,
        the_list = [
            {"value": 1}, {"value": "hello"}, {"value": 3}, {"value": "world"}
        ]
    )

    assert len(data.the_list) == 2
    assert data.the_list[0].value == 1
    assert data.the_list[1].value == 3


def test_LenientList_returns_empty_list_when_no_elements_validate():
    data = ElementModel(
        value=1,
        the_list = [
            {"value": "hello"}, {"value": "world"}
        ]
    )

    assert len(data.the_list) == 0


def test_LenientList_within_LenientList_filters_out_descendant_elements():
    data = ContainerModel(
        the_list = [
            {
                "value": 1,
                "the_list": [{"value": 2}, {"value": "hello"}, {"value": 3}]
            },
        ]
    )

    assert len(data.the_list) == 1
    assert len(data.the_list[0].the_list) == 2
    assert data.the_list[0].the_list[0].value == 2
    assert data.the_list[0].the_list[1].value == 3


def test_LenientList_within_LenientList_keeps_parent_even_if_none_of_the_descendants_validate():
    data = ContainerModel(
        the_list = [
            {
                "value": 1,
                "the_list": [{"value": "nope"}]
            },
        ]
    )

    assert len(data.the_list) == 1
    assert len(data.the_list[0].the_list) == 0

@samuelcolvin, that doesn’t expose a way to retrieve the errors afterward, does it? I implemented a slight tweak to @adriangb’s very good suggestion here: https://gist.github.com/dmontagu/7f0cef76e5e0e04198dd608ad7219573. It creates a type LenientList, a subclass of List[T], which adds an errors attribute for retrieving the validation errors and a with_errors() method that returns the items and errors in the order in which they were validated.

The implementation is a little ugly on the inside, but perhaps someone will find it useful.

Either way, @adriangb’s code is another good example of how to use annotations to create pydantic core schemas, and my gist above is (in my opinion) a good example of how to create a pydantic core schema for a custom generic type.
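The gist itself is not reproduced here, but the shape it describes (a list subclass with an `errors` attribute and a `with_errors()` method returning items and errors in validation order) can be sketched in plain Python. `ListWithErrors`, `from_raw` and the flat list returned by `with_errors()` are assumptions for illustration; the gist's actual signatures may differ.

```python
from typing import Any, Callable, List, Tuple


class ListWithErrors(list):
    """Valid items behave like a normal list; rejected inputs are kept in `errors`."""

    def __init__(self, items: List[Any], outcomes: List[Tuple[bool, Any]]) -> None:
        super().__init__(items)
        self._outcomes = outcomes  # (ok, value_or_error) per input, in input order
        self.errors: List[Exception] = [v for ok, v in outcomes if not ok]

    @classmethod
    def from_raw(cls, raw: List[Any], parse: Callable[[Any], Any]) -> "ListWithErrors":
        outcomes: List[Tuple[bool, Any]] = []
        for item in raw:
            try:
                outcomes.append((True, parse(item)))
            except (TypeError, ValueError) as exc:
                outcomes.append((False, exc))
        return cls([v for ok, v in outcomes if ok], outcomes)

    def with_errors(self) -> List[Any]:
        """Items and errors interleaved in the order the inputs were validated."""
        return [v for _, v in self._outcomes]
```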

It’s also supported out of the box in pydantic-core; see a test example here and the schema to configure it here.

I implemented this months ago with this issue in mind, but forgot to reference it, and we haven’t added support in Python yet.

I have some good news: after thinking about this for a bit, I think this not only works in V2 but is also super powerful and reusable:

from dataclasses import dataclass
from typing import Annotated, Any, Callable, List, TypeVar

from pydantic import BaseModel
from pydantic_core import ValidationError
from pydantic_core import core_schema as cs

# Sentinel returned in place of an item that failed validation
_ERROR = object()

@dataclass
class ErrorItemsMarker:
    """Item-level marker: replace an item that fails validation with the _ERROR sentinel."""

    def __get_pydantic_core_schema__(
        self, source_type: Any, handler: Callable[[Any], cs.CoreSchema]
    ) -> cs.CoreSchema:
        schema = handler(source_type)
        def val(v: Any, handler: cs.ValidatorFunctionWrapHandler) -> Any:
            try:
                return handler(v)
            except ValidationError:
                return _ERROR

        return cs.no_info_wrap_validator_function(val, schema, serialization=schema.get('serialization'))


@dataclass
class ListErrorFilter:
    """List-level marker: once the whole list is validated, drop the _ERROR sentinels."""

    def __get_pydantic_core_schema__(
        self, source_type: Any, handler: Callable[[Any], cs.CoreSchema]
    ) -> cs.CoreSchema:
        schema = handler(source_type)

        def val(v: List[Any]) -> List[Any]:
            return [item for item in v if item is not _ERROR]

        return cs.no_info_after_validator_function(val, schema, serialization=schema.get('serialization'))


T = TypeVar('T')

LenientList = Annotated[List[Annotated[T, ErrorItemsMarker()]], ListErrorFilter()]


class Model(BaseModel):
    x: LenientList[int]
    y: LenientList[str]
    z: LenientList[List[int]]


print(Model(x=[1, '2', 'c'], y=['a', 'b', 3], z=[['a'], ['1']]))
#> x=[1, 2] y=['a', 'b'] z=[[1]]

This passes all of the tests in https://github.com/pydantic/pydantic/issues/2274#issuecomment-1432916116

Obviously you may want slightly different behavior, but I think this general template can be adapted.
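The template composes two layers: an item-level wrap validator that turns a failure into a private sentinel, and a list-level after validator that strips the sentinels. Without pydantic, the control flow looks roughly like the sketch below. `wrap_item`, `filter_errors`, `lenient` and `int_list` are hypothetical helpers, and `int` stands in for the inner core schema. Using a module-private `object()` as the sentinel rather than `None` keeps legitimately `None`-valued items from being dropped.

```python
from typing import Any, Callable, List

_ERROR = object()  # private sentinel: cannot collide with any real validated value


def wrap_item(validate: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Item layer: call the inner validator, turning failures into the sentinel."""
    def wrapped(value: Any) -> Any:
        try:
            return validate(value)
        except (TypeError, ValueError):
            return _ERROR
    return wrapped


def filter_errors(values: List[Any]) -> List[Any]:
    """List layer: runs after every item has been validated, dropping sentinels."""
    return [v for v in values if v is not _ERROR]


def lenient(validate_item: Callable[[Any], Any]) -> Callable[[List[Any]], List[Any]]:
    item = wrap_item(validate_item)
    return lambda raw: filter_errors([item(v) for v in raw])


def int_list(raw: Any) -> List[int]:
    """Stand-in for a plain List[int]: one bad element fails the whole inner list."""
    return [int(v) for v in raw]
```

As in the `z` field above, `lenient(int_list)([['a'], ['1']])` should drop the whole `['a']` sublist and keep `[1]`, because the inner list is not itself lenient.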

No hurry, some time before the end of March would be great.

A property to collect the errors on LenientList would be very handy, so they can be logged for later inspection.

Collecting errors should become possible in V2 with context; see here.
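One way to read this suggestion: instead of silently discarding a failed item, the item layer records the error in a caller-supplied context, so the caller can log what was skipped. In pydantic V2 the context would be passed as `Model.model_validate(data, context={...})` and read from `info.context` inside a with-info validator. The plain-Python sketch below only models that data flow; `validate_lenient` and the `'skipped'` context key are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional


def validate_lenient(
    raw: List[Any],
    parse: Callable[[Any], Any],
    context: Optional[Dict[str, Any]] = None,
    field: str = "items",
) -> List[Any]:
    """Drop invalid items, recording each failure in context['skipped'] when a context is given."""
    valid: List[Any] = []
    for index, item in enumerate(raw):
        try:
            valid.append(parse(item))
        except (TypeError, ValueError) as exc:
            if context is not None:
                # Same loc convention as pydantic errors: (field name, item index).
                context.setdefault("skipped", []).append({"loc": (field, index), "msg": str(exc)})
    return valid
```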


This should be possible in V2 with on_error='omit' but we should make sure it’s actually possible before release.