pydantic: Json Encoders are ignored in nested structures

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn’t find anything
  • I have read and followed the docs and still think this is a bug

Bug

When nesting a model with customer encoders, they are ignored by the parent.

from datetime import datetime, timedelta 
from pydantic import BaseModel 
from pydantic.json import timedelta_isoformat 
 
 
class WithCustomEncoders(BaseModel): 
    dt: datetime 
    diff: timedelta 
 
    class Config: 
        json_encoders = { 
            datetime: lambda v: v.timestamp(), 
            timedelta: timedelta_isoformat, 
        } 
 
class ParentWithoutEncoders(BaseModel): 
    child: WithCustomEncoders
 
m = WithCustomEncoders(dt=datetime(2032, 6, 1), diff=timedelta(hours=100)) 
print(m.json())

p = ParentWithoutEncoders(child=m)
print(p.json())


#> {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}
#> {"child": {"dt": "2032-06-01T00:00:00", "diff": 360000.0}}

Expected output:

#> {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}
#> {"child": {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}}

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 26
  • Comments: 21 (11 by maintainers)

Most upvoted comments

Well I think it was more a omission, the pr wasn’t completed in time for v1.10, it’s too big to be included in a patch release and there won’t be a v1.11. Thus by default the change has to wait for V2.

Remember pydantic isn’t (yet) properly funded and it’s developed and maintained by volunteers.

Thanks for your understanding. I’m working flat out on V2, no one wants it released more than me.

I just ran into this issue… after a while of debugging I realized that my Config wasn’t respected in nested models.

Although for my project it’s OK to simply set a “global” encoder, I can see the value in defining how to encode a specific model within that model’s Config and having that respected even when that model is nested. This is especially true when a module/library exposes public pydantic models for others to use.

Right, but now if I add a datetime field to the parent. Then both would use the same format. I’m looking to be able to define how a particular Model serializes, not all data types within the context.

We can validate, and deserialize per model, but not serialize per model as far I can tell.

from datetime import datetime, timedelta
from pydantic import BaseModel, validator
from pydantic.json import timedelta_isoformat


class WithCustomEncoders(BaseModel):
    dt: datetime
    diff: timedelta

    @validator('dt', pre=True)
    def validate_dt(cls, v):
        try:
            return datetime.fromtimestamp(v)
        except Exception as e:
            raise ValueError('must be a valid timestamp', e)

    class Config:
        json_encoders = {
            datetime: lambda v: v.timestamp(),
            timedelta: timedelta_isoformat,
        }


class ParentWithoutEncoders(BaseModel):
    child: WithCustomEncoders
    p_dt: datetime

    @validator('p_dt', pre=True)
    def validate_p_dt(cls, v):
        try:
            return datetime.fromisoformat(v)
        except Exception as e:
            raise ValueError('must be valid iso string', e)

    class Config:
        json_encoders = {
            datetime: lambda v: v.isoformat()
        }

raw_m = '{"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}'
raw_p = '{"child": {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}, "p_dt": "2032-06-01T00:00:00"}'

m = WithCustomEncoders.parse_raw(raw_m)
p = ParentWithoutEncoders.parse_raw(raw_p)

print(m.json())
print(p.json())

assert m.json() == raw_m
assert p.json() == raw_p

# {"dt": 1969671600.0, "diff": "P4DT4H0M0.000000S"}
# {"child": {"dt": "2032-06-01T00:00:00", "diff": 360000.0}, "p_dt": "2032-06-01T00:00:00"}

# Traceback (most recent call last):
#   File "<stdin>", line 50, in <module>
# AssertionError

The only way I currently see is to override the _iter function, i.e.:

class WithCustomEncoders(BaseModel): 
    dt: datetime 
    diff: timedelta 
 
    @validator('dt', pre=True) 
    def validate_dt(cls, v): 
        try: 
            return datetime.fromtimestamp(v) 
        except Exception as e: 
            raise ValueError('must be a valid timestamp', e) 
 
    class Config: 
        json_encoders = { 
            datetime: lambda v: v.timestamp(), 
            timedelta: timedelta_isoformat, 
        } 
         
    def _iter(self, *args, **kwargs): 
        for key, v in super()._iter(*args, **kwargs): 
            if key == 'dt': 
                yield key, v.timestamp() 
            elif key == 'diff': 
                yield key, timedelta_isoformat(v) 
            else: 
                yield key, v

Edit: _iter(...) if it should also apply to dict(...) otherwise overriding json(...) in a similar fashion would work as well if it should only apply to json dumping.

It seems to me that the specific problems brought up in this issue can mostly be resolved now through the use of the new @model_serializer and @field_serializer in v2. We have actually removed the Config.json_encoders thing in v2 in favor of this approach, but if you run into problems caused by the removal of json_encoders, please open a new issue about it and we’ll be happy to attempt to resolve your issues and/or come up with a way to further ease migration.

I’ve changed my mind, @lilyminium you PR looks great, needs some fixes, but I’d be happy to accept it for v1.10.

I opened a PR at #3941 addressing this with a keyword argument (use_nested_encoders). It’s a fairly simple band-aid onto the issue and probably the top flaw is that it does not work when models_as_dict=False, because that just passes raw data into json.dumps with a global encoder. Would love to get feedback, or continue discussing here!

I agree with @mvanderlee that it’s not clear at first that the parent model that is converting to JSON establishes all the encoders and the nested models are converted using those. This appears to be because the whole object is converted into a dict and then the JSON encoder of the parent model is applied to the dictionary. An alternative might be to recursively call the JSON encoder on any pydantic BaseModel children and incorporate along the way.

@mvanderlee if you’re interested in a PR, you might look at modifying https://github.com/samuelcolvin/pydantic/blob/13a5c7d676167b415080de5e6e6a74bea095b239/pydantic/main.py#L474-L509

This is likely to be much slower than the single call to JSON encoder, so I’d recommend making it an option or even a new method?

Agreed with the other folks here that I can see why the current behavior is implemented the way it is, but there may be use-cases where you might want to serialize different components of the same type differently.

Perhaps I hadn’t been looking at the right parts of the documentation, but I didn’t really understand the Config inheritance until I stumbled upon this issue and saw @PrettyWood’s explanation. Would it be possible to add some of this discussion to the documentation if it’s not already there?

In the interim, based on @PrettyWood’s example, what I came up with was just to create subclasses of the nested object (maybe this is overkill, would appreciate any suggestions!). Using a similar example:

class DatetimeA(datetime.datetime):
    """Datetime that will be encoded as a string in pydantic Config."""
    def __new__(cls, *args, **kwargs):
        return datetime.datetime.__new__(cls, *args, **kwargs)

class DatetimeB(datetime.datetime):
    """Datetime that will be encoded as a timestamp in pydantic Config."""
    def __new__(cls, *args, **kwargs):
        return datetime.datetime.__new__(cls, *args, **kwargs)

class MyBaseModel(BaseModel):
    class Config:
        json_encoders = {
            DatetimeA: lambda v: str(v),
            DatetimeB: lambda v: v.timestamp()
    }

class TestModel(MyBaseModel):
    datetime_a: DatetimeA
    datetime_b: DatetimeB

class ParentModel(MyBaseModel):
    test_model: TestModel

test_model = TestModel(
    datetime_a=DatetimeA(2020, 1, 1),
    datetime_b=DatetimeB(2015, 1, 1)
)

parent_model = ParentModel(
    test_model=test_model
)

print(test_model.json())
print(parent_model.json())

Output:

# '{"datetime_a": "2020-01-01 00:00:00", "datetime_b": 1420088400.0}'
# '{"test_model": {"datetime_a": "2020-01-01 00:00:00", "datetime_b": 1420088400.0}}'

Sure thing!

from datetime import datetime, timedelta
from pydantic import BaseModel as PydanticBaseModel
from pydantic.json import timedelta_isoformat


class BaseModel(PydanticBaseModel):
    """
    All the instances of BaseModel should serialize
    those types the same way
    """
    class Config:
        json_encoders = {
            datetime: lambda v: v.timestamp(),
            timedelta: timedelta_isoformat,
        }


class WithCustomEncoders(BaseModel):
    dt: datetime
    diff: timedelta


class ParentWithoutEncoders(BaseModel):
    child: WithCustomEncoders


m = WithCustomEncoders(dt=datetime(2032, 6, 1), diff=timedelta(hours=100))
print(m.json())
# {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}

p = ParentWithoutEncoders(child=m)
print(p.json())
# {"child": {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}}

The magic is possible because Config of a subclass inherits from the Config of its parent!

You can also change default value of BaseConfig, which all Config classes inherit from

from datetime import datetime, timedelta
from pydantic import BaseConfig, BaseModel
from pydantic.json import timedelta_isoformat

# All the instances of BaseModel should serialize those types the same way
BaseConfig.json_encoders = {
    datetime: lambda v: v.timestamp(),
    timedelta: timedelta_isoformat,
}


class WithCustomEncoders(BaseModel):
    dt: datetime
    diff: timedelta


class ParentWithoutEncoders(BaseModel):
    child: WithCustomEncoders


m = WithCustomEncoders(dt=datetime(2032, 6, 1), diff=timedelta(hours=100))
print(m.json())
# {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}

p = ParentWithoutEncoders(child=m)
print(p.json())
# {"child": {"dt": 1969653600.0, "diff": "P4DT4H0M0.000000S"}}

Hope it helps 😃

Duplicating the config is a non-starter for us. So you’re saying that there is currently no way to ensure a Model will always be serialized a certain way? Other than manually defining a json encoder and always specifying it on json.dump. Which again, would be a non-started for us.

I need to allow Models to be 2-way serializable.

You mentioned a custom BaseModel, do you have an example?

Well I think it was more a omission, the pr wasn’t completed in time for v1.10, it’s too big to be included in a patch release and there won’t be a v1.11. Thus by default the change has to wait for V2.

Remember pydantic isn’t (yet) properly funded and it’s developed and maintained by volunteers.

Thanks for your understanding. I’m working flat out on V2, no one wants it released more than me.

I really do appreciate the work by the way, and the library. I was just disappointed to google my issue, find a branch that was merged 1 month prior to v1.10 that fixed it, but wasn’t released in 1.10, wasting 6 hours trying to find a workaround because Pydantic doesn’t listen to nested encoders and dealing with circular references has been a pain, but I really appreciate the work you do! Keep it up! Sorry about the tone. Just upset that it didn’t make it in V1.10.

Just to clarify for a passing-by reader: this doesn’t seem to be solved. The PR above was initially accepted and then reverted. Now the problems waits till the V2 release.