marshmallow: Nested field with many=True and required=True gives unexpected error for missing data
Say I have following schema definition
from marshmallow import Schema, fields
class ObjectSchema(Schema):
id = fields.Integer(required=True)
name = fields.String(required=True)
class CollectionSchema(Schema):
data = fields.Nested(ObjectSchema, many=True, required=True)
Here’s how this schema works for several invalid inputs
In [6]: CollectionSchema().load({})
Out[6]: UnmarshalResult(data={}, errors={'data': {0: {'name': ['Missing data for required field.'], 'id': ['Missing data for required field.']}}})
In [7]: CollectionSchema().load({'data': []})
Out[7]: UnmarshalResult(data={'data': []}, errors={})
In [8]: CollectionSchema().load({'data': [{}]})
Out[8]: UnmarshalResult(data={'data': [{}]}, errors={'data': {0: {'name': ['Missing data for required field.'], 'id': ['Missing data for required field.']}}})
IMO Out[8] is the only correct validation error from the examples - it correctly describes the problem - all required fields are missing in nested item. For other examples I actually expect something like:
In [6]: CollectionSchema().load({})
Out[6]: UnmarshalResult(data={}, errors={'data': ['Missing data for required field.']})
shows that ‘data’ field is required and missing. I’m missing this one badly.
In [7]: CollectionSchema().load({'data': []})
Out[7]: UnmarshalResult(data={'data': []}, errors={'data': ['Field may not be empty list.']})
shows that field contains empty list - for my usecase it’s the same as missing field. Well, this one is arguable, maybe Nested field requires another parameter to enforce validation errors in case of empty nested value.
So currently marshmallow cannot distinguish between these three cases which complicates automated error handling in JSON api I’m working on now - as I said errors do not describe the problem properly. What do you think - is it an issue or I’m missing something and there’s better approach to this problem?
About this issue
- Original URL
- State: closed
- Created 9 years ago
- Reactions: 2
- Comments: 16 (9 by maintainers)
I was just caught by this while working on https://github.com/marshmallow-code/marshmallow/issues/378 and I also think current behaviour is surprising and this change should be reverted.
I disagree with the rationale presented in https://github.com/marshmallow-code/marshmallow/issues/319#issuecomment-166770421. There are ways to document an API, for instance using apispec to generate an OpenAPI doc. I can’t really picture my clients using this trial and error approach to learn what data is required. (OK, in fact, I’m pretty sure that’s what they do, we all do… But this is wrong.) The error message is not meant to document the API. We don’t expect the clients to send garbage just to get an error message specifying what the server expects. Besides, for those using the trial and error approach, reverting this change will just make it a few more steps.
I’d like to change this in v3.0. Any objection?
To be explicit, I’m talking about case 1 in OP. Regarding case 2, I agree with @sloria’s answer, this is the right behaviour.
Quick comment from my side, since this issue currently has the status “Feedback welcome”: I agree with @chekunkov and @maximkulkin that the current functionality is very surprising/weird. I was expecting
mm.fields.Nested(SomeSchema, many=True, required=True)to give me strict feedback when that field was missing completely. Instead I get error messages about missing data within the list, even though nothing was provided at all for that field.I would strongly prefer if the change from #235 was reverted back to the old, simple behavior (or at least that should be the default behavior). Marshmallow shouldn’t try to be too clever in what it does. If people want this advanced functionality, it should be on an opt-in basis.
Thanks for the thorough report, @chekunkov .
For case 2:
requiredvalidation applies only to missing inputs; empty lists, empty strings, zero, etc. are still considered input. If you want to disallow empty lists, you could do:Case 1 is less clearly cut. #235 was implemented such that nested required fields would be validated if the outer field was missing. @max-orhai, can you comment on the rationale for this decision?
For the time being, you could work around the issue with a custom
Nestedfield:I just proposed a PR that removes the feature: #754.
In general, I think a good approach for the API-design of a validation-library should be to favor strictness and simplicity over convenience. Don’t do any nested validation of data that doesn’t even exist.
It might be a good idea to fall back to the original behavior by default and require an explicit parameter to enable more verbose error messages. I think it is important for error messages to be clear and concise by default.
@max-orhai @deckar01 This functionality is all ways broken.
Consider the case when you have deeply nested JSON structure:
{'foo': {'bar': {'baz': {'bam': {'quux': true}}}}}User sends request{'foo': {'bap': {'baz': {'bam': {'quux': true}}}}}Imagine his confusion when he gets error like that ?!{'foo': {'bar': {'baz': {'bam': {'quux': ['Missing required field']}}}}}Then user starts complaining about API being insane and all ways broken, because the value for ‘quux’ is just there.What he really should get is
{'foo': {'bar': ['Missing required field']}}As with required fields, you could have a bunch of validators that will trigger only if some fields are present (e.g. validating string with regex). Even if user will know after sending request once which fields are required, he will still need to do multiple iterations to figure out what type should particular field be and what are other constraints for it exist.
More of this in #495
thanks for perfect examples and workarounds @sloria !
@max-orhai honestly I don’t think it’s a good idea to overcomplicate “A lightweight library for converting complex objects to and from simple Python datatypes” to handle edge cases like this. maybe it works for your example, when nested field contains required single nested object, but above I provided an example of the case where this advanced deep validation leads to a confusion. by saving some time for client with some particular schema this validation gives ambiguous error responses for others.
what if nested fields is a list, what if it can be empty
{outer: []}? in this case the only valid response for submitted request{extra: True}would benot
or what if outer should be of length 10? following your explanation current validation logic should returns something like
but instead it returns
{'outer': ['Length must be between 10 and 10.']}which doesn’t look consistent with that error example you provided - which once again leads to a confusion.The rationale for _validate_missing is as follows: Consider a web API using a nested schema in which both inner and outer fields are required, so valid JSON data looks like (e.g.) {outerRequired: {innerRequired: true}}.
Now suppose an API client submits a request containing valid data for the outer schema, but missing the key for the inner schema, like {outerRequired: {extra: true}}. Clearly, the client should be informed that the inner schema was missing a required field. That was how Marshmallow worked before #235.
But what if the client submits a request, say {extra: true}, that doesn’t have a key for the outer field? Before #235, the client would have to submit an additional request just to discover the inner required field. After #235, the client can be informed in a single response of everything missing from the request data, now matter how many levels deep.
In short, _validate_missing is an aid to discoverability. I hope that clears things up for you.
On Sun, Nov 8, 2015, at 05:18 PM, Steven Loria wrote:
Links: