marshmallow: Add option to throw validation error on extra keys
I’m not sure if I’m missing something, but I would like to throw error if extra keys are provided, as in:
>>> class AlbumSchema(Schema):
... title = fields.Str()
>>> AlbumSchema(strict=True).load({'extra': 2}, allow_extra=False)
Traceback (most recent call last):
...
marshmallow.exceptions.ValidationError: {'_schema': ["Extra arguments passed: ['extra']"]}
I’ve been using the following implementation (taken from https://github.com/sloria/webargs/issues/87#issuecomment-183949558):
class BaseSchema(Schema):
@pre_load
def validate_extra(self, in_data):
if not isinstance(in_data, dict):
return
extra_args = [key for key in in_data.keys() if key not in self.fields]
if extra_args:
raise ValidationError('Extra arguments passed: {}'.format(extra_args))
I would expect this to be a common need, however, so could it be supported by the library out-of-the-box?
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 6
- Comments: 47 (36 by maintainers)
We still need to decide on a default. I think we should rule out “allow”–it is too permissive for the common use case. That leaves “ignore” and “raise”.
Let’s take a poll:
Add an emoji reaction to this comment with your vote.
😄 = “ignore” - Discard unknown fields but don’t error. This is the current behavior. DRF, Colander, and Valideer have this as the default. 🎉 = “raise” - Raise an error for unknown fields. Voluptuous, Cerberus, schema, and Schematics have this as the default.
+1
In my case it’s to avoid ambiguity. The API expects a set of keys to be present per method and only those keys. We use Marshmallow to power an API and you really want the boundaries of each API method to be crisp. I don’t want anyone to be able to “think”/“guess” that a key is valid because the API accepts it without sending an error.
A different use case is what I got from one of our developers today:
“I’m creating a page with a title and a slug but the API doesn’t accept the slug. It gives no error, it just creates a slug based on the title instead.”
Turns out the user sent the following JSON payload:
But our API wants
page_titleandpage_slug. However, supplying the slug is optional. So it doesn’t complain if the slug isn’t present, it just defaults to creating a slug from the title.The backup option for us is to force
page_slugkey, but letting it beFalseif you want to auto-generate it. Perhaps thats what we’ll go with.So that’s my use case.
As we discussed in my PR, I really lean toward specifying that kind of behavior at instanciation time:
It allows one schema to be used in different contexts, where you might want different behaviors on unknown keys. Also, it seems to me that the implementation would be at least as simple.
Yeah, I think we should move forward with this.
We will need to decide on a default:
We also need to decide on an API. A class Meta option probably makes sense.
Thoughts?
Here’s a more thorough review of this behavior in other validation libraries: https://gist.github.com/sloria/2fac357710f13e855a5b9cf8f05a4da0
@nicktimko Keep the conversation constructive please.
https://marshmallow.readthedocs.io/en/dev/code_of_conduct.html
If someone discovers marshmallow via the github page, the readme instructs them to install the prerelease version without any warning that it is a beta version. We might want to consider making it a little clearer.
https://github.com/marshmallow-code/marshmallow#get-it-now
This is closed by #838 . Thanks everyone for your input. I’ve created follow-up issues for the changes/enhancements discussed in this issue and the PR.
RAISEInput data is not modified, so unknown keys are not removed from it.
IGNOREmight be ambiguous as it could be understood as “don’t validate”, which isPASS. Hence,DROP, as data is being passed but unknown keys are dropped in the process.I like those.
Sure, we could even add support for
unknown=fields.Str()in the future. I don’t think it’s something we should add now, though.There you go: https://github.com/marshmallow-code/marshmallow/pull/838
😉
@sloria I’d be glad to modify my PR so that the three options fit in, if that helps.
Colander has similar options:
https://docs.pylonsproject.org/projects/colander/en/latest/api.html#colander.Mapping
I feel like I have seen this in another serialization library before. I can’t seem to find a reference at the moment.
extraEdit: Found it. https://github.com/alecthomas/voluptuous#extra-dictionary-keys
Another thought:
REMOVEmight be a more clear name thanIGNORE. Users might be confused about the difference betweenALLOWandIGNORE.This change will be applied to Marshmallow v3 (major version) in any case.
It could be backported to Marshmallow v2 with “ignore” as default, but I don’t think this is planned.
@ramnes That would be great! Just mark it as WIP while we decide on a default and finalize the API.
This API makes sense. Currently, Marshmallow only provides
ignoreoption. We could provide those three options.We could even go further and let the user set a default
Fieldwhen usingpreserve/allow, but I’m not sure it is so useful, and it is never too late to add it in a later step.I thought this could use a second look since we are starting to favor strictness by default when it helps detect common developer mistakes. Ignoring extra data is a valid use case, but it can lead to silent data loss.
As general rules, I think: