marshmallow: Add option to throw validation error on extra keys

I’m not sure if I’m missing something, but I would like to throw error if extra keys are provided, as in:

>>> class AlbumSchema(Schema):
...     title = fields.Str()

>>> AlbumSchema(strict=True).load({'extra': 2}, allow_extra=False)
Traceback (most recent call last):
...
marshmallow.exceptions.ValidationError: {'_schema': ["Extra arguments passed: ['extra']"]}

I’ve been using the following implementation (taken from https://github.com/sloria/webargs/issues/87#issuecomment-183949558):

class BaseSchema(Schema):

    @pre_load
    def validate_extra(self, in_data):
        if not isinstance(in_data, dict):
            return

        extra_args = [key for key in in_data.keys() if key not in self.fields]
        if extra_args:
            raise ValidationError('Extra arguments passed: {}'.format(extra_args))

I would expect this to be a common need, however, so could it be supported by the library out-of-the-box?

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 6
  • Comments: 47 (36 by maintainers)

Most upvoted comments

We still need to decide on a default. I think we should rule out “allow”–it is too permissive for the common use case. That leaves “ignore” and “raise”.

Let’s take a poll:

Add an emoji reaction to this comment with your vote.

😄 = “ignore” - Discard unknown fields but don’t error. This is the current behavior. DRF, Colander, and Valideer have this as the default. 🎉 = “raise” - Raise an error for unknown fields. Voluptuous, Cerberus, schema, and Schematics have this as the default.

+1

In my case it’s to avoid ambiguity. The API expects a set of keys to be present per method and only those keys. We use Marshmallow to power an API and you really want the boundaries of each API method to be crisp. I don’t want anyone to be able to “think”/“guess” that a key is valid because the API accepts it without sending an error.

A different use case is what I got from one of our developers today:

“I’m creating a page with a title and a slug but the API doesn’t accept the slug. It gives no error, it just creates a slug based on the title instead.”

Turns out the user sent the following JSON payload:

{
   "page_title": "My Page Title",
   "slug": "a-specific-slug"
}

But our API wants page_title and page_slug. However, supplying the slug is optional. So it doesn’t complain if the slug isn’t present, it just defaults to creating a slug from the title.

The backup option for us is to force page_slug key, but letting it be False if you want to auto-generate it. Perhaps thats what we’ll go with.

So that’s my use case.

As we discussed in my PR, I really lean toward specifying that kind of behavior at instanciation time:

UserSchema(unknown=ALLOW)

It allows one schema to be used in different contexts, where you might want different behaviors on unknown keys. Also, it seems to me that the implementation would be at least as simple.

Yeah, I think we should move forward with this.

We will need to decide on a default:

  • raise: Raise an error for unknown fields (like voluptuous, schema, cerberus, and schematics)
  • remove: Don’t allow unknown fields through, but don’t error (like DRF, colander). This is the current behavior.
  • allow: Allow unknown fields through

We also need to decide on an API. A class Meta option probably makes sense.

from marshmallow import RAISE, REMOVE, ALLOW

class UserSchema(Schema):
    class Meta:
        unknown = ALLOW

Thoughts?

Here’s a more thorough review of this behavior in other validation libraries: https://gist.github.com/sloria/2fac357710f13e855a5b9cf8f05a4da0

@nicktimko Keep the conversation constructive please.

https://marshmallow.readthedocs.io/en/dev/code_of_conduct.html

If someone discovers marshmallow via the github page, the readme instructs them to install the prerelease version without any warning that it is a beta version. We might want to consider making it a little clearer.

https://github.com/marshmallow-code/marshmallow#get-it-now

This is closed by #838 . Thanks everyone for your input. I’ve created follow-up issues for the changes/enhancements discussed in this issue and the PR.

  • #851 - Change default to RAISE
  • #852 - Make the error message configurable
  • #853 - Specify output field

Input data is not modified, so unknown keys are not removed from it. IGNORE might be ambiguous as it could be understood as “don’t validate”, which is PASS. Hence, DROP, as data is being passed but unknown keys are dropped in the process.

I like DROP, RAISE, and what about PASS?

I like those.

BTW, there could be an evolution in the future allowing to specify a default Field to use for unknown keys.

Sure, we could even add support for unknown=fields.Str() in the future. I don’t think it’s something we should add now, though.

@sloria I’d be glad to modify my PR so that the three options fit in, if that helps.

Colander has similar options:

        ``unknown`` controls the behavior of this type when an unknown
        key is encountered in the cstruct passed to the
        ``deserialize`` method of this instance.  All the potential
        values of ``unknown`` are strings.  They are:
        - ``ignore`` means that keys that are not present in the schema
          associated with this type will be ignored during
          deserialization.
        - ``raise`` will cause a :exc:`colander.Invalid` exception to
          be raised when unknown keys are present in the cstruct
          during deserialization.
        - ``preserve`` will preserve the 'raw' unknown keys and values
          in the appstruct returned by deserialization.
        Default: ``ignore``.

https://docs.pylonsproject.org/projects/colander/en/latest/api.html#colander.Mapping

I feel like I have seen this in another serialization library before. I can’t seem to find a reference at the moment.

  • extra
    • “warn” (default)
    • “ignore”
    • “allow”

Edit: Found it. https://github.com/alecthomas/voluptuous#extra-dictionary-keys

Another thought: REMOVE might be a more clear name than IGNORE. Users might be confused about the difference between ALLOW and IGNORE.

This change will be applied to Marshmallow v3 (major version) in any case.

It could be backported to Marshmallow v2 with “ignore” as default, but I don’t think this is planned.

@ramnes That would be great! Just mark it as WIP while we decide on a default and finalize the API.

This API makes sense. Currently, Marshmallow only provides ignore option. We could provide those three options.

We could even go further and let the user set a default Field when using preserve/allow, but I’m not sure it is so useful, and it is never too late to add it in a later step.

I thought this could use a second look since we are starting to favor strictness by default when it helps detect common developer mistakes. Ignoring extra data is a valid use case, but it can lead to silent data loss.

As general rules, I think:

  • We should warn developers when it looks like they have made a mistake
  • Developers should have to opt into destructive operations explicitly