clap: clap_derive: Vec/Option behavior is inconsistent with other types

This is actually a summary of https://github.com/TeXitoi/structopt/issues/364

Current behavior

  • Vec<T> and Option<Vec<T>> are not required = true by default.
  • Vec<T> is multiple = true by default which allows not only multiple values (--foo 1 2 3) but also multiple occurrences (--foo 1 --foo 2 3).
  • Option<Vec<T>> additionally allows zero number of values (--foo).

What’s wrong

  • The fact that Vec<T> is not required by default is inconsistent with all the other types in clap_derive that are required by default unless they are wrapped in Option (except bool but it’s a very special case).
  • The fact that Option<Vec<T>> is different from Vec<T>, different not in “not required” sense, confuses newcomers.
  • The fact that Vec<T> allows multiple occurrences along with values is misleading.

Proposal

  • Use min_values = 1 for both Option<Vec<T>> and Vec<T> instead of multiple = true, allowing only non-zero number of values and disallow multiple occurrences (--foo 1 2 but not --foo nor --foo 1 --foo 2). If a user wants to allow zero values or multiple occurrences as well, they can explicitly specify it via min_values = 0 and multiple = true respectively.
  • Use required = true for Vec<T>.

cc @TeXitoi @Dylan-DPC @pksunkara

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 31 (31 by maintainers)

Commits related to this issue

Most upvoted comments

The fact that Vec<T> is not required by default is inconsistent with all the other types in clap_derive that are required by default unless they are wrapped in Option (except bool but it’s a very special case).

I strongly disagree on this. A vector can have 0 elements, and that’s the most used case. If you need at least one, you can add min_values(1) yourself.

I really disagree. Most of the time you’re using a Vec, you just want the list of parameters, and you don’t really care if an empty parameter was provided. Forcing using an option of vec in the most common case would impact a lot the ergonomics.

To clarify, my intent was not to pull #1717 into this but to look at the problem holistically to see how it can affect decisions made here.

If we use min_values(0), an empty vec would represent --foo <no values>, but “no --foo” would be disallowed due to required.

Option<Vec<T>> would represent just what it does currently:

command Vec<T> result Option<Vec<T>> result
app --foo 1 2 [1, 2] Some([1, 2])
app --foo [] Some([])
app ERRROR None

Not yet but I think we should wait until there are users asking for them.

Something like #2993 was recently merged, along with a new derive reference.

Before:

  • bool: a flag
  • Option<_>: not required
  • Option<Option<_>> is not required and when it is present, the value is not required
  • Vec<_>: multiple values, optional
  • Option<Vec<_>>: multiple values, min values of 0, optional

After:

  • bool: a flag
  • Option<_>: not required
  • Option<Option<_>> is not required and when it is present, the value is not required
  • Vec<_>: multiple occurrences, optional
    • optional: Vec implies 0 or more, so should not imply required
  • Option<Vec<_>>: multiple occurrences, optional
    • optional: Use over Vec to detect when no option being present when using multiple values

Motivations:

My priorities were:

  1. Are we getting in the users way?
  2. Does the API make sense?
  3. Does the API encourage best practices?

I was originally concerned about the lack of composability with Option<Option<_>> and Option<Vec<_>> (and eventually Vec<Vec<_>>). It prescribes special meaning to each type depending on where it shows up, rather than providing a single meaning for a type generally. You then can’t do things like have Option<_> mean “required argument with optional value” without hand constructing it. However, in practice the outer type correlates with the argument occurrence and the inner type with the value. It is rare to want the value behavior without also the occurrence behavior. So I figure it is probably fine as long as people can set the flags to manually get the behavior they want.

Vec<_> implies multiple occurrences, rather than multiple values. Anecdotally, whenever I’ve used the old Arg::multiple, I thought I was getting Arg::multiple_occurrences only. Arg::multiple_values, without any bounds or delimiter requirement, can lead to a confusing user experience and isn’t a good default for these. On top of that, if someone does have an unbounded or a delimiter multiple values, they are probably also using multiple occurrences.

Vec<_> is optional because a Vec implies 0 or more, so we stick to the meaning of the rust type. At least for me, I also rarely need a required with multiple occurrences argument but more often need optional with multiple occurrences.

Option<Vec<_>> ends up matching Vec<_> which can raise the question of why have it. Some users might prefer the type. Otherwise, this is so users can detect whether the argument is present or not when using min_values(0). Rather than defining an entire policy around this and having users customize it, or setting min_values(0) without the rest of a default policy, this gives people a blank slate to work from.

Another design option would have been to not infer any special-type settings if someone sets a handful of settings manually, which would have avoided the confusion in Issue #2599 but I see that being confusing (for someone who knows the default, they will be expecting it to be additive; which flags disable inferred settings?) and brittle (as flags are added or changed, how do we ensure we keep this up?).

Tests were added to ensure we support people customizing the behavior to match their needs.

For now, I am leaving off:

  • Vec<Vec<_>>, see #2924
  • (T1, T2), Vec<(T1, T2)>, etc, see #1717
  • Vec<Option<_>> and many other potential combinations

To see this in action, see https://github.com/clap-rs/clap/pull/2993

Whew ok that was a read!

Ever time I see issues pop up around multiple I cringe a little inside. In clap 1 there was only the concept of multipe = true, and I only ever really cared about -f v v -f v v (i.e. both occurrences and values) for no other reason than having not put a lot of thought into it.

Then clap 2 came around and there was confusion when people wanted multiple values, but not multiple occurrences, or vice versa. It was also over-loaded with flags which only occurrences are possible. This spawned things like number_of_values or min_values/max_values. Again, it was super confusing for the subset of people that had these various requirements, but most just adapted and moved on.

Looking back, I wish I’d made the sane default of multiple=true and no other info being -f v -f v is allowed -f v v is not. Which is what I attempted to make the default in 3.x because this was by far the most common request, most people expected -f v -f v (minus people mostly familiar with IRC like CLIs which was a minority). It also no longer required the number_of_values=1, which in all honest looks kind of confusing if you’re not already familiar with clap…“Why am I say I want multiple values, AND also saying number of values is 1?!”

I also wanted to use the 3.x breaking changes to fix this confusion once and for all. I have a much better idea of what is most common. I want the defaults to be the most common case, and allow those who need something different be able to do so.

Unfortunately before I went on hiatus I didn’t finish the docs which I’m sure has lead to even more confusion around the various types of multiples, and their defaults or interactions with each other.

I see multiples as one of those things, “It sounds sooo simple at first, why can’t it just work!?” Until you dig into the details and realize, “Ooooh. Yeah I can totally see how some would want X while others need Y, and you need to distinguish between them sometimes, but not always.” It’s kind of like Strings in that manner; such a simple concept yet so hard to do correctly. …Or CLIs for that matter 😜

Ok, so back to the topic at hand:

I find myself agreeing with @CreepySkeleton here. We’re talking about sane defaults. I can also totally respect @TeXitoi’s sentiment of Option<Vec<T>> is much more of a pain to deal with than Vec<T>. However I disagree that all options should default to 0 values. In fact, I’ve come across very few CLIs that rely on --foo <no value>. They do exist for sure, but most I’ve run into either expect a value, or expect no argument to be used (with or without a default value).

So I think this comes down to balancing Rust ergonomics and semantics (Vec<T>, where all Vecs can have zero values by default) with clap ergonomics and semantics (Option<T> means optional argument, thus Option<Vec<T>> means optional option which accepts multiple values…combined with in clap speak all options by default must have at least 1 value). I think it’s perfectly reasonable to impose clap defaults/semantics here. People are building a struct specifically around claps semantics, so I think that although Vec<T> can totally have zero or more values by default the fact that it does not lining up with claps 1 or more values by default is fine and people will first look to their knowledge about clap semantics when building these structs. This is in part because there is no intrinsic Rust type that is one or more values.

I would suggest simply calling it out in the docs directly, that although the default is Vec<T> means required = true, if you want the ergonomics of Vec<T> just override the required attribute with false

Also, perhaps I’m out of touch, but if I wanted to allow multiple values with occurrences via derive in clap I’d instinctivily try to do Option<Vec<Vec<T>>>

command type
-f v -f v Some([v, v])
-f v v Some([v, v])
-f v v -f v v Some([[v, v], [v, v]])

app -fff from the table would have been parsed as [-f=ff]

Correct. That is intended behaviour due to how parsing is structured. I don’t think there is anything we can, or would want to do about this.

It’s a problem of good documentation more than anything, about the way it conveys this setup to user.

100% agree.


Having said all that, if we settled on Vec<T> defaulting to not-required because Rust ergonomics are more important, it’s not a huge issue to me so long as it’s documented very clearly.

I think that our willingness of changing defaults should depend of what the most widespread usage is in practice.

I don’t agree with this. Maybe all of them tried Option<Vec<T>> first and then decided that it didn’t make a difference so they optimised it back to Vec<T>.

Also, we are not technically breaking this because this a separate library clap and not structopt. I would opt for it easier to understand w.r.t semantics in how to use the types for the fields and being different with each other rather than existing usage.

min_values should always be 0. As you saw, people saw Vec and with semantics they thought min_values might have been zero and that is why they customised the behaviour to be min_values = 1. (Related to #1682)

Note: multiple means multiple occurrences not values according to clap docs

Type required multiple
Vec<T> true false
Option<Vec<T>> false false
Vec<Vec<T>> true true
Option<Vec<Vec<T>>> false true
command Vec<T> Option<Vec<T>> Vec<Vec<T>> Option<Vec<Vec<T>>>
app -f 1 2 [1, 2] Some([1, 2]) [[1, 2]] Some([[1, 2]])
app -f [] Some([]) [[]] Some([[]])
app ERRROR None ERROR None
app -f 1 2 -f 3 4 [3, 4] Some([3, 4]) [[1, 2], [3, 4]] Some([[1, 2], [3, 4]])
app -ff [] Some([]) [[], []] Some([[], []])

By default, if one wants to support a -vvv kind of flag, they would have to use Option<Vec<Vec<bool>>>. But they always have the option of customizing stuff just by describing it using methods like how you already proposed.

#[clap(short, multiple = true, required = false, max_values = 0)]
verbose: Vec<bool>