guava: Document, test, and null-annotate for `toImmutableMap`'s behavior when `mergeFunction` returns `null`

API(s)

`com.google.common.collect.ImmutableMap.toImmutableMap(Function<? super T, ? extends K>, Function<? super T, ? extends V>, BinaryOperator<V>)`

How do you want it to be improved?

It’s not entirely clear whether mergeFunction is permitted to return null or not.

According to type annotations it’s not (T extends @Nullable Object, but K and V don’t). Current implementation however (com.google.common.collect.CollectCollectors.toImmutableMap(Function, Function, BinaryOperator) complies with Map.merge(K, V, BiFunction)'s description, that is, it removes value from temporal LinkedHashMap if the merge result is null.

(BTW note that Map.merge takes BiFunction instead of BinaryOperator; in case of Guava, having “just one” V forces both parameters and return type to be annotated together).

Why do we need it to be improved?

Please either explicitly allow nulls or change implementation to throw NPE.

Example

ImmutableMap<Integer, Integer> map = Stream.of(0, 0).collect(ImmutableMap.toImmutableMap(e -> e, e -> e, (a, b) -> null));

Current Behavior

This results in empty map and no exceptions, which may, or may not, be fine.

Desired Behavior

Maybe such code should lead to NPE, although allowing nulls appears to be less surprising for people assuming that merging should behave exactly as described in Map.merge or Collectors.toMap(Function, Function, BinaryOperator).

Concrete Use Cases

I’m unable to provide concrete use cases. I was implementing #6822 for my own purposes and it struck me that I can’t really choose if I should ban nulls entirely or not. I was willing to align with ImmutableMap.toImmutableMap(Function, Function, BinaryOperator) but found that what I’m looking for is unspecified.

Checklist

I agree to follow the code of conduct.
I have read and understood the contribution guidelines.
I have read and understood Guava’s philosophy, and I strongly believe that this proposal aligns with it.

About this issue

Original URL
State: open
Created 8 months ago
Comments: 16 (16 by maintainers)

Commits related to this issue

Document and test how `ImmutableMap.toImmutableMap` behaves when the `mergeFunction` returns `null`. (The test is Google-internal for now because we're in the process of reshuffling our `Collector` t... — committed to google/guava by cpovirk 8 months ago
Document and test how `ImmutableMap.toImmutableMap` behaves when the `mergeFunction` returns `null`. (The test is Google-internal for now because we're in the process of reshuffling our `Collector` t... — committed to google/guava by cpovirk 8 months ago
Document and test how `ImmutableMap.toImmutableMap` behaves when the `mergeFunction` returns `null`. (The test is Google-internal for now because we're in the process of reshuffling our `Collector` t... — committed to google/guava by cpovirk 8 months ago

Most upvoted comments

Thanks also for catching toImmutableTable. I dug up the review, which I had forgotten about. It turns out that the author and I both complained about the JDK’s null behavior, with the author even citing the “if you had three duplicates you’d be back to discarding two and putting in one” behavior that I “discovered” above… 😃

If nulls are used to eliminate duplicates then yes, it is going to be error prone. However, I can imagine some exotic use cases, involving toImmutableTable but also toImmutableRangeMap, which we all need desperately 😉, where you may want to use nulls as a way to avoid storing neutral elements in some collection / table / map (as being uninteresting or for whatever other reason). Merging -x and x could then result in null instead of 0, symmetric difference of two identical sets could result in null instead of empty set and so on. I know that this can be accomplished with post-processing (e.g. streaming, filtering out “zeroes”, recollecting), and that in general it would be less error-prone method than the one involving nulls, but we are where we are.

That still gives us the option to change toImmutableTable, presumably to match toTable. That said, consistency might be less important there: I’d expect users to less commonly migrate from toTable to toImmutableTable than from toMap to toImmutableMap, since toImmutableTable has been available just as long as toTable.

I would love to see consistency between toImmutableTable and toTable. Firstly, it may be useful to some. Furthermore, consistency may be “less important”, but it is still important, I hope 😃 After all, it may be the only quality that that led me - and consequently you - to this issue.

perceptron8 on Nov 9, 2023

Thanks, I’m glad to hear that BiFunction is at least an option. You have found a clever way to show how much more complex that makes the signature, though 😉 (I do seriously think that the complexity may explain the difference between Collectors.toMap, which has complex signature that we don’t want to become yet more complex (so it uses BinaryOperator), and Map.merge, which is relatively straightforward (so it uses BiFunction). But I’m speculating.

I wonder if the sweet spot might be BiFunction<V, V, @Nullable V>: We don’t do the PECS wildcards that explode the length of the type (the way that BiFunction<? super V, ? super V, ? extends @Nullable V> would), but we have enough distinct types that we can put @Nullable where we want it.

Still, this matters relatively rarely, so it’s not necessarily worth going more complex than BinaryOperator.

And I can give some data about how rarely! I ran some tests overnight, and I have a couple different things to report:

My proposal with @NonNull from my earlier post is complex enough that it causes problems…
- …for our Checker Framework users. The problem appears to be only in type inference, which is an area that the Checker Framework owners have been actively working on, so it might go away in time.
- …for some Kotlin experiments that we’re doing. The problem there might also be only type inference, but I’m not going to look into it any more deeply because the Checker Framework issue is already a blocker for doing it now.
We do have several different users who rely on being able to return null from the mergeFunction of toImmutableMap. (I don’t think we have any tests of our own, though… 😃) Most (maybe all?) of them fit into the category of “If we see any duplicates, then drop the key altogether,” which I will note does not actually work if there might be 3, 5, 7, etc. copies of a key, IIUC 😃 So the functionality gets used, which is all the more reason to document and test it.

So I think my plan now is:

Document the null handling.
Test the null handling.
Maybe change the nullness annotations someday if the Checker Framework issue gets fixed.
Maybe or maybe not agonize over BinaryOperator vs. BiFunction if we follow up on https://github.com/google/guava/issues/6822

cpovirk on Nov 8, 2023