PowerShell: PSCustomObject does not work with Select-Object -Unique and Sort-Object -Unique

Prerequisites

Steps to reproduce

Take any collection of PSCustomObjects whose property values differ. The objects do not compare equal with -eq, yet they still collapse into a single item under Select-Object -Unique or Sort-Object -Unique. Notable cmdlets that output PSCustomObject collections are:

  • Select-Object itself (-Unique works when used in the same call that selects the properties, but not when applied to that output later on)
  • ConvertFrom-Csv

This behavior has been mentioned in https://github.com/PowerShell/PowerShell/issues/15806 and https://github.com/PowerShell/PowerShell/issues/12059 , but neither is about this exact problem, and the fact that it affects Select-Object transformations and ConvertFrom-Csv makes it very prominent.
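For a reproduction that does not depend on the contents of the current directory, here is a minimal sketch using ConvertFrom-Csv. The CSV data and the variable names ($rows, $byValue) are made up for illustration. Only the -Property * call is expected to behave the same on every version; the plain -Unique call is the one reported as broken above.

```powershell
# Three rows; the first and third have identical values.
$rows = @'
Name,Size
alpha,1
beta,2
alpha,1
'@ | ConvertFrom-Csv

# -eq on two distinct PSCustomObjects is False, even when their values match.
$rows[0] -eq $rows[2]

# On affected versions this is reported to collapse everything into one row.
$rows | Select-Object -Unique

# Naming the properties (here via *) makes -Unique compare property values.
$byValue = @($rows | Select-Object -Property * -Unique)
$byValue
```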

Expected behavior

> $files = Get-ChildItem | Select-Object Name

> $files
Name
----
dotnet-sdk-5.0.408-linux-arm64
dotnet-sdk-6.0.400-linux-arm64

> $files[0] -eq $files[1]
False

> $files | Select-Object -Unique
Name
----
dotnet-sdk-5.0.408-linux-arm64
dotnet-sdk-6.0.400-linux-arm64

Actual behavior

> $files | Select-Object -Unique
Name
----
dotnet-sdk-5.0.408-linux-arm64

Error details

No response

Environment data

Name                           Value
----                           -----
PSVersion                      7.3.0-preview.3
PSEdition                      Core
GitCommitId                    7.3.0-preview.3-304-gd02c59addc24e13da3b8ee5e1a8e7aa27e00c745
OS                             Linux 5.15.0-1013-raspi #15-Ubuntu SMP PREEMPT Mon Aug 8 06:33:06 UTC 2022
Platform                       Unix
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Visuals

No response

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 41 (17 by maintainers)

Most upvoted comments

@PowerShell/wg-powershell-cmdlets reviewed this and agree that the current behavior for PSCustomObject is incorrect. We believe this is likely a Bucket 3 breaking change, since users may be relying on the current behavior. We considered an option to inform the user to use -Property * if we detect that the objects are PSCustomObjects; however, it seems more useful to fix this behavior so that future users can rely on it. So the change applies only if the first object is a PSCustomObject and -Property isn’t specified, in which case it gets set to -Property *.
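As a rough script-level illustration of that rule (not the actual cmdlet change), the wrapper below buffers the pipeline, checks only the first object, and adds -Property * when that object is a PSCustomObject. The function name Select-UniqueFirstCustom is hypothetical.

```powershell
# Hypothetical wrapper: mimics "first object is a PSCustomObject and no -Property given".
function Select-UniqueFirstCustom {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [psobject] $InputObject
    )
    begin {
        $items = [System.Collections.Generic.List[psobject]]::new()
    }
    process {
        $items.Add($InputObject)
    }
    end {
        if ($items.Count -eq 0) { return }

        # Only the first object decides which path is taken.
        if ($items[0] -is [System.Management.Automation.PSCustomObject]) {
            $items | Select-Object -Property * -Unique
        }
        else {
            $items | Select-Object -Unique
        }
    }
}

$objects = 'a', 'b', 'a' | ForEach-Object { [pscustomobject]@{ Name = $_ } }
$objects | Select-UniqueFirstCustom
```

Strings and other non-custom objects still go through plain Select-Object -Unique, so their behavior is unchanged.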

Looking at the results of the proposed fix, I am not satisfied. While the main scenario is fixed by this fix (usually all objects of the same type in the pipeline), I’m afraid that we’ll immediately get feedback that it doesn’t work if the first object isn’t a PSCustomObject. Since we’re doing a slow accumulation of objects from the pipeline anyway, there’s nothing stopping us from checking their type until we encounter a PSCustomObject.
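To make the concern concrete, here is a small sketch of a mixed pipeline where a first-object check misses the PSCustomObjects while a scan of the accumulated objects finds them. The variable names are made up for illustration.

```powershell
$customType = [System.Management.Automation.PSCustomObject]

# The first element is a plain string; the custom objects come later.
$mixed = @(
    'plain string'
    [pscustomobject]@{ Name = 'a' }
    [pscustomobject]@{ Name = 'b' }
)

# Check limited to the first object.
$firstIsCustom = $mixed[0] -is $customType

# Check across everything collected from the pipeline.
$anyIsCustom = @($mixed | Where-Object { $_ -is $customType }).Count -gt 0

"First object is custom: $firstIsCustom"
"Any object is custom:   $anyIsCustom"
```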

This in turn makes me think that this fix is more of a workaround and the main problem is somewhere deeper and should be fixed, although it’s not trivial anymore.

@dkaszews No warning. Check only that the first object is a PSCustomObject and that the -Property parameter is not present.

This was discussed in the cmdlet working group yesterday. It was suggested that if the objects are PSCustomObjects and no -Property parameter is passed, Select-Object could behave as if -Property * had been specified. This will be investigated, which is not (yet) a commitment to make a change.

Can we circle back to @jhoneill's original solution to simply default -Property to *? It fixes the issue and I cannot see any flows it could break. If nobody can think of a reason that would be a breaking change, I suggest we go with it, as it is trivial to implement.

Alternatively, we could implement it to look at all NoteProperty if and only if the object contains no Property. Not much more complicated and even less likely to break existing flows in case there are some objects which store their real values as Property and use NoteProperty only for metadata.
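The "no Property members" test from this alternative could look roughly like the sketch below. Test-NotePropertyOnly is a hypothetical helper name, and this only shows the detection step, not a change to Select-Object.

```powershell
# Hypothetical helper: True when the object exposes no members of type Property,
# i.e. whatever data it has lives in NoteProperty (or other extended) members.
function Test-NotePropertyOnly {
    param(
        [Parameter(Mandatory)]
        [psobject] $InputObject
    )
    $adapted = @(
        $InputObject.PSObject.Properties |
            Where-Object { $_.MemberType -eq [System.Management.Automation.PSMemberTypes]::Property }
    )
    return ($adapted.Count -eq 0)
}

Test-NotePropertyOnly ([pscustomobject]@{ Name = 'a' })   # custom object: note properties only
Test-NotePropertyOnly (Get-Item .)                        # directory object: has real properties
```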

I think PSCustomObject equality and comparison semantics are key here. We need to describe them exactly, and we also need to investigate whether it is possible to change this without breaking other PSObjects.

I cannot reproduce the issue. @iSazonov, try this simple one in any directory with multiple files:

$files = Get-ChildItem | Select-Object Name
$files | Select-Object -Unique
$files | Select-Object -Unique *

The second line returns only one item. The third returns all of them. The same happens if you use Sort-Object -Unique. But if the first line is $files = Get-ChildItem | % Name or just $files = Get-ChildItem, both lines return all items.
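For the Sort-Object case, naming the property explicitly gives it something to compare. A small sketch with made-up in-memory objects, so it does not depend on the directory contents:

```powershell
# Three custom objects, two of them with the same Name.
$items = 'b', 'a', 'a' | ForEach-Object { [pscustomobject]@{ Name = $_ } }

# With -Property, -Unique compares the Name values rather than the bare objects.
$sorted = @($items | Sort-Object -Property Name -Unique)
$sorted.Name   # a, b
```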

I don’t think it is hashing

I think the code ( https://github.com/PowerShell/PowerShell/blob/master/src/Microsoft.PowerShell.Commands.Utility/commands/utility/Select-Object.cs line 628 onwards) compares base objects without note properties, and guess what… PSCustomObjects are ALL note properties. It ONLY looks at note properties if -Property is specified. So

$a = get-item .
$b = get-item .
$a, $b | Select-Object -unique

Correctly returns ONE item

Add-Member -NotePropertyName "foo" -NotePropertyValue "bar" -InputObject $a
$a, $b | Select-Object -unique

Should make a difference between $a and $b and return two items, but Select-Object ignores it because it is a note property.

$a, $b | Select-Object -Unique Name

Correctly returns only one object, because it is not told to look at the note property that differs.

$a, $b | Select-Object -Unique Name, foo

Now returns two objects because it IS looking at the note property.

This looks to be by design: two objects are not considered different if they differ only in note properties, UNLESS those note properties are specified (which is why adding * fixes the problem in the initial example). But that design doesn’t allow for PSCustomObjects.

> $files = Get-ChildItem | Select-Object Name
> $files | Select-Object * -Unique

Works, but if the second command doesn’t have a -Property parameter it fails. The two linked issues suggest that Select-Object uses the hash code, but this casts doubt on that.

I’ve never understood why -ExcludeProperty doesn’t work unless -Property is specified. I wonder if both could be solved if -Property defaulted to “*”.
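For reference, the explicit form that pairs -ExcludeProperty with -Property * looks like this. The object and its property names are made up for illustration.

```powershell
$obj = [pscustomobject]@{ Name = 'a'; Size = 1; Secret = 'x' }

# -Property * selects everything; -ExcludeProperty then drops the named property.
$trimmed = $obj | Select-Object -Property * -ExcludeProperty Secret
$trimmed.PSObject.Properties.Name   # Name, Size
```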