hcsshim: Some computestorage.FormatWritableLayerVhd breakage in master vs 0.9.2?

I was attempting to rebase https://github.com/containerd/containerd/pull/4419 onto https://github.com/microsoft/hcsshim/pull/901 to verify it, and this involves upgrading hcsshim from v0.9.2 to master (+my PR).

I noticed a couple of unrelated test failures in containerd, and so I put together a branch that was just the hcsshim upgrade (specific commit in CI) and it produces the same test failures, which are different failures for the same test in Windows Server 2019 and Windows Server 2022.

The failing test is pkg/os TestResolvePath line 153, and on Windows Serve 2019 it reports:

=== FAIL: pkg/os TestResolvePath (0.50s)
    os_windows_test.go:153: failed to format VHD: failed to attach virtual disk: Incorrect function.

while on Windows Server 2022 it reports

=== FAIL: pkg/os TestResolvePath (0.22s)
    os_windows_test.go:153: failed to format VHD: failed to attach virtual disk: The process cannot access the file because it is being used by another process.

My guess is that there’s some conflict between the RS5 workaround for disk handle vs VHD handle in containerd and the same workaround in hcsshim master which wasn’t present in v0.9.2.

That said, since neither workaround should be active on Windows Server 2022, perhaps there’s something else going on.

I haven’t tested it (I might quickly do so by removing the workaround from containerd), but perhaps the function should be renamed or have its type changed in some way to highlight to callers that the API behaviour has changed and now always expects the VHD Handle, never the disk handle?

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 21 (21 by maintainers)

Most upvoted comments

@TBBle Not a bad idea, I was almost thinking for cases like this to make it more flexible on our side for APIs that call it internally, we just have a constant for the build number that will be returned if it’s not manifested. So for computestorage we can grab the build number and if it’s the non-manifested number, we just take the handle as is and hope the caller handled things. wdyt?

#ugh…

Isn’t this fun 😌

@TBBle Reached out to some peeps internally to try and close on this, waiting to hear back. Using the build number for feature detection is generally frowned upon but sometimes it’s the easiest route… so it’d be nice to not require manifesting your application because of leakage of the osversion pkg to others in this repo.

Took me a while, but the discussion I was thinking of was https://github.com/containerd/containerd/pull/6697#discussion_r830393516, for which you already have context. ^_^

@TBBle I still think the easiest route here is finding what the downsides are of swapping osversion to a non GetVersion approach, but we’ll see. Glad this didn’t slip into a released tag though thankfully

I’m mildly concerned about hcsshim now requiring its callers to be manifested when they aren’t using hcsshim/osversion.Build() directly,

I’m not ecstatic about this either šŸ˜” I was hoping we could find a way around or converge on changing osversion to not use GetVersion; I hear conflicting opinions internally on what’s the safe way to version check. My ideal world is anything that has branching behavior based on version differences and is just a library we expose should grab a version number in a different manner, but that might not be the ā€œrightā€ way.

@TBBle It’s likely because containerd isn’t manifested as you mentioned in the end. I believe we’ll always enter the if statement as it’ll return windows 8’s build num. I think doing the same build number check as we do in the test might make sense for this package to avoid anyone using it to manifest, but maybe the right move is to just manifest containerd like we do for docker. Let me think about this