habitat: hab pkg download file format is insufficiently expressive.

The current file format (package idents separated by newlines) isn’t expressive enough to be treated as an independent artifact.

Many origins have different channel names and promotion policies than core. When building a ‘starter list’ of packages, as we are doing in #6902, we end up wanting to include multiple origins and channels. For example, we pretty much always want stable packages from core, but might accept unstable from effortless or other third-party origins.

If the wrong channel is used, many packages may not be at the right versions, or even found, and so a list of package idents isn’t really complete without a channel specified. Unfortunately, the channel can only be specified on the command line, and applies to all the files provided. There isn’t a way to specify multiple channels in the same input file, or designate the channel as part of the file.

A similar problem exists for target architecture: packages might not exist for all architectures, so the input file is in practice architecture-specific as well.

So sharing ‘starter list’ files divorced from the context of channel and target is likely to be error prone, and in some cases tedious, as the download command will need to be run multiple times.

One possibility would be to stay with the simple text file format but expand the ident syntax to allow some sort of qualifier. Something like /ORIGIN/NAME?target=TARGET&channel=CHANNEL might work, but there’s almost certainly a better UX there. We would have a kludgy workaround if we were able to find a fully qualified ident without needing the channel ident, but currently we only look in the current channel even if the package is otherwise fully specified (see #7039).
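To make the qualifier idea concrete, here is a minimal Python sketch of how one line in that style could be split apart. Nothing here is an existing hab feature: the function name `parse_qualified_ident`, the fallback to "stable", and the rejection of unknown qualifiers are all assumptions made for illustration.

```python
from urllib.parse import parse_qs


def parse_qualified_ident(line, default_channel="stable", default_target=None):
    """Split 'ORIGIN/NAME[/VERSION[/RELEASE]]?target=T&channel=C' into parts.

    Hypothetical helper: the defaults are assumptions, not hab behavior.
    """
    ident, _, query = line.strip().partition("?")
    ident = ident.strip("/")  # tolerate the leading slash used in the example
    params = parse_qs(query)
    unknown = set(params) - {"target", "channel"}
    if unknown:
        raise ValueError(f"unknown qualifier(s): {sorted(unknown)}")
    return {
        "ident": ident,
        # If a qualifier is repeated, the last occurrence wins.
        "target": params.get("target", [default_target])[-1],
        "channel": params.get("channel", [default_channel])[-1],
    }
```

A line without a `?` simply falls back to the defaults, so existing newline-separated files would still parse under this scheme.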

It might make more sense to switch to a human friendly structured text file format. Most likely you would want to represent a list of tuples {target, channel, [package_idents]} or a list of hashes {target: TARGET, channel: CHANNEL, package_idents: [ALL_THE_IDENTS]}.
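Independent of which file syntax wins, the in-memory shape described above could look roughly like the following Python sketch. The class name `DownloadGroup` and the helper `flatten` are hypothetical, chosen only to illustrate the {target, channel, package_idents} grouping.

```python
from dataclasses import dataclass, field


@dataclass
class DownloadGroup:
    """One {target, channel, package_idents} entry (hypothetical type)."""
    target: str
    channel: str
    package_idents: list = field(default_factory=list)


def flatten(groups):
    """Yield one (target, channel, ident) triple per package."""
    for group in groups:
        for ident in group.package_idents:
            yield (group.target, group.channel, ident)


groups = [
    DownloadGroup("x86_64-linux", "stable", ["core/foo", "core/bar/3.2"]),
    DownloadGroup("x86_64-windows", "stable", ["effortless/audit-baseline"]),
]
```

Grouping keeps the file short when many packages share a channel and target, while the flattened triples are what a downloader would actually iterate over.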

JSON is quite expressive, but the mainstream varieties don’t have comments. TOML is less expressive (my initial attempt was pretty clunky), but is commonly used in habitat, and allows comments. YAML is popular, expressive and allows comments, but currently isn’t used inside habitat, and might be too much. ASN.1 … just kidding.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

# Target grouping. Channel defaults to stable
[["x86_64-linux"]]
channel = "stable"
packages = ["core/foo", "core/bar/3.2", "core/baz/2.71/20190910181446"]

[["x86_64-linux"]]
channel = "unstable"
packages = ["core/crazypants", "core/cliffs_of_insanity/0.0.1/20191010181446"]

[["x86_64-windows"]]
channel = "stable"
packages = ["effortless/audit-baseline"]

Thumbs up/down if you think this should be the format or not.

@smacfarlane I’m not quite understanding why the packages from the same origin have to come from the same channel, could you explain a bit more? 😃

In the hab pkg bulkupload command, users can use the --channel option to specify the channel to upload the artifacts to (default: unstable).

That means the channels the artifacts were downloaded from, on the Builder specified in hab pkg download, are not relevant on the target Builder specified in hab pkg bulkupload.

Regarding your point about graph conflicts when building, does that only matter on the target Builder specified in the hab pkg bulkupload command?

@apriofrost Coming from the build side, consistency in the source channel is important to avoid dependency graph errors on builds. In the context of this flow, I still think mixing channels is a bad idea, and I would also advise users on upload to ensure that the groups they download move together on the consuming end.

After talking to @markan, though, I do see the use for mixing channels. When packages in the channel are “leaf” packages, i.e. nothing depends on them and they are, at most, going to be wrapped once, then it’s a relatively safe operation, though it still makes me a bit nervous.

Specifically with the core origin on bldr.habitat.sh, though, it does feel like we’re setting our users up for a bad time if they download anything but stable. The exception I can think of would be mirroring for our Biome friends, but it would be important to be able to replicate the package->channel mapping.

Some off-the-cuff thoughts on the above, @markan:

  1. Strong agree 😺
  2. This feels like it could be overly complex and lead to some unexpected results, though that may be my bias from build, where you want to be very explicit about what you’re doing.
  3. I like this idea, perhaps approaching this from the upload angle of “how do we put humpty-dumpty together again” might allow us to tease out what we’re trying to describe.
  4. Maybe the Builder UI has it backwards, depending on your persona? If I’m managing packages, I would start with my origin. If I’m searching for something though, I’d probably start with where I intend to deploy it. I may not care about Linux, but I am probably interested in a variety of origins.
  5. I agree with channel; that’s an origin-level concept that describes part of the “where”. Target is a package-level construct that is a “what”. Ideally it would be part of our fully qualified ident, but it could also be a grouper, i.e. “I want the following packages for my Linux systems”, “and then these for my Windows systems”.

My preference would also be to not decouple origin from name, as that goes against how we typically describe idents*, and (depending on implementation, of course) would make 3 harder to implement. I’d think that if we go that route, the only difference between the input and output of download would be that the output would contain exclusively fully qualified idents.

  * It occurs to me that perhaps we’re trying to describe different things with what goes into packages. My assumption is that it should be a partially or fully qualified ident, i.e. something you could drop straight into hab pkg install.
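The distinction between partially and fully qualified idents can be sketched in a few lines of Python. This assumes the usual origin/name/version/release layout seen in the examples in this thread; the function names are hypothetical.

```python
IDENT_KEYS = ("origin", "name", "version", "release")


def ident_parts(ident):
    """Split an ident into named parts; version and release are optional."""
    parts = ident.split("/")
    if not 2 <= len(parts) <= 4 or any(part == "" for part in parts):
        raise ValueError(f"not a valid package ident: {ident!r}")
    return dict(zip(IDENT_KEYS, parts))


def is_fully_qualified(ident):
    """True only when origin, name, version, and release are all present."""
    return len(ident_parts(ident)) == len(IDENT_KEYS)
```

Under this reading, an input file may mix both kinds of ident, while a manifest written by download would only contain entries for which `is_fully_qualified` holds.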

Thinking through the comments above, a few thoughts:

  1. origin and channel are indeed likely linked, and probably should be taken together

  2. There’s likely enough overlap between packages in different target architectures to make it worthwhile to allow multiple targets. That could be done via the ‘target’ key either always taking an array, or optionally taking an array, or a separate ‘targets’ that takes an array. (I’d lean towards optionally taking an array myself)

  3. We should give some thought to this being usable as the manifest output format; I think we should be able to take a manifest output and use that as input for pkg download, as well as pkg upload

  4. There’s been a lot of discussion around what part of the hierarchy people start with; it sounds like there’s some case for target architecture first. But I wonder if maybe we should start with origin. That’s closer to how we present things in the builder GUI.

  5. Listing target architecture and channel for each package might be a bit too verbose, as there is likely going to be a bunch of packages with the same channel and architecture.

Here’s a slightly different format proposal based on the above:

# simple
version = 1
target = "x86_64-linux"
channel = "stable"
packages = [ "core/foo", "core/bar" ]

# complex
[[core]] # choosing array syntax here, so you can have multiple entries for the same origin
target =  "x86_64-linux"
# omitting channel implies stable
packages = ["foo", "bar/3.2", "baz/2.71/20190910181446"]

[[core]]
target =  ["x86_64-linux", "x86_64-windows"] 
channel = "unstable"
packages = ["crazypants", "cliffs_of_insanity/0.0.1/20191010181446"]

[[effortless]]
target = "x86_64-windows"
channel = "stable"
packages = ["audit-baseline"]
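To check that the "complex" layout is unambiguous, here is a rough Python sketch of how it could be expanded into individual download jobs. The `manifest` dict is hand-written to mirror what a TOML parser would return for a shortened version of the example; `expand` is a hypothetical helper, and the "stable" default follows the comment in the proposal rather than any implemented behavior.

```python
# Hand-written stand-in for the parsed TOML (origin -> list of entries).
manifest = {
    "core": [
        {"target": "x86_64-linux", "packages": ["foo", "bar/3.2"]},
        {
            "target": ["x86_64-linux", "x86_64-windows"],
            "channel": "unstable",
            "packages": ["crazypants"],
        },
    ],
    "effortless": [
        {"target": "x86_64-windows", "channel": "stable",
         "packages": ["audit-baseline"]},
    ],
}


def expand(manifest, default_channel="stable"):
    """Return a list of (target, channel, ident) jobs for every entry."""
    jobs = []
    for origin, entries in manifest.items():
        for entry in entries:
            targets = entry["target"]
            if isinstance(targets, str):  # target may be a string or an array
                targets = [targets]
            channel = entry.get("channel", default_channel)
            for target in targets:
                for name in entry["packages"]:
                    # Re-attach the origin so each job carries a normal ident.
                    jobs.append((target, channel, f"{origin}/{name}"))
    return jobs
```

One consequence worth noting: because the origin is the table name, a tool has to re-join origin and package name before it has anything that could be passed to hab pkg install.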

Got it! Thanks.

I personally prefer TOML since it is more ubiquitous in habitat and so far we don’t use YAML, but I don’t feel incredibly strongly about this either. If going with TOML, we could use the target as the table name.