habitat: hab pkg download file format is insufficiently expressive.
The current file format (package idents separated by newlines) isn’t expressive enough to easily treat as an independent artifact.
Many origins have different channel names and promotion policies than core. When building a ‘starter list’ of packages, such as we are doing in #6902, we end up wanting to include multiple origins and channels. For example, we pretty much always want stable packages from core, but might accept unstable from effortless or other third-party origins.
If the wrong channel is used, many packages may not be at the right versions, or even found, and so a list of package idents isn’t really complete without a channel specified. Unfortunately, the channel can only be specified on the command line, and applies to all the files provided. There isn’t a way to specify multiple channels in the same input file, or designate the channel as part of the file.
A similar problem exists for target architecture: packages might not exist for all architectures, so the input file is in practice architecture-specific as well.
So sharing ‘starter list’ files divorced from the context of channel and target is likely to be error prone, and in some cases tedious, as the download command will need to be run multiple times.
One possibility would be to stay with the simple text file format but expand the ident syntax to allow some sort of qualifier. Something like /ORIGIN/NAME?target=TARGET&channel=CHANNEL might work, but there’s almost certainly a better UX there. We would have a kludgy workaround if we were able to find a fully qualified ident without needing the channel ident, but currently we only look in the current channel even if the package is otherwise fully specified (see #7039).
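To make the qualifier idea concrete, here is a minimal sketch of how a line in the /ORIGIN/NAME?target=TARGET&channel=CHANNEL style could be parsed. The function name, the default channel and target, and the rule that unknown qualifiers are rejected are all assumptions for illustration; none of this is existing hab behavior.

```python
from urllib.parse import parse_qs


def parse_qualified_ident(line, default_channel="stable", default_target="x86_64-linux"):
    """Split 'ORIGIN/NAME?target=T&channel=C' into ident, channel and target.

    Hypothetical helper: the defaults and the error handling are assumptions.
    """
    ident, _, query = line.strip().partition("?")
    ident = ident.lstrip("/")  # tolerate the leading slash used in the proposal
    qualifiers = parse_qs(query)  # an empty query yields an empty dict
    unknown = set(qualifiers) - {"target", "channel"}
    if unknown:
        raise ValueError(f"unknown qualifier(s): {sorted(unknown)}")
    return {
        "ident": ident,
        # If a qualifier is repeated, the last value wins.
        "channel": qualifiers.get("channel", [default_channel])[-1],
        "target": qualifiers.get("target", [default_target])[-1],
    }
```

A line without any qualifier falls back to the defaults, so existing plain ident files would still parse under this sketch.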
It might make more sense to switch to a human friendly structured text file format. Most likely you would want to represent a list of tuples {target, channel, [package_idents]} or a list of hashes {target: TARGET, channel: CHANNEL, package_idents: [ALL_THE_IDENTS]}.
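As a rough illustration of the "list of hashes" shape, the sketch below holds the data in memory and groups idents by (target, channel), since each such pair would correspond to one run of the download command today. The package names are placeholders and the helper name is hypothetical; only the three keys come from the description above.

```python
from collections import defaultdict

# Placeholder data in the {target, channel, package_idents} shape.
starter_list = [
    {"target": "x86_64-linux", "channel": "stable",
     "package_idents": ["core/example-a", "core/example-b"]},
    {"target": "x86_64-linux", "channel": "unstable",
     "package_idents": ["effortless/example-pkg"]},
    {"target": "x86_64-linux", "channel": "stable",
     "package_idents": ["core/example-c"]},
]


def download_batches(entries):
    """Group idents by (target, channel), dropping duplicates but keeping order."""
    batches = defaultdict(list)
    for entry in entries:
        key = (entry["target"], entry["channel"])
        for ident in entry["package_idents"]:
            if ident not in batches[key]:
                batches[key].append(ident)
    return dict(batches)


for (target, channel), idents in download_batches(starter_list).items():
    print(target, channel, idents)
```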
JSON is quite expressive, but the mainstream varieties don’t have comments. TOML is less expressive (my initial attempt was pretty clunky), but is commonly used in habitat, and allows comments. YAML is popular, expressive and allows comments, but currently isn’t used inside habitat, and might be too much. ASN.1 … just kidding.
About this issue
- State: closed
- Created 5 years ago
- Comments: 20 (20 by maintainers)
Thumbs up/down if you think this should be the format or not.
@apriofrost Coming from the build side, consistency in the source channel is important to avoid dependency graph errors on builds. In the context of this flow, I still think mixing channels is a bad idea, and I would also advise users on upload to ensure that the groups they download move together on the consuming end.
After talking to @markan, though, I do see the use for mixing channels. When packages in the channel are “leaf” packages, i.e. nothing depends on them and they are, at most, going to be wrapped once, then it’s a relatively safe operation, though it still makes me a bit nervous.
Specifically with the core origin on bldr.habitat.sh though, it does feel like we’re setting our users up for a bad time if they download anything but the stable. The exception I can think of would be mirroring for our Biome friends, but it would be important to be able to replicate package->channel mapping.
Some off the cuff thoughts to the above @markan
My preference would also be to not decouple origin from name, as that goes against how we typically describe idents*, and (depending on implementation of course) makes 3 harder to implement. I’d think that if we go that route, the only difference between the input and output of download would be that the output would contain exclusively fully qualified idents.
packages: My assumption is that it should be a partially or fully qualified ident, i.e. something you could drop straight into hab pkg install.
Thinking through the comments above, a few thoughts:
1. origin and channel are indeed likely linked, and probably should be taken together
2. There’s likely enough overlap between packages in different target architectures to make it worthwhile to allow multiple targets. That could be done via the ‘target’ key either always taking an array, or optionally taking an array, or a separate ‘targets’ that takes an array. (I’d lean towards optionally taking an array myself)
3. We should give some thought to this being usable as the manifest output format; I think we should be able to take a manifest output and use that as input for pkg download, as well as pkg upload
There’s been a lot of discussion around what part of the hierarchy people start with; it sounds like there’s some case for target architecture first. But I wonder if maybe we should start with origin. That’s closer to how we present things in the builder GUI.
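The "optionally taking an array" variant of the ‘target’ key from the list above could be handled with a small normalization step. This is only a sketch: the function name and the surrounding entry layout are assumptions, and the package name is a placeholder.

```python
def expand_targets(entry):
    """Return one entry per target, accepting 'target' as a string or a list of strings."""
    target = entry["target"]
    targets = [target] if isinstance(target, str) else list(target)
    # Copy the entry for each target so downstream code only ever sees a single string.
    return [{**entry, "target": t} for t in targets]


entry = {
    "target": ["x86_64-linux", "x86_64-windows"],
    "channel": "stable",
    "packages": ["core/example"],
}
for expanded in expand_targets(entry):
    print(expanded["target"], expanded["channel"], expanded["packages"])
```

Normalizing early keeps the file format forgiving for authors while the rest of the tooling deals with exactly one target per entry.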
Listing target architecture and channel for each package might be a bit too verbose, as there is likely going to be a bunch of packages with the same channel and architecture.
Here’s a slightly different format proposal based on the above:
Got it! Thanks.
I personally prefer TOML since it is more ubiquitous in habitat and so far we don’t use YAML, but I don’t feel incredibly strongly about this either. If going with TOML, we could use the target as the table name.