osbuild: osbuild does not produce images with populated dnf state database

Since #328, osbuild has split its software installation into two stages: sources for fetching content using DNF, and rpm for installing them into the target image environment. This would be fine, except… the rpm stage doesn’t use DNF to install.

This is actually a problem, since it means that the generated images now lack the DNF state database that later transactions use to make intelligent decisions about the system's software. For example, the lack of any state information means that dnf autoremove is fundamentally broken and will always do the wrong thing: because the packages were not installed via DNF, nothing is marked as user-installed or dep-installed.

Additionally, if modular content is installed this way, DNF is broken in the target image: the failsafe mechanism that was requested for RHEL modules will cause DNF to choke, since you end up with “modular” packages installed without the corresponding module metadata.

Of course, if you’re producing images with no package manager, then this isn’t a problem. Or if you aren’t using modular content, then the damage is limited. But if you’re building custom RHEL 8 images, then this is a problem.

Now, reading back through the history of why this happened, it looks like the goal was to avoid requiring network access during the build stages, presumably to provide a mechanism by which all the inputs could be archived and replayed to reproducibly generate the same image. This is definitely an admirable goal.

My suggestion would be to do the following:

  1. At the sources stage, depsolve for the content you need and fetch it, then generate an rpm-md repository from it.
  2. At the package install stage, configure DNF to use that local offline repository (with module_hotfixes=1 so the modular packages will install), and use it to install the requested software, rather than taking the pile of RPMs and doing the installation by hand.

This strategy is actually how offline appliance-tools/livecd-tools and kiwi image builds are often done. You can just make that process automatic with osbuild.
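To make that concrete, here is a rough sketch of the two stages driven from Python. This is not osbuild's actual stage code: the package list, paths, repo id (local), and release version are placeholders, and it assumes dnf-plugins-core (for dnf download) and createrepo_c are available on the build host.

```python
# Rough sketch of the two stages described above, not osbuild's real stage
# code.  Assumes dnf-plugins-core (`dnf download`) and createrepo_c are
# installed on the build host; packages, paths, and the repo id are placeholders.
import subprocess

PACKAGES = ["bash", "dnf", "kernel"]    # whatever the blueprint requested
REPO_DIR = "/srv/osbuild/local-repo"    # where the sources stage drops RPMs
TREE = "/srv/osbuild/tree"              # the image root being populated

# "sources" stage: depsolve, fetch everything, and turn the result into an
# rpm-md repository.
subprocess.run(["dnf", "download", "--resolve", "--alldeps",
                f"--destdir={REPO_DIR}", *PACKAGES], check=True)
subprocess.run(["createrepo_c", REPO_DIR], check=True)

# "package install" stage: point DNF at only that offline repository and let
# it perform the install into the target tree, so the state database is
# populated.  module_hotfixes=1 lets modular packages install from it.
subprocess.run(["dnf", "-y",
                f"--installroot={TREE}",
                "--releasever=8",
                "--disablerepo=*",
                f"--repofrompath=local,{REPO_DIR}",
                "--enablerepo=local",
                "--setopt=local.gpgcheck=0",
                "--setopt=local.module_hotfixes=1",
                "install", *PACKAGES], check=True)
```

The key point is the second invocation: because DNF itself performs the install into the install root, the history and state databases end up inside the image rather than being skipped entirely.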

Now this doesn’t solve all the problems, since there’s still the pesky issue of dealing with modular packages. One possible option would be to reposync the module out and merge that into your local repository’s metadata. That would allow it to function the same way it does on a normal system, and have the correct tracking information so that the package manager works properly.
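As a sketch of that idea (the repo id, paths, and metadata file naming are assumptions on my part, not something osbuild does today), the modules document could be carried over roughly like this:

```python
# Sketch of carrying the module metadata into the local repository so modular
# packages keep their module context.  Repo id, paths, and the metadata file
# naming are assumptions; requires dnf-plugins-core (reposync) and
# createrepo_c (modifyrepo_c).
import glob
import subprocess

MIRROR = "/srv/osbuild/mirror"
REPO_DIR = "/srv/osbuild/local-repo"
UPSTREAM = "appstream"                  # repo the modular content comes from

# reposync with --download-metadata also fetches the repo's modules document.
subprocess.run(["dnf", "reposync", f"--repoid={UPSTREAM}",
                "--download-metadata", f"--download-path={MIRROR}"],
               check=True)

# Merge that modules document into the local repository's metadata, so DNF in
# the target sees the same module information it would on a normal system.
modules_yaml = glob.glob(f"{MIRROR}/{UPSTREAM}/repodata/*modules.yaml*")[0]
subprocess.run(["modifyrepo_c", "--mdtype=modules",
                modules_yaml, f"{REPO_DIR}/repodata"], check=True)
```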

I’m open to ideas here, but the current way osbuild installs software into an image leads to images that potentially won’t work as users expect them to.

Most upvoted comments

That seems flawed in practice. It only works as long as all the content you used always remains available. Within the Red Hat ecosystem, this isn’t true on Fedora or CentOS. It’s technically not true on RHEL either if you work with the default repositories. The SUSE ecosystem is a bit better with how they handle service pack/point release updates for SLE and openSUSE Leap, but this still eventually becomes a problem there. And of course openSUSE Tumbleweed is rolling, so…

There is no requirement for osbuild manifests to be valid for longer than necessary. The content-addressed model is used to provide strong guarantees on what data ends up in an image. It is a communication object between osbuild-manifest creators (e.g., osbuild-composer) and the osbuild pipeline engine. The fact that such manifests will be outdated (or have unavailable sources) at one point does not negate their applicability. Obviously, without the updates repository and with just the release repositories the osbuild manifests can be used for much longer. But I do not see why short-lived manifests lead to issues. Manifests are, more often than not, generated on-demand and have no long lifetime whatsoever.

Can you elaborate why you think this is “flawed in practice”?

If manifests are not useful beyond the build process, there is no point in generating them. Full stop. Your existing set of inputs for your build model implies that it’s possible to make reproducible image builds. However, as you are (correctly) saying in this ticket, that is functionally impossible.

The way your inputs work essentially misleads users into thinking the tool is capable of more than it actually is. If you do not intend to support enforced version locking with reproducible inputs, then don’t include a way to make people think that you can do it. Your treatment of manifests is completely the opposite of how every other system treats them, and in that form they should not exist.

If you’re not willing to use DNF in offline mode to install the requested packages and populate that information correctly, you should at least use dnf mark to simulate the correct setup and mark the user-installed and dep-installed content properly. That will require a bit more work to figure out what to mark, but it’s doable.
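For illustration only, a minimal sketch of such a marking pass, assuming the depsolver (e.g. dnf-json) already reports which packages were user-requested and which were pulled in as dependencies; the package names and paths are placeholders:

```python
# Minimal sketch of a post-install "mark" pass over the target tree, assuming
# the depsolver (e.g. dnf-json) already reports which packages were requested
# by the user and which were pulled in as dependencies.  Package names and
# paths are placeholders.
import subprocess

TREE = "/srv/osbuild/tree"
USER_REQUESTED = ["bash", "vim-enhanced"]        # from the blueprint/depsolve
PULLED_IN_AS_DEPS = ["vim-common", "gpm-libs"]   # everything else

# `dnf mark install` records packages as user-installed;
# `dnf mark remove` records them as dependency-installed, which is what makes
# them eligible for `dnf autoremove` later.
subprocess.run(["dnf", f"--installroot={TREE}", "--releasever=8",
                "mark", "install", *USER_REQUESTED], check=True)
subprocess.run(["dnf", f"--installroot={TREE}", "--releasever=8",
                "mark", "remove", *PULLED_IN_AS_DEPS], check=True)
```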

This does not really respond to the situation Tom described, which is that we were told all packages are considered user installed if no DNF metadata is generated. If that is not true, please elaborate.

There is an argument to be made in favor of only marking a selected set of initial packages as user installed. We are aware of that, and we can easily do that by making dnf-json (in osbuild-composer) annotate the RPMs and then add a dnf mark stage to the resulting manifest (quoting Tom: “I’d be happy to discuss that.”).

This is true up to a point. However, the behavior of dnf autoremove is wonky when the DNF database isn’t populated, and users have historically complained about leaves being unexpectedly removed because of this with PackageKit. That’s why we try to make sure the DNF database is correctly populated with Lorax, LiveCD Tools, KIWI, and other image building tools.

I would certainly be interested in a concrete example where the current model of osbuild fails.

I also note, we still don’t have an answer here for modules…

Can you elaborate which particular problems you see?

You mentioned the failsafe mechanism, but we only use the default modules (and none of these have skip_if_unavailable set, right?). Therefore, the failsafe mechanism would only be required if someone explicitly removes the default repositories (to my knowledge, this is not a supported use-case). Once we allow selecting other modules, we will need additional stages. These will use dnf to enable particular repositories, and these will be required to copy the module-metadata into the dnf-database to guarantee it’s available when the repository vanishes for whatever reason.
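Concretely, such a stage might boil down to something like the sketch below. The module name:stream, profile, and paths are illustrative, and the assumption that driving dnf against the install root persists the module state and failsafe metadata into the target tree is exactly the part that would need verifying:

```python
# Sketch of a module-enablement stage run against the target tree.  The module
# name:stream, profile, and paths are illustrative; the assumption is that
# driving dnf with --installroot makes libdnf write the module state (and the
# failsafe copy of the module metadata) into the target's /etc/dnf/modules.d
# and /var/lib/dnf rather than the host's.
import subprocess

TREE = "/srv/osbuild/tree"

subprocess.run(["dnf", "-y", f"--installroot={TREE}", "--releasever=8",
                "module", "enable", "nodejs:14"], check=True)
subprocess.run(["dnf", "-y", f"--installroot={TREE}", "--releasever=8",
                "module", "install", "nodejs:14/common"], check=True)
```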

Similar to the dnf mark issue, I would be very happy if you can provide concrete examples where the current model fails.

My professional interest in OSBuild goes only as far as expecting it to support modularity properly. My personal interest is in OSBuild simplifying the Fedora image-building processes. In both cases, I need both default and non-default modules to work properly for image builds, and the resulting images must not be fundamentally broken. Right now, it would be a bad idea to use OSBuild even with default modules, because the resulting image is completely broken for ongoing usage.

Because you install software in the wrong way with OSBuild, there is no way I can trust that my image is any good for production use. If I apply configuration management to a long-running instance from this image, or if I provision a bare metal system from an image built by this system, I would expect package management to work. That will definitely not be the case with RHEL, and may not be the case with Fedora.