nmstate: New behavior of `nmstatectl service` breaks backwards compatibility

Describe the bug

As of the merge of https://github.com/nmstate/nmstate/commit/dbb98e1d30c727b5e47ea7daacc8f2936ab5a0e0 the behavior of nmstatectl service is to store a checksum of the applied configs and not reapply configs that have previously been applied.

This is a breaking change as you can no longer evaluate the contents of the .applied file to determine what the current running state should be. The semantics of the behavior of the command have fundamentally changed as well as the contents/format of the files involved in the service. This should not have happened in a patch release (behavior is changed between 2.2.21 and 2.2.22), typically a change of this nature would be implemented in a manner that preserves backwards compatibility and would be in a minor release and removal of backwards compatibility would be noted as a coming deprecation/breaking change to be subsequently included in a major release.

Nmstate version

  • Version 2.2.23

NetworkManager version

  • Version 1.44.0

Platform

  • Platform: rhel 9.3

How reproducible

  • 100%

Steps to Reproduce

  1. create a file /etc/nmstate/somechanges.yml with some changes to be made
  2. run sudo nmstatectl service
  3. observe that:
    1. /etc/nmstate/somechanges.yml is still present unlike in previous versions
    2. /etc/nmstate/somechanges.applied file contains a sha256 digest instead of being a full copy of the applied change as in previous versions
  4. any software integration that depended on the previous documented behavior is now broken in unanticipated ways

Additional context

The motivation for the change is documented as At some envs like openshift machine-config-operator moving configuration files can break in place upgrades which acknowledges that the existing behavior exists and the change fundamentally changes how this command behaves. It is inappropriate and unexpected that a downstream consumer of the nmstate failing to work correctly within the documented behavior of the software would be enough to motivate a breaking change in a patch release that is subsequently pushed out to a stable release of an enterprise OS distribution. @qinqon as an employee of Redhat I’m surprised to see that you signed off on this change

Also, it should be noted that the accepted change does not include any update to the manpage for nmstatectl which still says:

       service
              Apply all network state files ending with .yml in specified( default: /etc/nmstate) folder.   The  applied  network
              state file will be renamed with postfix .applied to prevent repeated applied on next run.

and the changelog fails to not that any breaking changes were included in the release instead declaring the changed behavior as a “bugfix” even though the original behavior was documented and behaved exactly as documented. Typically a “bug” is a behavior that varies from the expected and documented behavior.

About this issue

  • Original URL
  • State: closed
  • Created 5 months ago
  • Comments: 23 (15 by maintainers)

Most upvoted comments

I think the point is not about whether @drawks is OK with #2545 , but maintain API stable.

I will rebase my PR #2546 over #2545.

@drawks Are you seeking nmstate next release to change back to old behavior or expecting us to change the CHANGLOG.md and manpage?

Ideally the new behavior would be something that the user could opt into, the documentation would accurately describe both behaviors and if it is intended that the new behavior is going to be the default behavior at some future release there should be some form of deprecation warning in the docs and/or at runtime.

Sure. I will work on PR for this opt-in.

@drawks we missed to document the change, although the .applied files were an implementation detail to prevent re-apply more than an nmtate interface.

I disagree, it was a documented behavior. As was the fact that all *.yml files in the directory would be applied. With the new change the very basis of the utility of the tool has changed. Let me do a better job of enumarating the problems:

  1. one can no longer be assured that dropping a .yml in the directory and re-running nmstatectl service will apply the file
  2. the contents of the .applied file can no longer be examined to see what state was being declared at the time of the last run
  3. Since all the *.yml' files in the directory will be applied except for those with a .appliedfile that have a matching hash, there is no longer any guarantee that on a given run the files will be additively applied to arrive at a final declared state. e.g. if I have01-rename-interfaces.yml, 02-assemble-mlag-bonds.yml, 03-apply-policy-based-routing.yml` a change made to any of the config files will no longer have been guaranteed to have been applied before subsequent configs are re-applied.

There are other mechanism to check if the state was applied, like checking the nmstatectl show or if the nmstate.service unit was successful, but we are ok going back to old behaviour controller by a flag.

Yes, there are many ways to achieve various outcomes, but… the existing documented behavior is depended upon for integration by myself and presumably others. Changing it in the manner it was changed is a breaking change to the documented and expected behavior.

I’m not saying that the behavior should never change or that the needs and intent of the maintainers should not drive development, but what I am saying is that version numbers should be meaningful and that existing documented behaviors/interfaces should not have changes of this nature made without relaying the change to the end users using the tools of documentation, version numbering, etc.

Right, we will be more careful next time, thanks for reporting.

I guess I can introduce a config in /etc/nmstate/nmstate.conf to use this new hash method:

[service]
remove_applied_yaml = false

Then by default, we still remove the applied YAML file and openshift machine-config-operator could use above config file.

@qinqon Is that OK to you?

Yes, the new behavior should be configurable and it should not be the default behavior.