dvc: stage: preserve comments and don't add default fields removed by user

md5: adbb969c66c0fe9449b17a57afcd9531
outs:
- cache: true
  md5: c157a79031e1c40f85931829bc5fc552
  metric: false
  path: file
  persist: false
wdir: .

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (12 by maintainers)

Most upvoted comments

It would also be great to preserve the spacing of the ‘cmd’ field. I often reformat it so that a long cmd with several options is split across lines, with a newline between each argument. DVC currently undoes this formatting.

It would be nice if we could also have a custom field where users can add some metadata.

For my use case in particular, I would like to add the repository and git commit in which the file was generated, as well as the username of the person who generated the data. This is because I intend to use the datasets in more than one repository and I need a way to track the origin of the data.

https://pypi.org/project/ruamel.yaml/ - this parser or something like this can be used to preserve the info

@AlJohri @martinjms Guys, this feature has been released in 0.40.0, please feel free to give it a try 🙂

So this is done, except for the docs.

Please, don’t forget to update or create an issue for the documentation - probably, this one at least - https://dvc.org/doc/user-guide/dvc-file-format

Just a reminder this now consists of two parts:

  • retain comments and formatting
  • support custom fields under top level meta key

The first part is implemented in this PR https://github.com/iterative/dvc/pull/1885

@Suor Actually, I don’t think we should move anything automatically, because of potential forward-compatibility problems. Imagine we introduce a new key newkey in a new release. One user creates a stage and then shares it with a colleague who is running an older dvc. In that case, automatically moving newkey to meta/newkey would lead to bad behaviour which, if committed back to the repo, would break everything even for the first user, who is on the newer version of dvc. I think throwing a schema error, as we do right now, is the proper way to handle that (but of course we should probably make the error more user-friendly and suggest that the user move the field if it is their own).
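
The strict behaviour argued for here could look roughly like the following sketch: unknown top-level keys are rejected, and the error message points the user at meta. The key list, the exception name StageFileFormatError, and the function validate_stage are assumptions for illustration, not DVC's actual schema or API.

```python
class StageFileFormatError(Exception):
    """Raised when a stage file contains keys the schema does not know."""


# Assumed set of recognised top-level keys, for illustration only.
KNOWN_KEYS = frozenset({"md5", "cmd", "wdir", "deps", "outs", "meta"})


def validate_stage(stage: dict) -> None:
    """Reject unknown top-level keys instead of silently moving them."""
    unknown = sorted(set(stage) - KNOWN_KEYS)
    if unknown:
        names = ", ".join(unknown)
        raise StageFileFormatError(
            f"unknown key(s): {names}. If these are your own fields, "
            "move them under the top-level 'meta' key."
        )


# A file with custom data under 'meta' passes validation.
validate_stage({"md5": "adbb969c", "wdir": ".", "meta": {"author": "me"}})
```

With this approach an older dvc that sees newkey fails loudly and leaves the file untouched, so nothing incorrect can be committed back to the repo.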

Discussed this with @efiop.

We will implement the superficial approach now and return to the fundamental approach later, in a separate ticket, when we decide to go after tech debt.

Another topic we discussed was future compatibility issues connected to arbitrary custom keys. Our solution was to allow users to write anything under a single top-level key meta and disallow custom keys anywhere else. The tool will automatically move any custom keys there with a warning, as a less annoying behaviour than just complaining and stopping.
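
A minimal sketch of the move-with-a-warning behaviour described in this comment, assuming a stage file already parsed into a dict. The key list and the function name move_custom_keys are hypothetical and only serve the example.

```python
import warnings

# Assumed set of recognised top-level keys, for illustration only.
KNOWN_KEYS = frozenset({"md5", "cmd", "wdir", "deps", "outs", "meta"})


def move_custom_keys(stage: dict) -> dict:
    """Return a copy of stage with unknown top-level keys moved under 'meta'."""
    result = {k: v for k, v in stage.items() if k in KNOWN_KEYS}
    custom = {k: v for k, v in stage.items() if k not in KNOWN_KEYS}
    if custom:
        warnings.warn(
            "moving custom key(s) under 'meta': " + ", ".join(sorted(custom))
        )
        # Copy any existing meta mapping so the input dict is not mutated.
        meta = dict(result.get("meta") or {})
        meta.update(custom)
        result["meta"] = meta
    return result


cleaned = move_custom_keys({"md5": "adbb969c", "author": "me"})
print(cleaned)
```

Note that a key added by a newer release would also look "custom" to an older version running this logic, so it would be relocated as well.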