dvc: pipeline: show: doesn't show all DAGs in the project when run without arguments

As per discussion with @pared we realized that the behavior of dvc pipeline show --ascii contradicts the documentation. Current behavior looks for Dvcfile and fails if it is not found.

The expected behavior of dvc pipeline show --ascii is that it will find all .dvc files in the workspace and plot all pipelines. It does make sense IMHO that if a .dvc is provided then the yielded graph is showing the steps up to the file’s corresponding stage.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 17 (10 by maintainers)

Most upvoted comments

In this comment @efiop was not sure about the behavior of version 0.66. Here is a small example:

Build the following DAG (in a fresh environment with version 0.66 installed):

dvc run -d my_data.txt -o head.txt "head -n 2 my_data.txt > head.txt"
dvc run -d my_data.txt -o tail.txt "tail -n 3 my_data.txt > tail.txt"
dvc run -d tail.txt -o tail_count "wc -l tail.txt > tail_count"
dvc run -d head.txt -o head_count "wc -l head.txt > head_count"

Then:

# dvc pipeline show --ascii head.txt.dvc
Checking for updates...
                 .-----------------.
                 | my_data.txt.dvc |
                 `-----------------'
                 **               **
              ***                   ***
            **                         **
 .--------------.                  .--------------.
 | head.txt.dvc |                  | tail.txt.dvc |
 `--------------'                  `--------------'
         *                                 *
         *                                 *
         *                                 *
.----------------.                .----------------.
| head_count.dvc |                | tail_count.dvc |
`----------------'                `----------------'

Now, if you update to version 0.93 then:

+-----------------+
| my_data.txt.dvc |
+-----------------+
          *
          *
          *
  +--------------+
  | head.txt.dvc |
  +--------------+

is the output of dvc pipeline show --ascii head.txt.dvc.

@mastaer Yes, that line is the cause. But the solution is a bit deeper, and would probably look like making _show show all of those stages when given None as a target. We would also probably need to separate those pipelines with some divider, so it would make more sense. And if we are adopting such behavior for pipeline show, we’ll probably need to do the same for pipeline show --ascii and possibly other flags. We could go about it one by one, if you would like to contribute a patch 🙂 but ideally we need to adopt a common behavior.

I would love this feature 😃

Maybe I’m confused as well, but I think it’s reasonable to change the behavior to show only the part of the pipeline up to the target. What are the problems we can expect here, @efiop?

Let’s, for the time being, forget about the discrepancy between the actual behavior and the docs and focus on the desired behavior. I also agree that if a target was provided, then the DAG containing this target should be rendered from the target (including it) all the way up to (all) the source(s). Obviously, if the target is a “last” step of the DAG, then (almost, see the next item) the whole DAG is to be rendered.

BTW, this raises the question of what to do with “siblings” (a term which is not well defined in a DAG’s context) of a target. I’d suggest sticking to the concise approach I described above.

If a target is specified and the flag --full is used, then I guess the expected behavior is that the complete DAG containing the target will be rendered.
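
To make the two proposed behaviors concrete, here is a minimal sketch that uses a plain dictionary instead of DVC's internal graph. The stage names are taken from the example earlier in the thread; the helper names `upstream` and `full_dag` are hypothetical and are not part of DVC.

```python
from collections import deque

# Stage graph from the example above: stage -> stages it depends on.
# This dictionary is an illustration, not how DVC stores its pipelines.
DEPS = {
    "my_data.txt.dvc": [],
    "head.txt.dvc": ["my_data.txt.dvc"],
    "tail.txt.dvc": ["my_data.txt.dvc"],
    "head_count.dvc": ["head.txt.dvc"],
    "tail_count.dvc": ["tail.txt.dvc"],
}


def upstream(deps, target):
    """Return the target plus every stage it transitively depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for parent in deps[stage]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def full_dag(deps, target):
    """Return every stage connected to the target, ignoring edge direction."""
    # Undirected adjacency, so siblings and downstream stages are reached too.
    neighbours = {stage: set(parents) for stage, parents in deps.items()}
    for stage, parents in deps.items():
        for parent in parents:
            neighbours[parent].add(stage)
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for other in neighbours[stage]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

With this graph, `upstream(DEPS, "head.txt.dvc")` contains only `head.txt.dvc` and `my_data.txt.dvc`, which matches the 0.93 output shown earlier, while `full_dag(DEPS, "head.txt.dvc")` returns all five stages, which matches the 0.66 output.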

Another case is when a target is not provided at all. What would be rendered then? I believe it is desirable to return all the possible DAGs. Indeed there could be several disconnected DAGs, but it is still worthwhile to have them. Maybe, in this case, it won’t be very helpful to actually render them, but having an explicit description of them (like dot) is great. As this is kind of a complex case, it could be that

dvc pipeline show

would yield an error saying something like:

If you want to obtain all available DAGs run dvc pipeline show --all
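
For the no-target case, "all available DAGs" can be read as the disconnected groups of stages in the workspace. The sketch below illustrates that reading only; `all_dags` and the stage names are hypothetical, and nothing here reflects DVC's actual implementation or an existing `--all` flag.

```python
def all_dags(deps):
    """Group stages into disconnected DAGs (weakly connected components).

    `deps` maps each stage to the list of stages it depends on.
    """
    # Undirected adjacency: two stages belong to the same DAG if any
    # dependency edge links them, regardless of direction.
    neighbours = {stage: set(parents) for stage, parents in deps.items()}
    for stage, parents in deps.items():
        for parent in parents:
            neighbours[parent].add(stage)

    remaining = set(deps)
    dags = []
    while remaining:
        start = min(remaining)  # pick deterministically for stable output
        component = {start}
        stack = [start]
        while stack:
            stage = stack.pop()
            for other in neighbours[stage]:
                if other not in component:
                    component.add(other)
                    stack.append(other)
        remaining -= component
        dags.append(sorted(component))
    return dags


# Hypothetical workspace with two unrelated pipelines.
workspace = {"a.dvc": [], "b.dvc": ["a.dvc"], "c.dvc": []}
for dag in all_dags(workspace):
    print(", ".join(dag))
    print("-" * 20)  # divider between pipelines
```

Each group could then be rendered or dumped separately, with a divider between them.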

An additional issue which was raised is the Dvcfile and how to handle it. Here I leave everything to you guys as I don’t really understand when and how to use Dvcfiles.