dvc: metrics diff --show-md master: 'NoneType' object is not subscriptable

Bug Report

On the master branch:

dvc metrics diff --show-md master 

ERROR: unexpected error - 'NoneType' object is not subscriptable      

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

This also happens with dvc plots:

dvc plots diff --target loss.csv --show-vega master

ERROR: unexpected error - 'NoneType' object is not subscriptable

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Description

Running these commands on master throws the error above. This is undesirable in CI/CD environments.

Reproduce

git clone https://github.com/DavidGOrtega/cml-dvc-basic.git
cd cml-dvc-basic
dvc init
dvc remote add -d myremote s3://your-remote
python get_data.py
dvc add data
dvc push --run-cache

WARNING: Output 'metrics.json'(stage: 'mystage') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
WARNING: Output 'loss.csv'(stage: 'mystage') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
5 files pushed  
git add --all
git commit -m 'ci ready' 

At this point, this should be enough for CML to run repro within CI, but unfortunately it does not work:

dvc repro --pull

'data.dvc' didn't change, skipping                                    
Running stage 'mystage':
> python train.py
/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py:562: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (23) reached and the optimization hasn't converged yet.
  % self.max_iter, ConvergenceWarning)
Generating lock file 'dvc.lock'                                                                                                    
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add .gitignore dvc.lock
Use `dvc push` to send your updates to remote storage.
dvc metrics diff -v --show-md -v HEAD HEAD

2021-03-28 16:27:10,627 TRACE: Namespace(a_rev='HEAD', all=False, b_rev='HEAD', cd='.', cmd='diff', cprofile=False, cprofile_dump=None, func=<class 'dvc.command.metrics.CmdMetricsDiff'>, instrument=False, instrument_open=False, no_path=False, pdb=False, precision=None, quiet=0, recursive=False, show_json=False, show_md=True, targets=None, verbose=2, version=None)
2021-03-28 16:27:10,963 DEBUG: Check for update is enabled.
2021-03-28 16:27:11,069 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:11,070 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:11,414 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:11,415 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:11,416 DEBUG: Lockfile for 'dvc.yaml' not found
2021-03-28 16:27:11,582 ERROR: unexpected error - 'NoneType' object is not subscriptable
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/main.py", line 50, in main
  File "dvc/command/metrics.py", line 149, in run
  File "dvc/repo/metrics/__init__.py", line 13, in diff
  File "dvc/repo/metrics/diff.py", line 23, in diff
  File "dvc/repo/metrics/diff.py", line 8, in _get_metrics
  File "dvc/repo/metrics/__init__.py", line 8, in show
  File "dvc/repo/__init__.py", line 49, in wrapper
  File "dvc/repo/metrics/show.py", line 120, in show
  File "dvc/repo/metrics/show.py", line 82, in _read_metrics
  File "dvc/utils/serialize/_yaml.py", line 20, in load_yaml
  File "dvc/utils/serialize/_common.py", line 50, in _load_data
  File "dvc/fs/repo.py", line 143, in open
  File "dvc/fs/dvc.py", line 81, in open
  File "dvc/objects/db/base.py", line 68, in hash_to_path_info
TypeError: 'NoneType' object is not subscriptable
------------------------------------------------------------
2021-03-28 16:27:12,462 DEBUG: Version info for developers:
DVC version: 2.0.6 (deb)
---------------------------------
Platform: Python 3.7.10 on Linux-4.19.76-linuxkit-x86_64-with-debian-buster-sid
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: fuse.grpcfuse on grpcfuse
Caches: local
Remotes: s3
Workspace directory: fuse.grpcfuse on grpcfuse
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-03-28 16:27:12,473 DEBUG: Analytics is enabled.
2021-03-28 16:27:12,661 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpjo47u5rj']'
2021-03-28 16:27:12,667 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpjo47u5rj']'
[18640] Failed to execute script __main__

plots diff does not work either:

dvc plots diff -v --target loss.csv --show-vega HEAD HEAD
2021-03-28 16:38:29,322 DEBUG: Check for update is enabled.
2021-03-28 16:38:29,442 ERROR: 'loss.csv' does not exist.
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/command/plots.py", line 37, in run
  File "dvc/command/plots.py", line 80, in _func
  File "dvc/repo/plots/__init__.py", line 173, in diff
  File "dvc/repo/plots/diff.py", line 16, in diff
  File "dvc/repo/plots/__init__.py", line 160, in show
dvc.exceptions.MetricDoesNotExistError: 'loss.csv' does not exist.
------------------------------------------------------------
2021-03-28 16:38:29,450 DEBUG: Analytics is enabled.
2021-03-28 16:38:29,506 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpjufpxbo2']'
2021-03-28 16:38:29,508 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpjufpxbo2']'
[18685] Failed to execute script __main__
dvc plots diff -v --target loss.csv --show-vega HEAD
2021-03-28 16:38:47,913 DEBUG: Check for update is enabled.
2021-03-28 16:38:48,008 DEBUG: Lockfile for 'dvc.yaml' not found
2021-03-28 16:38:48,332 DEBUG: Lockfile for 'dvc.yaml' not found
2021-03-28 16:38:48,482 ERROR: unexpected error - 'NoneType' object is not subscriptable
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/main.py", line 50, in main
  File "dvc/command/plots.py", line 37, in run
  File "dvc/command/plots.py", line 80, in _func
  File "dvc/repo/plots/__init__.py", line 173, in diff
  File "dvc/repo/plots/diff.py", line 16, in diff
  File "dvc/repo/plots/__init__.py", line 149, in show
  File "dvc/repo/plots/__init__.py", line 86, in collect
  File "dvc/fs/repo.py", line 143, in open
  File "dvc/fs/dvc.py", line 81, in open
  File "dvc/objects/db/base.py", line 68, in hash_to_path_info
TypeError: 'NoneType' object is not subscriptable
------------------------------------------------------------
2021-03-28 16:38:49,413 DEBUG: Version info for developers:
DVC version: 2.0.6 (deb)
---------------------------------
Platform: Python 3.7.10 on Linux-4.19.76-linuxkit-x86_64-with-debian-buster-sid
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: fuse.grpcfuse on grpcfuse
Caches: local
Remotes: s3
Workspace directory: fuse.grpcfuse on grpcfuse
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-03-28 16:38:49,420 DEBUG: Analytics is enabled.
2021-03-28 16:38:49,597 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpiyh0i0y5']'
2021-03-28 16:38:49,601 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpiyh0i0y5']'
[18702] Failed to execute script __main__

If we commit dvc.lock, metrics diff works:

git add dvc.lock
git commit -m 'dvc.lock'

dvc metrics diff -v --show-md -v HEAD HEAD
2021-03-28 16:27:48,548 TRACE: Namespace(a_rev='HEAD', all=False, b_rev='HEAD', cd='.', cmd='diff', cprofile=False, cprofile_dump=None, func=<class 'dvc.command.metrics.CmdMetricsDiff'>, instrument=False, instrument_open=False, no_path=False, pdb=False, precision=None, quiet=0, recursive=False, show_json=False, show_md=True, targets=None, verbose=2, version=None)
2021-03-28 16:27:48,864 DEBUG: Check for update is enabled.
2021-03-28 16:27:48,971 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:48,972 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:49,332 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:49,333 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:49,341 TRACE: Assuming '/iterative/cml-dvc-basic/.dvc/cache/be/3703304dfc59f95af156f8a354e68a' is unchanged since it is read-only
| Path   | Metric   | Old   | New   | Change   |
|--------|----------|-------|-------|----------|

2021-03-28 16:27:49,420 DEBUG: Analytics is enabled.
2021-03-28 16:27:49,507 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpdzwzsmo5']'
2021-03-28 16:27:49,511 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpdzwzsmo5']'
[18675] Failed to execute script __main__

However, plots diff still fails, with a different error:

dvc plots diff -v --target loss.csv --show-vega HEAD HEAD
2021-03-28 16:43:30,479 DEBUG: Check for update is enabled.
2021-03-28 16:43:30,593 ERROR: 'loss.csv' does not exist.
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/command/plots.py", line 37, in run
  File "dvc/command/plots.py", line 80, in _func
  File "dvc/repo/plots/__init__.py", line 173, in diff
  File "dvc/repo/plots/diff.py", line 16, in diff
  File "dvc/repo/plots/__init__.py", line 160, in show
dvc.exceptions.MetricDoesNotExistError: 'loss.csv' does not exist.
------------------------------------------------------------
2021-03-28 16:43:30,608 DEBUG: Analytics is enabled.
2021-03-28 16:43:30,663 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp22ehhoy5']'
2021-03-28 16:43:30,665 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp22ehhoy5']'
[18722] Failed to execute script __main__

Expected

A "no diff" message instead of this non-descriptive error.

Environment information

$ dvc doctor

DVC version: 2.0.6 (deb)
---------------------------------
Platform: Python 3.7.10 on Linux-4.19.76-linuxkit-x86_64-with-debian-buster-sid
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: fuse.grpcfuse on grpcfuse
Caches: local
Remotes: s3
Workspace directory: fuse.grpcfuse on grpcfuse
Repo: dvc, git

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 24 (15 by maintainers)

Most upvoted comments

@DavidGOrtega I’m clear on these problems, but I’m less clear on what you need for solutions.

My 2 cents: nothing special is required. It should not fail (I assume the 'NoneType' object... error counts as a failure), it should show the diff (which can be empty), and it should return a zero exit status. @DavidGOrtega might have a different opinion or ideas.

Here’s a summary of my understanding from the discussions in this ticket and some talks with those who have been participating:

self-diff

Bug

dvc metrics diff HEAD HEAD (which outputs an empty table) and dvc plots diff HEAD HEAD (which throws an error) behave inconsistently for self-diffs. It is not clear what the expected behavior is here, but it should be consistent, and if there is an error, it should be more informative. Unifying the various diff commands is on the dvc roadmap, so while I don’t know when we will get to this, it is already well aligned with dvc priorities.

CML impact

@DavidGOrtega I’m curious whether you are seeing this issue in your actual CML pipelines even after dvc.lock is committed, since the example you provided in https://github.com/iterative/dvc/issues/5692#issuecomment-811804906 isn’t quite a self-diff:

dvc plots diff -v --target loss.csv --show-vega master

This is subtly different from:

dvc plots diff -v --target loss.csv --show-vega HEAD HEAD

In testing, it looks like the first command will work once you have a committed dvc.lock, even if there are no differences between the workspace and master (and I would expect this to be a common CML workflow). It’s only when hard-coding the same revision multiple times that an error is thrown. Similarly, metrics diff master should return actual metrics (instead of the empty table from metrics diff HEAD HEAD).
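Since diffs against master appear to work once dvc.lock is committed there, a CI job could check for the lockfile before running the diff commands. The following is a minimal sketch, not part of DVC or CML: the function names has_lockfile and diff_if_locked are hypothetical, and it assumes the baseline revision is passed as an argument and that loss.csv is the plot target, as in this repo.

```shell
#!/bin/sh
# Hypothetical CI helper (not part of DVC or CML): skip the DVC diff
# commands when the baseline revision has no committed dvc.lock.

# Succeeds only if dvc.lock exists in the given Git revision.
# `git cat-file -e REV:PATH` exits 0 when that object exists.
has_lockfile() {
  git cat-file -e "$1:dvc.lock" 2>/dev/null
}

# Runs metrics/plots diffs against the baseline revision (default: master),
# or prints a warning to stderr and returns 0 if dvc.lock is missing there.
diff_if_locked() {
  base_rev="${1:-master}"
  if has_lockfile "$base_rev"; then
    dvc metrics diff --show-md "$base_rev" &&
      dvc plots diff --target loss.csv --show-vega "$base_rev"
  else
    echo "dvc.lock is not committed in $base_rev; skipping metrics/plots diff" >&2
  fi
}
```

In a CI step this would be sourced and called as diff_if_locked master. Returning 0 when dvc.lock is missing roughly corresponds to option 2 in the list below; returning non-zero instead would match option 1.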

dvc.lock

For the general case where there is no dvc.lock in one of the revisions, expected behavior for dvc metrics diff and dvc plots diff is unclear. Options include:

  1. Throw an error with a more informative message/hint and return a non-0 exit code (see suggestion in https://github.com/iterative/dvc/issues/5692#issuecomment-811253143).
  2. Show an informative warning instead of error and return no other output and a 0 exit code.
  3. Return a table/plot with the data for the revision where dvc.lock exists and no data for the revision where dvc.lock is missing and a 0 exit code. One question I have for the dvc devs in this scenario is what happens when the metrics/plots are Git-tracked but no dvc.lock exists or there is a mismatch? Like the self-diffing, this is already well aligned with current dvc priorities.

run-cache

Using the run-cache with the Git-committed data to recreate the expected outputs is an interesting idea. As I understand the CML scenario:

  1. dvc.yaml and its dependencies, train.py and data.dvc, are committed and tracked by Git in the master branch.
  2. CML runs dvc repro, which populates the run-cache.
  3. To get the metrics or other outputs, it should be possible to rely only on the run-cache and the Git-tracked files in the master branch (all other info that would be found in dvc.lock can be reconstructed from these).

If I’m understanding that right, it’s an interesting idea. This would be a big change (for example, not sure how always-changed stages would work) that is not currently prioritized, although we can continue to discuss and might come back to it later.

@efiop I have updated the description. It seems that dvc diff can work without dvc.lock committed, and dvc plots diff cannot compare a revision with itself?

@DavidGOrtega Could you please add -v flag and post full log? That will contain a traceback, which makes looking into issues much easier.

Related to #5204 and #5685?