dvc: metrics diff --show-md master: 'NoneType' object is not subscriptable
Bug Report
Within master branch:
dvc metrics diff --show-md master
ERROR: unexpected error - 'NoneType' object is not subscriptable
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
This also happens with dvc plots
dvc plots diff --target loss.csv --show-vega master
ERROR: unexpected error - 'NoneType' object is not subscriptable
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
Description
Running either command on the master branch throws this error, which is undesirable in CI/CD environments.
Reproduce
git clone https://github.com/DavidGOrtega/cml-dvc-basic.git
cd cml-dvc-basic
dvc init
dvc remote add -d myremote s3://your-remote
python get_data.py
dvc add data
dvc push --run-cache
WARNING: Output 'metrics.json'(stage: 'mystage') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
WARNING: Output 'loss.csv'(stage: 'mystage') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
5 files pushed
git add --all
git commit -m 'ci ready'
At this point, this should be enough for CML to run repro within the CI, but unfortunately it does not work:
dvc repro --pull
'data.dvc' didn't change, skipping
Running stage 'mystage':
> python train.py
/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py:562: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (23) reached and the optimization hasn't converged yet.
% self.max_iter, ConvergenceWarning)
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add .gitignore dvc.lock
Use `dvc push` to send your updates to remote storage.
dvc metrics diff -v --show-md -v HEAD HEAD
2021-03-28 16:27:10,627 TRACE: Namespace(a_rev='HEAD', all=False, b_rev='HEAD', cd='.', cmd='diff', cprofile=False, cprofile_dump=None, func=<class 'dvc.command.metrics.CmdMetricsDiff'>, instrument=False, instrument_open=False, no_path=False, pdb=False, precision=None, quiet=0, recursive=False, show_json=False, show_md=True, targets=None, verbose=2, version=None)
2021-03-28 16:27:10,963 DEBUG: Check for update is enabled.
2021-03-28 16:27:11,069 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:11,070 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:11,414 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:11,415 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:11,416 DEBUG: Lockfile for 'dvc.yaml' not found
2021-03-28 16:27:11,582 ERROR: unexpected error - 'NoneType' object is not subscriptable
------------------------------------------------------------
Traceback (most recent call last):
File "dvc/main.py", line 50, in main
File "dvc/command/metrics.py", line 149, in run
File "dvc/repo/metrics/__init__.py", line 13, in diff
File "dvc/repo/metrics/diff.py", line 23, in diff
File "dvc/repo/metrics/diff.py", line 8, in _get_metrics
File "dvc/repo/metrics/__init__.py", line 8, in show
File "dvc/repo/__init__.py", line 49, in wrapper
File "dvc/repo/metrics/show.py", line 120, in show
File "dvc/repo/metrics/show.py", line 82, in _read_metrics
File "dvc/utils/serialize/_yaml.py", line 20, in load_yaml
File "dvc/utils/serialize/_common.py", line 50, in _load_data
File "dvc/fs/repo.py", line 143, in open
File "dvc/fs/dvc.py", line 81, in open
File "dvc/objects/db/base.py", line 68, in hash_to_path_info
TypeError: 'NoneType' object is not subscriptable
------------------------------------------------------------
2021-03-28 16:27:12,462 DEBUG: Version info for developers:
DVC version: 2.0.6 (deb)
---------------------------------
Platform: Python 3.7.10 on Linux-4.19.76-linuxkit-x86_64-with-debian-buster-sid
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: fuse.grpcfuse on grpcfuse
Caches: local
Remotes: s3
Workspace directory: fuse.grpcfuse on grpcfuse
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-03-28 16:27:12,473 DEBUG: Analytics is enabled.
2021-03-28 16:27:12,661 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpjo47u5rj']'
2021-03-28 16:27:12,667 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpjo47u5rj']'
[18640] Failed to execute script __main__
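For context, the last frame above (hash_to_path_info) is where the TypeError surfaces. Below is a minimal sketch of how that kind of failure can arise, assuming the function slices a hash string to build a cache path and that an output with no dvc.lock entry carries no hash. The function is a hypothetical stand-in, not DVC's actual code; the sample hash is taken from the cache path that appears in the later successful log.

```python
from typing import Optional


def hash_to_path(cache_dir: str, hash_value: Optional[str]) -> str:
    # Hypothetical stand-in for a cache path builder (NOT DVC's real code).
    # Slicing None raises TypeError, which is the error reported above.
    return f"{cache_dir}/{hash_value[0:2]}/{hash_value[2:]}"


# With a real hash the path is built as expected.
ok_path = hash_to_path(".dvc/cache", "be3703304dfc59f95af156f8a354e68a")

# Without a hash (assumed to be the case when dvc.lock is missing),
# the slice fails before any useful message can be produced.
try:
    hash_to_path(".dvc/cache", None)
except TypeError as exc:
    message = str(exc)
```

If that assumption holds, it would explain why the error only goes away once dvc.lock is committed.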
dvc plots diff does not work either:
dvc plots diff -v --target loss.csv --show-vega HEAD HEAD
2021-03-28 16:38:29,322 DEBUG: Check for update is enabled.
2021-03-28 16:38:29,442 ERROR: 'loss.csv' does not exist.
------------------------------------------------------------
Traceback (most recent call last):
File "dvc/command/plots.py", line 37, in run
File "dvc/command/plots.py", line 80, in _func
File "dvc/repo/plots/__init__.py", line 173, in diff
File "dvc/repo/plots/diff.py", line 16, in diff
File "dvc/repo/plots/__init__.py", line 160, in show
dvc.exceptions.MetricDoesNotExistError: 'loss.csv' does not exist.
------------------------------------------------------------
2021-03-28 16:38:29,450 DEBUG: Analytics is enabled.
2021-03-28 16:38:29,506 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpjufpxbo2']'
2021-03-28 16:38:29,508 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpjufpxbo2']'
root@97f8cc35d780:/iterative/cml-dvc-basic# [18685] Failed to execute script __main__
dvc plots diff -v --target loss.csv --show-vega HEAD
2021-03-28 16:38:47,913 DEBUG: Check for update is enabled.
2021-03-28 16:38:48,008 DEBUG: Lockfile for 'dvc.yaml' not found
2021-03-28 16:38:48,332 DEBUG: Lockfile for 'dvc.yaml' not found
2021-03-28 16:38:48,482 ERROR: unexpected error - 'NoneType' object is not subscriptable
------------------------------------------------------------
Traceback (most recent call last):
File "dvc/main.py", line 50, in main
File "dvc/command/plots.py", line 37, in run
File "dvc/command/plots.py", line 80, in _func
File "dvc/repo/plots/__init__.py", line 173, in diff
File "dvc/repo/plots/diff.py", line 16, in diff
File "dvc/repo/plots/__init__.py", line 149, in show
File "dvc/repo/plots/__init__.py", line 86, in collect
File "dvc/fs/repo.py", line 143, in open
File "dvc/fs/dvc.py", line 81, in open
File "dvc/objects/db/base.py", line 68, in hash_to_path_info
TypeError: 'NoneType' object is not subscriptable
------------------------------------------------------------
2021-03-28 16:38:49,413 DEBUG: Version info for developers:
DVC version: 2.0.6 (deb)
---------------------------------
Platform: Python 3.7.10 on Linux-4.19.76-linuxkit-x86_64-with-debian-buster-sid
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: fuse.grpcfuse on grpcfuse
Caches: local
Remotes: s3
Workspace directory: fuse.grpcfuse on grpcfuse
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-03-28 16:38:49,420 DEBUG: Analytics is enabled.
2021-03-28 16:38:49,597 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpiyh0i0y5']'
2021-03-28 16:38:49,601 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpiyh0i0y5']'
root@97f8cc35d780:/iterative/cml-dvc-basic# [18702] Failed to execute script __main__
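Both NoneType tracebacks end in the same frame, so a single check before the path is built could turn them into a readable message. A rough sketch of that idea follows; the exception class, function name, and message wording are all hypothetical and not part of DVC.

```python
from typing import Optional


class MissingVersionInfoError(Exception):
    """Hypothetical error type for illustration; not part of DVC."""


def checked_cache_path(
    cache_dir: str, hash_value: Optional[str], output: str, rev: str
) -> str:
    # Fail early with a message that names the output and the revision,
    # instead of letting a slice on None raise a bare TypeError.
    if hash_value is None:
        raise MissingVersionInfoError(
            f"'{output}' has no version info at revision '{rev}'. "
            "Is dvc.lock committed there?"
        )
    return f"{cache_dir}/{hash_value[:2]}/{hash_value[2:]}"


try:
    checked_cache_path(".dvc/cache", None, "loss.csv", "HEAD")
except MissingVersionInfoError as exc:
    report = str(exc)
```

Whether this should be an error or a warning with an empty result is a design question; the sketch only shows that the missing hash can be detected where the path is built.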
If we commit dvc.lock, metrics diff works:
git add dvc.lock
git commit -m 'dvc.lock'
dvc metrics diff -v --show-md -v HEAD HEAD
2021-03-28 16:27:48,548 TRACE: Namespace(a_rev='HEAD', all=False, b_rev='HEAD', cd='.', cmd='diff', cprofile=False, cprofile_dump=None, func=<class 'dvc.command.metrics.CmdMetricsDiff'>, instrument=False, instrument_open=False, no_path=False, pdb=False, precision=None, quiet=0, recursive=False, show_json=False, show_md=True, targets=None, verbose=2, version=None)
2021-03-28 16:27:48,864 DEBUG: Check for update is enabled.
2021-03-28 16:27:48,971 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:48,972 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:49,332 TRACE: params.yaml does not exist, it won't be used in parametrization
2021-03-28 16:27:49,333 TRACE: Context during resolution of stage mystage:
{}
2021-03-28 16:27:49,341 TRACE: Assuming '/iterative/cml-dvc-basic/.dvc/cache/be/3703304dfc59f95af156f8a354e68a' is unchanged since it is read-only
| Path | Metric | Old | New | Change |
|--------|----------|-------|-------|----------|
2021-03-28 16:27:49,420 DEBUG: Analytics is enabled.
2021-03-28 16:27:49,507 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpdzwzsmo5']'
2021-03-28 16:27:49,511 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpdzwzsmo5']'
[18675] Failed to execute script __main__
However, plots diff still fails, this time with a different error:
dvc plots diff -v --target loss.csv --show-vega HEAD HEAD
2021-03-28 16:43:30,479 DEBUG: Check for update is enabled.
2021-03-28 16:43:30,593 ERROR: 'loss.csv' does not exist.
------------------------------------------------------------
Traceback (most recent call last):
File "dvc/command/plots.py", line 37, in run
File "dvc/command/plots.py", line 80, in _func
File "dvc/repo/plots/__init__.py", line 173, in diff
File "dvc/repo/plots/diff.py", line 16, in diff
File "dvc/repo/plots/__init__.py", line 160, in show
dvc.exceptions.MetricDoesNotExistError: 'loss.csv' does not exist.
------------------------------------------------------------
2021-03-28 16:43:30,608 DEBUG: Analytics is enabled.
2021-03-28 16:43:30,663 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp22ehhoy5']'
2021-03-28 16:43:30,665 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp22ehhoy5']'
root@97f8cc35d780:/iterative/cml-dvc-basic# [18722] Failed to execute script __main__
Expected
A "no diff" message instead of that non-descriptive error.
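To make the expectation concrete, here is a small sketch of a diff that never fails on missing data: a revision without metrics is treated as empty, and a self-diff yields no rows and a plain message. The function names, the metric name, and the "No changes." wording are assumptions for illustration, not DVC behavior.

```python
from typing import Dict, List, Optional, Tuple

Metrics = Dict[str, float]
Row = Tuple[str, Optional[float], Optional[float], Optional[float]]


def diff_metrics(old: Optional[Metrics], new: Optional[Metrics]) -> List[Row]:
    # Treat a revision with no readable metrics as an empty mapping.
    old = old or {}
    new = new or {}
    rows: List[Row] = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        if a == b:
            continue  # unchanged metrics are left out of the diff
        change = b - a if a is not None and b is not None else None
        rows.append((name, a, b, change))
    return rows


def render(rows: List[Row]) -> str:
    # Hypothetical message for the empty case.
    if not rows:
        return "No changes."
    return "\n".join(f"{n}: {a} -> {b} ({c})" for n, a, b, c in rows)


same = {"accuracy": 0.5}
self_diff = render(diff_metrics(same, same))
missing_both = diff_metrics(None, None)
changed = diff_metrics({"accuracy": 0.5}, {"accuracy": 0.75})
```

In a CI job this would let the step exit cleanly and still report something readable when the two revisions are identical.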
Environment information
$ dvc doctor
DVC version: 2.0.6 (deb)
---------------------------------
Platform: Python 3.7.10 on Linux-4.19.76-linuxkit-x86_64-with-debian-buster-sid
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: fuse.grpcfuse on grpcfuse
Caches: local
Remotes: s3
Workspace directory: fuse.grpcfuse on grpcfuse
Repo: dvc, git
About this issue
- State: closed
- Created 3 years ago
- Comments: 24 (15 by maintainers)
My 2 cents - nothing special is required. It should not fail (I assume 'NoneType' object.. is a fail), should show the diff (can be empty) and should return no zero status. @DavidGOrtega might have a different opinion or ideas.
Here’s a summary of my understanding from the discussions in this ticket and some talks with those who have been participating:
self-diff
Bug
dvc metrics diff HEAD HEAD (which outputs an empty table) and dvc plots diff HEAD HEAD (which throws an error) are inconsistent for self-diffs. Not sure what expected behavior is here, but it should be consistent, and if there is an error, it should be more informative. Unifying the various diff commands is on the dvc roadmap, so while I don’t know when we will get to this, it is well aligned with dvc priorities already.
CML impact
@DavidGOrtega I’m curious whether you are seeing this issue in your actual CML pipelines even after dvc.lock is committed, since the example you provided in https://github.com/iterative/dvc/issues/5692#issuecomment-811804906 isn’t quite a self-diff:
dvc plots diff -v --target loss.csv --show-vega master
This is subtly different from:
dvc plots diff -v --target loss.csv --show-vega HEAD HEAD
In testing, it looks like the first command will work once you have a committed dvc.lock, even if there are no differences between the workspace and master (and I would expect this to be a common CML workflow). It’s only when hard-coding the same revision multiple times that an error is thrown. Similarly, metrics diff master should return actual metrics (instead of the empty table from metrics diff HEAD HEAD).
dvc.lock
For the general case where there is no dvc.lock in one of the revisions, expected behavior for dvc metrics diff and dvc plots diff is unclear. Options include:
dvc.lock exists and no data for the revision where dvc.lock is missing and a 0 exit code.
One question I have for the dvc devs in this scenario is what happens when the metrics/plots are Git-tracked but no dvc.lock exists or there is a mismatch? Like the self-diffing, this is already well aligned with current dvc priorities.
run-cache
Using the run-cache with the Git-committed data to recreate the expected outputs is an interesting idea. As I understand the CML scenario:
dvc.yaml and train.py and data.dvc dependencies are committed and tracked by Git in the master branch.
dvc repro, which populates the run-cache.
dvc.lock can be reconstructed from this).
If I’m understanding that right, it’s an interesting idea. This would be a big change (for example, not sure how always-changed stages would work) that is not currently prioritized, although we can continue to discuss and might come back to it later.
@efiop I have updated the description. It seems that dvc diff can work without dvc.lock committed, and dvc plots diff cannot be compared with itself?
@DavidGOrtega Could you please add the -v flag and post the full log? That will contain a traceback, which makes looking into issues much easier.
related to #5204 #5685 ?