dvc: cloud versioning: pushing to a worktree remote hangs

Bug Report

Description

When I push to a worktree remote, it often hangs in this state for minutes (I have not left it long enough to see whether it eventually completes):

[Screenshot: 2023-01-18 at 8:38:21 AM]

Reproduce

I’m not yet sure how to create a minimal example that reproduces it, but it happens often when testing. Here are the steps I have taken to reproduce it (a rough command sketch follows the list):

  1. Run dvc get git@github.com:iterative/dataset-registry.git use-cases/cats-dogs to get some data.
  2. Run dvc add cats-dogs to track the data.
  3. Set up a worktree remote in a DVC repo.
  4. Run dvc push to that worktree remote.
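For reference, here is a rough command sketch of those steps. The remote name, bucket path, and the worktree remote option shown here are assumptions about a typical setup, not the exact configuration used in this report:

    # fetch sample data and track it with DVC
    dvc get git@github.com:iterative/dataset-registry.git use-cases/cats-dogs
    dvc add cats-dogs

    # hypothetical bucket/prefix; enable the worktree option on the remote
    dvc remote add -d myremote s3://my-bucket/worktree-data
    dvc remote modify myremote worktree true

    # this is the step that hangs
    dvc push -r myremote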

And here’s a video of it:

https://user-images.githubusercontent.com/2308172/213191889-d9ca22d0-608f-4b16-9460-360f21368d53.mov

Output of dvc doctor:

$ dvc doctor
DVC version: 2.41.2.dev36+g5f558d003
---------------------------------
Platform: Python 3.10.2 on macOS-13.1-arm64-arm-64bit
Subprojects:
        dvc_data = 0.33.0
        dvc_objects = 0.17.0
        dvc_render = 0.0.17
        dvc_task = 0.1.11
        dvclive = 1.3.2
        scmrepo = 0.1.6
Supports:
        azure (adlfs = 2022.9.1, knack = 0.9.0, azure-identity = 1.7.1),
        gdrive (pydrive2 = 1.15.0),
        gs (gcsfs = 2022.11.0),
        hdfs (fsspec = 2022.11.0+0.gacad158.dirty, pyarrow = 7.0.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = 2022.11.0+6.g804057f, boto3 = 1.24.59),
        ssh (sshfs = 2022.6.0),
        webdav (webdav4 = 0.9.4),
        webdavs (webdav4 = 0.9.4),
        webhdfs (fsspec = 2022.11.0+0.gacad158.dirty)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

The updating meta issue should be fixed with dvc-data==0.35.1 (https://github.com/iterative/dvc/pull/8857)
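If you want to check whether you already have that fix locally, upgrading the subproject is enough (a sketch; pinning exactly to 0.35.1 is optional):

    # pull in the dvc-data release that contains the fix
    pip install -U 'dvc-data>=0.35.1'
    dvc doctor | grep dvc_data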

@dberenbaum, can you try running viztracer with --log_async please? See https://viztracer.readthedocs.io/en/latest/concurrency.html#asyncio.
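For reference, a sketch of one way to capture an async-aware trace of the push. The output file name is arbitrary, and launching the dvc entry-point script through $(which dvc) is an assumption about how viztracer is invoked here:

    # record asyncio tasks as separate tracks in the trace
    viztracer --log_async -o dvc_push_trace.json "$(which dvc)" push -r myremote
    # open the resulting trace in the bundled viewer
    vizviewer dvc_push_trace.json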

On the latest version, dvc push seems to incorrectly conclude that everything is up to date:

https://github.com/iterative/dvc/issues/8852

Edit: There might also be workarounds, such as https://stackoverflow.com/a/72281706

head-object only works for individual files, and it is what we already use for querying an individual object.

If we just want to check the current version, can we use etags?

This could work, but will probably require some adjustment in dvc-data because we will essentially need to mix non-versioned filesystem calls with versioned ones.

The issue is that when listing a prefix (which is the only way we can find files we don’t know about), you can either call list_objects_v2, which does not return any version information at all, or list_object_versions, which returns every version of every object. The current fsspec s3fs implementation always uses list_object_versions when doing any directory listing, since that is the only way for s3fs to make sure it gets version information for everything.

So to get the “etags only” listing from S3, we need a non-versioned s3fs instance (one that uses list_objects_v2).
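To illustrate the difference outside of s3fs, here is a sketch using the AWS CLI (bucket, prefix, and key names are hypothetical):

    # latest objects only: returns an ETag per key, but no VersionId
    aws s3api list-objects-v2 --bucket my-bucket --prefix worktree-data/ \
        --query 'Contents[].{Key:Key,ETag:ETag}'

    # every version of every object under the prefix; this is what s3fs uses today
    aws s3api list-object-versions --bucket my-bucket --prefix worktree-data/ \
        --query 'Versions[].{Key:Key,VersionId:VersionId,IsLatest:IsLatest}'

    # per-object lookup with version info; this is the head-object call mentioned above
    aws s3api head-object --bucket my-bucket --key worktree-data/cats-dogs/data/cat.1.jpg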

Also, this is only a problem on S3. The azure and gcs APIs are designed properly, so you can list only the latest versions under a prefix and get a response that includes version IDs for those latest versions. (Using the etag method won’t improve performance vs. using version IDs on azure or gcs.)

@daavoo For some reason I’m getting an error trying to view this, but here’s the JSON file:

viztracer.zip

Wow, I have never seen this 😅 The profile is completely blank inside checkout for 200s before reaching the sum:

https://user-images.githubusercontent.com/12677733/213715307-1cb5c3ef-1164-4989-9ed9-96f4d0780ccf.mov

Anyhow, it looks like you ran this before the changes made in #8842:

[Screenshot: 2023-01-20 at 15:04:12]

Could you rerun with the latest version?

@daavoo For some reason I’m getting an error trying to view this, but here’s the JSON file:

I think it is because it’s too big. On my computer, Firefox breaks, but Chrome manages to render it.