gsutil: Calling `gsutil mv` on a directory/prefix will incorrectly construct destination object path(s) if the destination prefix is a substring of the source prefix
For example, `gsutil -m mv gs://bucket/dir/subdir1 gs://bucket/dir/subdir2` sometimes correctly renames to `gs://bucket/dir/subdir2`, but other times to `gs://bucket/dir/subdir2/subdir1`.
I couldn't pinpoint the exact conditions, but it seems to have nothing to do with trailing slashes. In any case, there is nothing about such behaviour in the docs.
About this issue
- State: closed
- Created 8 years ago
- Reactions: 3
- Comments: 19 (8 by maintainers)
If there's an existing subdirectory called gs://bucket/dir/subdir2 before you run that mv command, it will put subdir1 under subdir2. That is correct behavior - it emulates similar behavior of Unix directory renames. Please see https://cloud.google.com/storage/docs/gsutil/commands/cp#how-names-are-constructed for more details.
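The naming rule described in this comment can be sketched as a small pure function. This is a simplified model for illustration only, not gsutil's actual implementation; the function name `construct_destination` and the `dst_exists` flag are hypothetical.

```python
def construct_destination(src_prefix: str, dst_prefix: str,
                          object_name: str, dst_exists: bool) -> str:
    """Return the destination name for one object under src_prefix.

    Simplified model of the Unix-like rule: if the destination "directory"
    already exists, the source's last path component is nested under it.
    """
    src = src_prefix.rstrip("/")
    dst = dst_prefix.rstrip("/")
    if not object_name.startswith(src + "/"):
        raise ValueError(f"{object_name!r} is not under {src!r}")

    # Part of the object name below the source "directory", e.g. "a.txt".
    relative = object_name[len(src) + 1:]

    if dst_exists:
        # Destination is treated as an existing directory: nest under it.
        base = src.rsplit("/", 1)[-1]
        return f"{dst}/{base}/{relative}"
    # Destination does not exist: plain rename.
    return f"{dst}/{relative}"
```

In this model the two outcomes reported in the issue differ only in the value of `dst_exists`, so a wrong answer to "does the destination already exist?" is enough to produce `subdir2/subdir1/...`.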
@mfschwartz I understand but this was not the case. I tried it again just to make sure.
Unfortunately, it seems to be hard to reproduce. I ran this once and it misbehaved, but then it worked as expected 3 times (with different names every time).
gsutil version: 4.19
This is fixed in gsutil v4.43, which is now available in the pypi repo (https://pypi.org/project/gsutil/). We missed the cutoff for this week’s Cloud SDK, but it should be in 265.0.0, scheduled for Tues, Oct 1.
Changed the name of this issue to clarify under which conditions it happens.
There was also a duplicate report of this where a user arrived at the same conclusion in https://issuetracker.google.com/issues/112817360
@thobrla Thanks for acknowledging this. I think the way to solve this is to not use listing at all. Note that my `gsutil mv` command is completely within the GS cloud, no inter-cloud or local.
When renaming `test2` to `test`, instead of listing the bucket or parent directory, it can check the presence of `test` directly, which I believe is consistent (rather than the eventually consistent listing operation). Done completely in-cloud, this should be as efficient as the listing approach.
One might ask why anyone would want to rename to a just-deleted directory. We do it for backup rotation:
1. Create `daily_next`
2. Delete `daily_prev`
3. Rename `daily` to `daily_prev` (which was just deleted)
4. Rename `daily_next` to `daily` (which, again, was just deleted in step 3)
While the alternative is to use dated directories, we prefer this approach.
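To illustrate why a stale listing matters here, the following toy simulation renames `test` to `test2` and back, once deciding "does the destination exist?" from a lagging listing and once from current state. Everything in it is an assumption made for illustration: `FakeBucket` and its methods are hypothetical, the listing lag is modeled as a snapshot that is never refreshed, and the "direct check" simply reads current state. Whether a real prefix check can avoid listing is not established in this thread.

```python
class FakeBucket:
    """Toy in-memory bucket: the listing snapshot may lag behind the objects."""

    def __init__(self, names):
        self.objects = set(names)
        self.listing_snapshot = set(names)  # what a (possibly stale) listing returns

    def exists_via_listing(self, prefix: str) -> bool:
        p = prefix.rstrip("/") + "/"
        return any(n.startswith(p) for n in self.listing_snapshot)

    def exists_directly(self, prefix: str) -> bool:
        p = prefix.rstrip("/") + "/"
        return any(n.startswith(p) for n in self.objects)

    def rename_prefix(self, src: str, dst: str, dst_exists: bool) -> None:
        src_p = src.rstrip("/") + "/"
        base = src.rstrip("/").rsplit("/", 1)[-1]
        # Nest under dst only when dst is believed to exist already.
        target = dst.rstrip("/") + "/" + (base + "/" if dst_exists else "")
        moved = {n: target + n[len(src_p):]
                 for n in self.objects if n.startswith(src_p)}
        self.objects = (self.objects - set(moved)) | set(moved.values())
        # The listing snapshot is deliberately NOT refreshed (stale listing).


def rename_there_and_back(use_listing: bool) -> set:
    bucket = FakeBucket(["test/a.txt"])
    check = bucket.exists_via_listing if use_listing else bucket.exists_directly
    bucket.rename_prefix("test", "test2", check("test2"))
    bucket.rename_prefix("test2", "test", check("test"))
    return bucket.objects
```

With the stale listing, the second rename still "sees" `test/` and nests the objects under `test/test2/`; with the current-state check, the objects end up back under `test/`.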
@mfschwartz Please reopen, here’s how to reproduce 100% of the time.
Basically when you rename a folder to something else, then you rename it back to the original name (which should not exist), it treats the original name as “already exists”, therefore creating a subdirectory inside the incorrectly “existing” folder.
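The steps described above could be scripted roughly as follows. This is a sketch, not the reporter's original reproduction: the bucket and directory names are placeholders, and the helper names `build_repro_commands` and `run_commands` are hypothetical. It only uses `gsutil mv` and `gsutil ls -r`.

```python
import subprocess


def build_repro_commands(bucket: str, name: str = "test",
                         temp_name: str = "test2") -> list:
    """Build the gsutil invocations for a rename-away-and-back sequence."""
    original = f"gs://{bucket}/{name}"
    temporary = f"gs://{bucket}/{temp_name}"
    return [
        ["gsutil", "mv", original, temporary],   # rename to something else
        ["gsutil", "mv", temporary, original],   # rename back to the original name
        ["gsutil", "ls", "-r", original],        # inspect where the objects landed
    ]


def run_commands(commands: list) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        print("$", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_commands(build_repro_commands("my-bucket"))` against a bucket you own would execute the sequence. On an affected gsutil version, the final listing would show whether the objects were nested under `test/test2/`.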