s3fs: Unable to create a directory
What happened: Trying to create a directory in a bucket and it silently fails.
What you expected to happen: Expect a key created in a bucket.
Minimal Complete Verifiable Example:
from s3fs import S3FileSystem
s3 = S3FileSystem(anon=False)
s3.makedirs('s3://my-bucket-name/testdir', exist_ok=False)  # fails to create
s3.makedirs('my-bucket-name/testdir', exist_ok=False)  # fails to create
How do I create a directory in a bucket? Am I doing something wrong or is it a bug? I see conversation in #245 and I was wondering if you could please help explain what the intended behavior is.
Anything else we need to know?: s3fs version == 0.5.0 Relevant issue here: #245
Environment:
- Dask version:
- Python version: 3.7.7
- Operating System: Ubuntu Linux
- Install method (conda, pip, source): conda
About this issue
- State: open
- Created 4 years ago
- Comments: 19 (9 by maintainers)
This is a long-standing misunderstanding, and your help, @srib , would be greatly appreciated in clarifying the situation for other users.
s3 does not support directories, except the top-level buckets, which are fairly similar in their behaviour. If you expect
s3.makedirs('my-bucket-name/testdir') to enable you to write files with paths that start with that string, then: congratulations! There is in fact nothing for s3fs to do; the location was already writable. Once you have a file like “my-bucket-name/testdir/testfile”, the directory will appear to exist; without that file it will not. This is similar to trying to commit an empty directory to git.
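The flat-keyspace behaviour described above can be sketched in plain Python, with no AWS access: a bucket is just a mapping of string keys to bytes, and a “directory” exists only while some key lives under its prefix. (The `put`/`dir_exists` helpers here are invented for illustration; they are not s3fs API.)

```python
# Minimal sketch of S3's flat key model: "directories" are only
# implied by keys that happen to share a prefix.

bucket = {}  # key -> bytes; stands in for a real S3 bucket


def put(key, data):
    bucket[key] = data


def dir_exists(prefix):
    # A "directory" exists only if at least one key lives under it.
    prefix = prefix.rstrip("/") + "/"
    return any(k.startswith(prefix) for k in bucket)


print(dir_exists("testdir"))    # False: nothing to create, nothing listed
put("testdir/testfile", b"hi")
print(dir_exists("testdir"))    # True: implied by the key above
del bucket["testdir/testfile"]
print(dir_exists("testdir"))    # False again: the directory "vanished"
```

This is why makedirs has nothing to do: the prefix becomes listable the moment a child key exists, and stops being listable the moment the last child is deleted.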
Unfortunately, the situation is complicated by the behaviour of the s3 console, which has a “make directory” button whose action is actually to create an empty file with the name of the directory you wanted. This is not part of the s3 API, just a convention, and the AWS CLI does not do this. If we were to do this, we would end up making extra requests, and would have to distinguish between real common prefixes (directories implied by having contents) and empty placeholder files.
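The distinction the comment alludes to can be sketched in plain Python (the `classify` helper is hypothetical, not s3fs code): a console-style placeholder is a zero-byte object whose key ends in “/”, and listing code must tell it apart from a prefix that is implied by real children.

```python
# Sketch of the console convention: "make directory" writes a zero-byte
# object whose key ends in "/"; real directories are implied by children.

bucket = {
    "real/child.txt": b"data",  # implies the "real/" prefix
    "empty/": b"",              # console-style placeholder, no children
}


def classify(prefix):
    prefix = prefix.rstrip("/") + "/"
    has_children = any(k.startswith(prefix) and k != prefix for k in bucket)
    if has_children:
        return "implied directory"
    if prefix in bucket:
        return "placeholder only"
    return "absent"


print(classify("real"))   # implied directory
print(classify("empty"))  # placeholder only
print(classify("gone"))   # absent
```

Supporting placeholders means every directory check has to consider both cases, which is the extra work (and the extra requests, against real S3) the maintainer is describing.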
That would make sense if mkdir were only called by end users, but other libraries call it internally. Since the intent there is “make sure that files can be written as children of this path”, the no-op does exactly the right thing - errors and warnings would just get in the way.
Very thought-provoking scenarios.
That raises a fundamental question: is the objective of the s3fs project to wrap what the AWS S3 API offers, or to keep the s3fs interface compliant with a general-purpose file system like a Linux/POSIX fs?
My understanding is that s3fs is just a wrapper around the aiobotocore (or eventually botocore) library, and should offer what the botocore S3 API offers and is capable of.
I will try to answer with that assumption in mind. (Remove “/bucket” from the strings where I refer to the boto3 key below.)
“/bucket/inner” should be able to represent both a file and a directory from the s3fs API design point of view.
But if two keys exist, “/bucket/inner” (file) and “/bucket/inner/” (directory), then
And, explicit check for directory key
Deletion scenarios
Creation scenarios
“/bucket/inner” could be anything again (a file and/or a directory). And “/bucket/inner/stuff” should be okay to co-exist in all of the scenarios for “/bucket/inner” (as a file and/or a directory).
Now expected outcome for validation
The most challenging part is the deletion
boto3 allows storing data for the “/bucket/inner/” key. Check the following output (the directory content size is highlighted):
aws cli long listing files:
boto3 object creation:
aws cli long listing again:
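The listings referenced above did not survive extraction, but the point they illustrate can be sketched in plain Python (no AWS calls): in a flat keyspace, “/” is just a character, so “inner” and “inner/” are two independent keys, and the slash-terminated one can itself hold bytes.

```python
# Sketch: "inner" and "inner/" are distinct keys in a flat keyspace,
# and the key ending in "/" can carry data of its own.

bucket = {}
bucket["inner"] = b"file bytes"        # key without a trailing slash
bucket["inner/"] = b"directory bytes"  # distinct key WITH a trailing slash

print(sorted(bucket))         # ['inner', 'inner/']
print(len(bucket["inner/"]))  # 15 -> the "directory" key carries data
```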
I would not make assumptions about the end user of the s3fs library. s3fs is just a wrapper library at the end of the day, and documenting that mkdir is a no-op breaks that rule of wrapper-library responsibility.
I’m sorry, but the quoted text says the exact opposite of your interpretation: the special behaviour applies to the console only, and from CLI, SDK, REST, the “/” character is not special.
I also invite you to consider what you think the right behaviour ought to be in cases where:
This point I agree is a problem, and there is an issue for this; but I think it’s OK to say in the documentation that mkdir is a no-op except for creating buckets. You could solve the issue by keeping track of mkdirs within the instance, not touching the remote storage backend.
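The suggestion above (remembering mkdir calls in the instance rather than touching the backend) can be sketched as follows. Everything here is hypothetical for illustration - `PseudoDirFS` and its attributes are invented, not part of the real s3fs API, which implemented a similar idea as an internal cache.

```python
# Hypothetical sketch: record mkdir calls locally so isdir() can report
# the path, without any remote request being made.

class PseudoDirFS:
    def __init__(self):
        self._keys = {}            # stands in for remote objects
        self._pseudo_dirs = set()  # local-only record of mkdir calls

    def mkdir(self, path):
        # No remote request: just note the intent in this instance.
        self._pseudo_dirs.add(path.rstrip("/"))

    def isdir(self, path):
        path = path.rstrip("/")
        implied = any(k.startswith(path + "/") for k in self._keys)
        return implied or path in self._pseudo_dirs


fs = PseudoDirFS()
fs.mkdir("my-bucket-name/testdir")
print(fs.isdir("my-bucket-name/testdir"))  # True, yet nothing was written
```

The trade-off is that the record lives only in that instance: another process, or even a second filesystem object, would not see the pseudo-directory.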