s3fs: Unable to create a directory

What happened: Trying to create a directory in a bucket silently fails.

What you expected to happen: A key is created in the bucket.

Minimal Complete Verifiable Example:

from s3fs import S3FileSystem

s3 = S3FileSystem(anon=False)
s3.makedirs('s3://my-bucket-name/testdir', exist_ok=False)  # silently fails to create a key
s3.makedirs('my-bucket-name/testdir', exist_ok=False)       # same, without the protocol prefix

How do I create a directory in a bucket? Am I doing something wrong, or is it a bug? I see the conversation in #245, and I was wondering if you could please help explain what the intended behavior is.

Anything else we need to know?: s3fs version == 0.5.0. Relevant issue: #245

Environment:

  • Dask version:
  • Python version: 3.7.7
  • Operating System: Ubuntu Linux
  • Install method (conda, pip, source): conda

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

This is a long-standing misunderstanding, and your help, @srib , would be greatly appreciated in clarifying the situation for other users.

s3 does not support directories, except for the top-level buckets, which are fairly similar in their behaviour. If you expect s3.makedirs('my-bucket-name/testdir') to enable you to write files with paths that start with that string, then congratulations: there is in fact nothing for s3fs to do; the location was already writable.

Once you have a file like “my-bucket-name/testdir/testfile”, the directory will appear to exist; without that file, it will not. This is similar to trying to commit an empty directory to git.
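
To illustrate both points (a minimal sketch; my-bucket-name is the example bucket from the report):

from s3fs import S3FileSystem

s3 = S3FileSystem(anon=False)

# No mkdir needed: the nested prefix is already "writable".
s3.touch('my-bucket-name/testdir/testfile')
s3.isdir('my-bucket-name/testdir')   # True: implied by the key beneath it

s3.rm('my-bucket-name/testdir/testfile')
s3.isdir('my-bucket-name/testdir')   # False: no key left to imply it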

Unfortunately, the situation is complicated by the behaviour of the s3 console, which has a “make directory” button whose action is actually to create an empty file with the name of the directory you wanted. This is not part of the s3 API, just a convention, and the AWS CLI does not do this. If we were to do this, we would end up making extra requests, and would have to distinguish between real common prefixes (directories implied by having contents) and empty placeholder files.
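
For reference, what the console’s “make directory” button does amounts to something like this (a sketch using boto3; the bucket and key names are hypothetical):

import boto3

# The console's "folder" is just a zero-byte object whose key ends in "/".
boto3.client('s3').put_object(Bucket='my-bucket-name', Key='testdir/', Body=b'')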

That would make sense if mkdir were only called by end users, but other libraries call it internally. Since the intent there is “make sure that files can be written as children of this path”, the no-op does exactly the right thing, and errors and warnings would just get in the way.
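
For example (a sketch; assumes pandas and s3fs are installed, and the path is hypothetical), a downstream library can write straight to a nested path with no explicit directory creation:

import pandas as pd  # pandas hands s3:// paths to s3fs via fsspec

# No makedirs call anywhere: the prefix "reports/2021/" is simply
# part of the object key that gets written.
pd.DataFrame({"a": [1, 2]}).to_csv("s3://my-bucket-name/reports/2021/out.csv")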

Very thought-provoking scenarios.

That raises a fundamental question: is the objective of the s3fs project to wrap what the AWS S3 API offers, or to keep the s3fs interface compliant with a general-purpose file system like a Linux/POSIX fs?

My understanding is that s3fs is just a wrapper around the aiobotocore (or eventually botocore) library and should offer what the botocore S3 API offers and is capable of.

I will try to answer with that assumption in mind. (Remove “/bucket” from the strings below wherever I refer to the boto3 key.)

keys named “/bucket/inner” and “/bucket/inner/” both exist

“/bucket/inner” should be able to represent both a file and a directory from the s3fs API design point of view.

But if the two keys “/bucket/inner” (file) and “/bucket/inner/” (directory) both exist, then:

s3.isdir("/bucket/inner")  # and 
s3.isfile("/bucket/inner") # both returns True

And, an explicit check for the directory key:

s3.isdir("/bucket/inner/")  # returns True 
s3.isfile("/bucket/inner/") # returns False

Deletion scenarios

s3.rm("/bucket/inner")  # deletes file key

s3.rmdir("/bucket/inner")   # and,
s3.rmdir("/bucket/inner/")  # deletes the directory

Creation scenarios

s3.touch("/bucket/inner")   # creates a file

s3.mkdir("/bucket/inner")   # and
s3.mkdir("/bucket/inner/")  # creates a directory with key: "/bucket/inner/"

keys named “/bucket/inner” and “/bucket/inner/stuff” both exist

“/bucket/inner” could be anything again (a file and/or a directory). And “/bucket/inner/stuff” should be okay to co-exist in all of the scenarios for “/bucket/inner” (as a file and/or a directory).

s3.touch("/bucker/inner")  # creates a file
s3.mkdir("/bucket/inner")  # creates a directory key as well (key: "/bucker/inner/")
s3.touch("/buckker/inner/stuff")  # creates a file (can try to create the directory ("stuff") as well

Now, the expected outcomes for validation:

s3.isfile("/bucket/inner")  # returns True
s3.isdir("/bucket/inner")   # return True
s3.isfile("/bucket/inner/") # returns False
s3.isdir("/bucket/inner/")  # returns True
s3.isfile("/bucker/inner/stuff")  # returns True

The most challenging part is deletion:

s3.rm("/bucker/inner")     # should delete the file key
s3.rmdir("/bucket/inner")  # should raise an exception that directory is not empty
s3.rmdir("/bucker/inner", recursive=True)  # should delete both keys "/bucker/inner/" and "/bucker/inner/stuff"

a key exists at “/bucket/inner/” which contains data and/or metadata

boto3 allows storing data for the “/bucket/inner/” key. Check the following output (note the highlighted directory-content sizes):

aws cli long listing files:

% aws s3 ls s3://bucket/
                           PRE inner/
2021-01-07 19:00:01          0 inner
% aws s3 ls s3://bucket/inner/
2021-01-07 18:58:05          0 **(zero byte by default for directory content)**
2021-01-07 18:57:38          0 stuff

boto3 object creation:

import boto3
boto3_s3_client = boto3.client('s3')  # client setup added for completeness
boto3_s3_client.put_object(Bucket='bucket', Key='inner1/', Body="sdf")
boto3_s3_client.put_object(Bucket='bucket', Key='inner1/stuff', Body="sdf")

aws cli long listing again:

% aws s3 ls s3://bucket/
                           PRE inner1/
% aws s3 ls s3://bucket/inner1/
2021-01-07 19:28:11          3 **(non-zero bytes for directory content)**
2021-01-07 19:28:47          3 stuff
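
And to confirm that the data really lives on the trailing-slash key itself (a sketch continuing the boto3 session above):

# The "directory" key holds real content, just like any other object.
resp = boto3_s3_client.get_object(Bucket='bucket', Key='inner1/')
print(resp['Body'].read())  # b'sdf'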

You could solve the issue by keeping track of mkdirs within the instance, not touching the remote storage backend.

I would not make assumptions for the end user of the s3fs library. s3fs is just a wrapper library at the end of the day, and documenting that mkdir is a no-op breaks that rule of wrapper-library responsibility.

The Amazon S3 console treats all objects that have a forward slash “/” character as the last (trailing) character in the key name as a folder, for example examplekeyname/. You can’t upload an object that has a key name with a trailing “/” character using the Amazon S3 console. However, you can upload objects that are named with a trailing “/” with the Amazon S3 API by using the AWS CLI, AWS SDKs, or REST API.

So it’s not only a feature of the AWS console but also an S3 core feature, accessible via the S3 API (CLI, SDK, and REST API).

I’m sorry, but the quoted text says the exact opposite of your interpretation: the special behaviour applies to the console only, and from the CLI, SDK, and REST API, the “/” character is not special.

I also invite you to consider what you think the right behaviour ought to be in cases where:

  • keys named “/bucket/inner” and “/bucket/inner/” both exist
  • keys named “/bucket/inner” and “/bucket/inner/stuff” both exist
  • a key exists at “/bucket/inner/” which contains data and/or metadata

s3.mkdir followed by s3.exists should return True

This point I agree is a problem, and there is an issue for this; but I think it’s OK to say in the documentation that mkdir is a no-op except for creating buckets. You could solve the issue by keeping track of mkdirs within the instance, not touching the remote storage backend.
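
A minimal sketch of that idea, assuming a plain wrapper object rather than s3fs’s actual (async) internals; the class and attribute names here are hypothetical:

from s3fs import S3FileSystem

class DirTrackingFS:
    """Remembers mkdir calls locally instead of touching S3."""

    def __init__(self, fs: S3FileSystem):
        self.fs = fs
        self._made_dirs = set()  # paths "created" in this instance only

    def mkdir(self, path):
        # No S3 request: just remember that the caller made this path.
        self._made_dirs.add(path.rstrip("/"))

    def exists(self, path):
        # A locally "created" directory counts as existing for this instance.
        return path.rstrip("/") in self._made_dirs or self.fs.exists(path)

With this, mkdir followed by exists returns True within the instance, while nothing is written to the bucket and other clients see no change.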