gcsfuse: unbearable slowness of directory listings
There are 500 files in a directory. The instance is in us-east and the storage is there as well (regional).
> time gsutil ls gs://mybucketdir | wc -l
500
real 0m0.974s
user 0m0.484s
sys 0m0.156s
When running ls /mnt/mymountpoint, I get the following timings:
gcs: Req 0x98: -> ListObjects() (199.70033ms): OK
gcs: Req 0x9a: <- ListObjects()
gcs: Req 0x9a: -> ListObjects() (203.140236ms): OK
gcs: Req 0x9b: <- ListObjects()
gcs: Req 0x9b: -> ListObjects() (197.373822ms): OK
gcs: Req 0x9c: <- ListObjects()
gcs: Req 0x9c: -> ListObjects() (184.619554ms): OK
gcs: Req 0x9d: <- ListObjects()
gcs: Req 0x9d: -> ListObjects() (764.47926ms): OK
gcs: Req 0x9e: <- ListObjects()
gcs: Req 0x9e: -> ListObjects() (150.530204ms): OK
gcs: Req 0x9f: <- ListObjects()
gcs: Req 0x9f: -> ListObjects() (199.542155ms): OK
gcs: Req 0xa0: <- ListObjects()
gcs: Req 0xa0: -> ListObjects() (181.239533ms): OK
gcs: Req 0xa1: <- ListObjects()
gcs: Req 0xa1: -> ListObjects() (215.307041ms): OK
gcs: Req 0xa2: <- ListObjects()
gcs: Req 0xa2: -> ListObjects() (205.366617ms): OK
gcs: Req 0xa3: <- ListObjects()
gcs: Req 0xa3: -> ListObjects() (197.569194ms): OK
gcs: Req 0xa4: <- ListObjects()
gcs: Req 0xa4: -> ListObjects() (197.323209ms): OK
gcs: Req 0xa5: <- ListObjects()
gcs: Req 0xa5: -> ListObjects() (200.654838ms): OK
gcs: Req 0xa6: <- ListObjects()
gcs: Req 0xa6: -> ListObjects() (163.866557ms): OK
gcs: Req 0xa7: <- ListObjects()
gcs: Req 0xa7: -> ListObjects() (217.490382ms): OK
gcs: Req 0xa8: <- ListObjects()
gcs: Req 0xa8: -> ListObjects() (225.55158ms): OK
gcs: Req 0xa9: <- ListObjects()
gcs: Req 0xa9: -> ListObjects() (188.412506ms): OK
gcs: Req 0xaa: <- ListObjects()
gcs: Req 0xaa: -> ListObjects() (201.611923ms): OK
gcs: Req 0xab: <- ListObjects()
gcs: Req 0xab: -> ListObjects() (203.075812ms): OK
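For anyone who wants to total up a log like the one above, here is a minimal sketch that counts completed requests and sums their durations. It assumes the --debug_gcs line format shown in this post and only handles durations printed in milliseconds; the function name summarize and the sample text are illustrative, not part of gcsfuse.

```python
import re

# Matches completed-request lines in the format quoted above, e.g.
#   gcs: Req 0x98: -> ListObjects() (199.70033ms): OK
# Assumption: durations are always printed with an "ms" suffix.
DONE_LINE = re.compile(r"gcs: Req 0x[0-9a-f]+: -> (\w+)\(\) \(([\d.]+)ms\)")


def summarize(log_text):
    """Return {op_name: (count, total_ms)} for completed requests."""
    stats = {}
    for line in log_text.splitlines():
        match = DONE_LINE.search(line)
        if match is None:
            continue  # request-start lines ("<-") carry no duration
        op, ms = match.group(1), float(match.group(2))
        count, total = stats.get(op, (0, 0.0))
        stats[op] = (count + 1, total + ms)
    return stats


# Sample copied from the first lines of the log above.
sample = """\
gcs: Req 0x98: -> ListObjects() (199.70033ms): OK
gcs: Req 0x9a: <- ListObjects()
gcs: Req 0x9a: -> ListObjects() (203.140236ms): OK
gcs: Req 0x9b: <- ListObjects()
"""

for op, (count, total) in summarize(sample).items():
    print(f"{op}: {count} calls, {total:.1f} ms total, {total / count:.1f} ms mean")
```

Feeding it the full debug output gives the number of ListObjects calls per ls, which is the figure worth comparing between mount configurations.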
It’s mounted with these options: --foreground --limit-ops-per-sec -1 --implicit-dirs -o ro -o allow_other --uid 1001 --gid 1002 --debug_gcs
Is there any way to make it faster? It takes a few minutes to run ls, compared to gsutil, which takes a few seconds.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 1
- Comments: 27 (14 by maintainers)
Thanks for sending the log. With the --debug_fuse output I can now see what’s going on. For the sake of posterity, I’m posting the leading portion of the log below, with directory and file names redacted.
This is for running ls --color=auto /mntpoint/parent1/parent2/parent3/, with gcsfuse mounted using --implicit-dirs.
Everything up to the ReadDir op looks sane. Strictly listing the directory is not particularly slow (as you confirmed before with ls -1).
But then ls goes to stat each object, and this results in a LookUpInode request for each of the three ancestor directories. (I don’t know why this is, but it’s the decision of the kernel’s VFS layer and is presumably for a sane reason.)
The type cache successfully avoids a lookup as a file, so we look up the inodes only as directories. Since you’re using --implicit-dirs, this involves a call to ListObjects to confirm that the implicit directory still exists. Once all of the directories are looked up, a LookUpInode is received for the file, and responded to from cache (since we have a backing object record in our cache from the earlier calls to ListObjects).
Therefore the 500 stat calls from ls turn into 1500 calls to ListObjects. This is in contrast to gsutil ls, which I believe does nothing but a single ListObjects request.
Unfortunately this is the cost of pretending to be a file system with directory inodes, but not actually having GCS objects to back them. The correct fix here is to not use --implicit-dirs.
I do not understand how to work with a read-only mount without --implicit-dirs: it does not show me the directories, and it gives me "No such file or directory" when I try to ls them. This is why we used this option in the first place.
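To put rough numbers on the explanation above (500 stat calls turning into 1500 ListObjects calls), here is a back-of-the-envelope sketch. The function names are made up for illustration, the 200 ms per call is a round figure read off the log in the original post, and the model assumes the calls run one after another, as they appear to in that log.

```python
def estimated_list_calls(num_files, ancestor_dirs):
    """One ListObjects call per ancestor directory, for every file stat."""
    return num_files * ancestor_dirs


def estimated_seconds(num_calls, ms_per_call):
    """Total wall time if the calls are issued strictly sequentially."""
    return num_calls * ms_per_call / 1000.0


# 500 files under /mntpoint/parent1/parent2/parent3/ (three ancestors).
calls = estimated_list_calls(500, 3)
print(calls)                             # 1500
print(estimated_seconds(calls, 200.0))   # 300.0 seconds, i.e. about 5 minutes
```

That lands in the same range as the "few minutes" reported for ls, whereas a single ListObjects request at the same latency would finish in well under a second.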
Surprisingly, gcsfuse now works fast with any options, and the request listing is much shorter! I do not understand what happened. The full listing is below, with these options:
gcsfuse --foreground -o allow_other -o ro --implicit-dirs --uid 1001 --gid 1002 --limit-ops-per-sec -1 --debug_gcs ubieast /ubimo/east2