keda: Azure Blob Storage Scaler doesn't list blobs recursively

Proposal

I’m not sure whether this is a bug in the current implementation, but with the default values, if I upload a blob to foo/bar/blob.txt the scaler will not “see” the file and so won’t count it. I think (I have next to no Go knowledge) this is because https://github.com/kedacore/keda/blob/c2ad43eb9adbee0517e01afe60683faf13f8cb2a/pkg/scalers/azure/azure_blob.go#L26 calls ListBlobsHierarchySegment, whereas ListBlobsFlatSegment tells the Azure API to “flatten” the list of blobs on the server side before returning it.

If this is a bug, it would be great to get it fixed or the docs updated to make the behaviour clear. However, if this is intended behaviour, it would be great to have a new feature whereby a developer can pass a switch in the trigger metadata:

  triggers:
    - type: azure-blob
      metadata:
        blobContainerName: mycontainer
        blobCount: "5"
        blobPrefix: ""
        blobDelimiter: "/"
        ignoreBlobHierarchy: "true"
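To illustrate the difference between the two listing modes, here's a small Python sketch (not KEDA's actual Go code; `list_blobs` is a hypothetical stand-in for the Azure list-blobs semantics, where a delimiter collapses nested blobs into virtual "folders"):

```python
def list_blobs(names, prefix="", delimiter=""):
    """Mimic Azure Blob listing semantics: when a delimiter is set,
    blobs whose name (after the prefix) still contains the delimiter
    are hidden behind a virtual 'folder' and not returned as blobs."""
    blobs = []
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            continue  # collapsed into a BlobPrefix, not counted
        blobs.append(name)
    return blobs

container = ["blob.txt", "foo/bar/blob.txt"]

# Hierarchical listing (ListBlobsHierarchySegment-style, current scaler
# behaviour with the default delimiter "/"): the nested blob is not seen.
assert list_blobs(container, delimiter="/") == ["blob.txt"]

# Flat listing (ListBlobsFlatSegment-style): every blob is counted.
assert list_blobs(container) == ["blob.txt", "foo/bar/blob.txt"]
```

With `ignoreBlobHierarchy` (or an equivalent switch) enabled, the scaler would effectively use the flat-listing behaviour.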

Use-Case

When I upload a blob whose name includes a directory structure, it should still be included in the blobCount used to trigger the Scale Target.

Given this directory tree:

.
├── baz.txt
├── bin.txt
└── foo
    ├── bar
    │   └── world.txt
    └── hello.txt

The call to GetAzureBlobListLength would return 2. With the proposed feature in place, GetAzureBlobListLength would return 4.
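The two counts above can be reproduced with a quick sketch (hypothetical Python, not the scaler's Go code; the blob names are the files from the tree above, as Azure stores them under the container):

```python
# The directory tree flattened into blob names, as Azure stores them.
names = ["baz.txt", "bin.txt", "foo/bar/world.txt", "foo/hello.txt"]

def blob_count(names, delimiter=""):
    """Count blobs; with a delimiter, nested blobs are hidden (current
    hierarchical behaviour), without one everything is counted."""
    return sum(1 for n in names if not (delimiter and delimiter in n))

assert blob_count(names, delimiter="/") == 2  # current behaviour
assert blob_count(names) == 4                 # proposed recursive count
```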

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (9 by maintainers)

Most upvoted comments

I can take this up, @kedacore/keda-contributors.

@jasonpaige thanks, but let’s keep this open until the PR is merged 😃

We’ve agreed to make the following changes:

  • Add a globPattern: <glob> option that, when specified, takes effect instead of the other filters
    • This lets people choose between using a glob or delimiter/prefix (the container name is still required either way)
  • Add recursive: "true", which ignores the delimiter when specified
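A rough sketch of how a globPattern filter could count blobs (hypothetical Python using the stdlib fnmatch module; note KEDA's actual Go glob library may treat path separators differently, whereas fnmatch's `*` matches across "/"):

```python
from fnmatch import fnmatch

def count_matching(names, glob_pattern):
    """Count blobs whose full name matches the glob pattern."""
    return sum(1 for n in names if fnmatch(n, glob_pattern))

names = ["baz.txt", "bin.txt", "foo/bar/world.txt", "foo/hello.txt"]

# In fnmatch, "*" also matches "/", so "foo/*" is effectively recursive.
assert count_matching(names, "foo/*") == 2
assert count_matching(names, "*.txt") == 4
```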

I certainly think it does, but it’s different from the scenario of @jasonpaige & @joachimgoris, who want to scale based on the number of blobs, not the number of containers.

So since this was originally reported for blob count, I’d recommend creating a new feature request and linking to this one for context so we track both.

@ahmelsayed are you up for implementing both?

You want to be able to count the “folders” under somefolder. So this layout:

/foo/root.txt

/foo/somefolder/run1/result.1.txt
/foo/somefolder/run1/result.2.txt

/foo/somefolder/run2/result.1.txt
/foo/somefolder/run2/result.2.txt

Would be 2 (run1 and run2), right?

type: azure-blob
metadata:
  blobContainerName: foo
  blobPrefix: "somefolder"
  blobDelimiter: "/"
  count: "blobs" # "blobs" for the current behavior, or "prefixes" to count "folders" under /{blobContainerName}/{blobPrefix}.....{blobDelimiter}
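The two proposed `count` modes could behave roughly like this (hypothetical Python sketch, not the scaler's implementation; blob names are relative to the container):

```python
def count(names, prefix, delimiter, mode):
    """mode 'blobs': count every blob under the prefix (recursive).
       mode 'prefixes': count distinct first-level 'folders' under it."""
    folders = set()
    n_blobs = 0
    for name in names:
        if not name.startswith(prefix):
            continue  # outside /{blobContainerName}/{blobPrefix}
        rest = name[len(prefix):].lstrip(delimiter)
        if delimiter in rest:
            folders.add(rest.split(delimiter, 1)[0])
        n_blobs += 1
    return n_blobs if mode == "blobs" else len(folders)

names = [
    "root.txt",                      # filtered out by the prefix
    "somefolder/run1/result.1.txt",
    "somefolder/run1/result.2.txt",
    "somefolder/run2/result.1.txt",
    "somefolder/run2/result.2.txt",
]
assert count(names, "somefolder", "/", "blobs") == 4     # all text files
assert count(names, "somefolder", "/", "prefixes") == 2  # run1 and run2
```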

@ahmelsayed Instead of counting folders, I think we should support scaling based on blob count and container count which go recursively through all sub-containers.

If I configure somefolder, I want to be able to scale to 4 since there are 4 text files. In container mode, that would be 2 since I have run1 & run2.

This would also be useful for us. We process batch requests and scale our functions based on the number of blobs. We separate our blobs into folders so batches don’t get mixed. Making it possible for the blob scaler to find blobs recursively would simplify our process.