keda: Azure Blob Storage Scaler doesn't list blobs recursively

Proposal

I’m not sure whether this is a bug in the current implementation, but with the default values, if I upload a blob to foo/bar/blob.txt the scaler will not “see” the file and so won’t count it. I think (I have next to no Go knowledge) this is because https://github.com/kedacore/keda/blob/c2ad43eb9adbee0517e01afe60683faf13f8cb2a/pkg/scalers/azure/azure_blob.go#L26 calls ListBlobsHierarchySegment, whereas ListBlobsFlatSegment tells the Azure API to “flatten” the list of blobs on the server side before returning it.

If this is a bug, it would be great to get it fixed or the docs updated to make the behaviour clear. However, if this is intended behaviour, it would be great to have a new feature whereby a developer can pass a switch in the trigger metadata:

  triggers:
    - type: azure-blob
      metadata:
        blobContainerName: mycontainer
        blobCount: "5"
        blobPrefix: ""
        blobDelimiter: "/"
        ignoreBlobHierarchy: "true"
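To illustrate the difference between the two listing modes, here's a small Python sketch (not KEDA's actual Go code; `list_blobs` is a hypothetical stand-in for the Azure list-blobs semantics, where a delimiter collapses nested blobs into virtual "folders"):

```python
def list_blobs(names, prefix="", delimiter=""):
    """Mimic Azure Blob listing semantics: when a delimiter is set,
    blobs whose name (after the prefix) still contains the delimiter
    are hidden behind a virtual 'folder' and not returned as blobs."""
    blobs = []
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            continue  # collapsed into a BlobPrefix, not counted
        blobs.append(name)
    return blobs

container = ["blob.txt", "foo/bar/blob.txt"]

# Hierarchical listing (ListBlobsHierarchySegment-style, current scaler
# behaviour with the default delimiter "/"): the nested blob is not seen.
assert list_blobs(container, delimiter="/") == ["blob.txt"]

# Flat listing (ListBlobsFlatSegment-style): every blob is counted.
assert list_blobs(container) == ["blob.txt", "foo/bar/blob.txt"]
```

With `ignoreBlobHierarchy` (or an equivalent switch) enabled, the scaler would effectively use the flat-listing behaviour.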

Use-Case

When I upload a blob whose name includes a directory structure, it should still be included in the blobCount used to trigger the Scale Target.

Given this directory tree:

.
├── baz.txt
├── bin.txt
└── foo
    ├── bar
    │   └── world.txt
    └── hello.txt

The call to GetAzureBlobListLength would return 2. With the proposed feature in place, GetAzureBlobListLength would return 4.
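The two counts above can be reproduced with a quick sketch (hypothetical Python, not the scaler's Go code; the blob names are the files from the tree above, as Azure stores them under the container):

```python
# The directory tree flattened into blob names, as Azure stores them.
names = ["baz.txt", "bin.txt", "foo/bar/world.txt", "foo/hello.txt"]

def blob_count(names, delimiter=""):
    """Count blobs; with a delimiter, nested blobs are hidden (current
    hierarchical behaviour), without one everything is counted."""
    return sum(1 for n in names if not (delimiter and delimiter in n))

assert blob_count(names, delimiter="/") == 2  # current behaviour
assert blob_count(names) == 4                 # proposed recursive count
```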

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (9 by maintainers)

Most upvoted comments

I can take this up, @kedacore/keda-contributors.

@jasonpaige thanks, but let’s keep this open until the PR is merged 😃

We’ve agreed to make the following changes:

  • Add a globPattern: <glob> option that, when specified, takes effect instead of the other filters
    • This lets people choose between using a glob or delimiter/prefix (the container name is still required either way)
  • Add recursive: "true", which ignores the delimiter when specified
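A rough sketch of how a globPattern filter could count blobs (hypothetical Python using the stdlib fnmatch module; note KEDA's actual Go glob library may treat path separators differently, whereas fnmatch's `*` matches across "/"):

```python
from fnmatch import fnmatch

def count_matching(names, glob_pattern):
    """Count blobs whose full name matches the glob pattern."""
    return sum(1 for n in names if fnmatch(n, glob_pattern))

names = ["baz.txt", "bin.txt", "foo/bar/world.txt", "foo/hello.txt"]

# In fnmatch, "*" also matches "/", so "foo/*" is effectively recursive.
assert count_matching(names, "foo/*") == 2
assert count_matching(names, "*.txt") == 4
```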

I certainly think it does, but it’s different from the scenario of @jasonpaige & @joachimgoris, who want to scale based on the number of blobs, not the number of containers.

So since this was originally reported for blob count, I’d recommend creating a new feature request and linking to this one for context so we track both.

@ahmelsayed are you up for implementing both?

You want to be able to count the “folders” under somefolder. So this layout:

/foo/root.txt

/foo/somefolder/run1/result.1.txt
/foo/somefolder/run1/result.2.txt

/foo/somefolder/run2/result.1.txt
/foo/somefolder/run2/result.2.txt

Would be 2 (run1 and run2), right?

type: azure-blob
metadata:
  blobContainerName: foo
  blobPrefix: "somefolder"
  blobDelimiter: "/"
  count: "blobs" # "blobs" for the current behavior, or "prefixes" to count "folders" under /{blobContainerName}/{blobPrefix}.....{blobDelimiter}
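The two proposed `count` modes could behave roughly like this (hypothetical Python sketch, not the scaler's implementation; blob names are relative to the container):

```python
def count(names, prefix, delimiter, mode):
    """mode 'blobs': count every blob under the prefix (recursive).
       mode 'prefixes': count distinct first-level 'folders' under it."""
    folders = set()
    n_blobs = 0
    for name in names:
        if not name.startswith(prefix):
            continue  # outside /{blobContainerName}/{blobPrefix}
        rest = name[len(prefix):].lstrip(delimiter)
        if delimiter in rest:
            folders.add(rest.split(delimiter, 1)[0])
        n_blobs += 1
    return n_blobs if mode == "blobs" else len(folders)

names = [
    "root.txt",                      # filtered out by the prefix
    "somefolder/run1/result.1.txt",
    "somefolder/run1/result.2.txt",
    "somefolder/run2/result.1.txt",
    "somefolder/run2/result.2.txt",
]
assert count(names, "somefolder", "/", "blobs") == 4     # all text files
assert count(names, "somefolder", "/", "prefixes") == 2  # run1 and run2
```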

@ahmelsayed Instead of counting folders, I think we should support scaling based on blob count and container count which go recursively through all sub-containers.

If I configure somefolder, I want to be able to scale to 4 since there are 4 text files. In container mode, that would be 2 since I have run1 & run2.

This would also be useful for us. We process batch requests and scale our functions based on the number of blobs. We separate our blobs into folders so batches don’t get mixed. Making it possible for the blob scaler to find blobs recursively would simplify our process.