flyte: [BUG] Azure Workload Identity not working with `fsspec` in `flytekit`
Describe the bug
For workload identity to work within AzureBlobFileSystem, the filesystem needs to be instantiated with an account name and anon set to False. Currently, due to how get_filesystem() is set up, we cannot pass the right kwargs for Workload Identity to work.
The workaround is to set AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY, which will allow flytekit to talk to blob storage, but it would be nice to support Workload Identity directly.
This relates to the discussion in https://github.com/flyteorg/flyte/issues/3945
Expected behavior
Enabling Azure Workload Identity should work out of the box when flytekit tries to read/write to blob storage.
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn’t been raised already?
- Yes
Have you read the Code of Conduct?
- Yes
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 1
- Comments: 24 (12 by maintainers)
I created a small PR to extend the flyte-binary helm chart for azure storage support. I tested 1813 and 1842 to confirm the behavior.
Between https://github.com/flyteorg/flytekit/pull/1813 and https://github.com/flyteorg/flytekit/pull/1842 I think its fair to close this as complete.
The Use case would be to have one storage account for flyte metadata and multiple storage accounts for different flyte projects and domains where user data can be uploaded/downloaded via Flytefiles or Flytedirectories. But like you said, i think it is not possible at the moment because of the relative output format.
If you want your python flytekit code to write to the same storage account where stow is writing metadata, why dont just set the env variable
AZURE_STORAGE_ACCOUNT_NAMEwhich is being read by adlfs already? I am using workload identities for azure and with setting this env variable, it works like expected. This way we dont need the special env variables?@wild-endeavor I can make a PR to add that to
get_filesystem()if you’d like. I know you’re planning to refactor, but this is a 2 lines change that could help inform your refactor as well