azure-sdk-for-java: [QUERY] How to alleviate Timeouts in List Blobs operation?
Query/Question: How to alleviate timeouts in the List Blobs operation?
The timeout is set to 30s, which is the maximum permissible for the Blob Service (as per the Azure documentation). The maximum number of keys for listing (maxResultsPerPage) is left at the default of 5000. The containers (buckets) being listed are large, with 100k+ objects.
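One knob worth noting (a sketch, not something from the original issue): ListBlobsOptions exposes setMaxResultsPerPage, so smaller pages can be requested to keep each per-page round trip well inside the 30s budget. The page size of 1000 below is an arbitrary assumption.
ListBlobsOptions options = new ListBlobsOptions()
        .setPrefix("")
        .setMaxResultsPerPage(1000); // smaller pages -> each per-page request returns sooner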
I know that adding a retry is one possibility, but I would prefer an alternative if one exists.
The timeout exception is given below:
Caused by: reactor.core.Exceptions$ReactiveException: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 30000ms in 'flatMap' (and no fallback has been configured)
at reactor.core.Exceptions.propagate(Exceptions.java:393) ~[observer-3.20.92.jar:na]
at reactor.core.publisher.BlockingIterable$SubscriberIterator.hasNext(BlockingIterable.java:168) ~[observer-3.20.92.jar:na]
at reactor.core.publisher.BlockingIterable$SubscriberIterator.next(BlockingIterable.java:198) ~[observer-3.20.92.jar:na]
at kdc.cloudadapters.adapters.MicrosoftAzureAdapter$AzureListRequest.nextBatch(MicrosoftAzureAdapter.java:566) ~[observer-3.20.92.jar:na]
... 9 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 30000ms in 'flatMap' (and no fallback has been configured)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:289) ~[observer-3.20.92.jar:na]
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:274) ~[observer-3.20.92.jar:na]
at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:396) ~[observer-3.20.92.jar:na]
at reactor.core.publisher.StrictSubscriber.onNext(StrictSubscriber.java:89) ~[observer-3.20.92.jar:na]
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73) ~[observer-3.20.92.jar:na]
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:117) ~[observer-3.20.92.jar:na]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) ~[observer-3.20.92.jar:na]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) ~[observer-3.20.92.jar:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_252]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_252]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_252]
... 3 common frames omitted
Additional information
A shortened variation of the code used is given below:
import com.azure.core.http.rest.PagedResponse;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.models.BlobItem;
import com.azure.storage.blob.models.ListBlobsOptions;
import com.azure.storage.common.StorageSharedKeyCredential;
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class AzureList {
    private final BlobContainerClient container;
    private Iterator<PagedResponse<BlobItem>> iterator;
    private String continuationToken;

    public AzureList(String accountName, String accountKey, String bucketName) {
        StorageSharedKeyCredential credential = new StorageSharedKeyCredential(accountName, accountKey);
        String endpoint = String.format(Locale.ROOT, "https://%s.blob.core.windows.net", accountName);
        BlobServiceClient serviceClient = new BlobServiceClientBuilder()
                .credential(credential)
                .endpoint(endpoint)
                .buildClient();
        container = serviceClient.getBlobContainerClient(bucketName);
        continuationToken = null;
        // Current use case is just "" as prefix but can be different in the future
        iterator = getIterator(/* prefix */ "");
    }

    private Iterator<PagedResponse<BlobItem>> getIterator(String prefix) {
        ListBlobsOptions options = new ListBlobsOptions().setPrefix(prefix);
        // 30s is the maximum timeout the Blob Service permits for a list operation
        return container.listBlobs(options, continuationToken, Duration.ofSeconds(30L))
                .iterableByPage()
                .iterator();
    }

    public void iterate() {
        List<BlobItem> blobs;
        do {
            blobs = listBlobs();
            // hand off the blob list to a different consumer class
        } while (continuationToken != null);
    }

    private List<BlobItem> listBlobs() {
        PagedResponse<BlobItem> pagedResponse = iterator.next();
        List<BlobItem> blobs = pagedResponse.getValue();
        continuationToken = pagedResponse.getContinuationToken();
        return blobs;
    }
}
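For reference, driving the class looks like this (the account name, key, and container name below are placeholders):
AzureList lister = new AzureList("myaccount", "<account-key>", "mycontainer");
lister.iterate(); // walks every page; blocks until the listing is exhausted or a timeout propagates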
Why is this not a Bug or a Feature Request? I am unsure whether it is a bug or an issue with my local environment / my code.
Setup (please complete the following information if applicable):
- OS: Ubuntu 18.04
- IDE: IntelliJ 19.1.4
- SDK: azure-storage-blob v12.7.0
Information Checklist Kindly make sure that you have added all of the following information above and checked off the required fields, otherwise we will treat the issue as an incomplete report
- Query Added
- Setup information Added
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 65 (33 by maintainers)
I see. Makes sense. Since Feb was a beta release, I expect March will be a GA release.
@somanshreddy We have released a new GA version of the SDK. Could you please give it a try and see if it addresses your problem? If it does, could you also please close the issue?
This should be out by February.
It appears this was fixed by azure-core 1.11.0 and azure-core-http-netty 1.7.0. azure-storage-blob 12.9.0 depends on versions 1.10.0 and 1.6.3, so it won't have the fix. If you include the newer versions of azure-core and azure-core-http-netty in your project directly, they will be used in place of the versions that azure-storage-blob depends on; this is safe, as the newer versions are backward compatible. Once a version of azure-storage-blob that depends on the fixed versions, or newer, is available, you should be able to remove the direct dependencies on azure-core and azure-core-http-netty.
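As a sketch, the direct-dependency override described above looks like this in a Maven pom.xml (the version numbers are the ones named in the comment; check for newer releases):
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-core</artifactId>
    <version>1.11.0</version>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-core-http-netty</artifactId>
    <version>1.7.0</version>
</dependency>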
Hi @somanshreddy, I just merged this PR (https://github.com/Azure/azure-sdk-for-java/pull/17699), which should have the HTTP client eagerly read the response body when we know it will be deserialized. This should reduce the number of occurrences where a TimeoutException or PrematureCloseException is thrown from the SDK, by completing more of the HTTP response consumption within the scope of our retry logic. These changes should be available from Maven after our next SDK release.
Yes, DEBUG logging should include information about the number of active and inactive connections within the connection pool, along with other information surrounding requests and responses.
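For example (an assumption about the logging setup, not something stated in the thread), with Logback you could surface reactor-netty's connection-pool activity like this in logback.xml:
<!-- logback.xml: surface reactor-netty connection pool and request/response activity -->
<logger name="reactor.netty" level="DEBUG"/>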
@somanshreddy We are hoping to release it as part of our November release, in a week or two.
@somanshreddy, I’ve taken a look into the exception being returned and not retried. Write and response timeouts will be retried when they occur, because they happen while sending the request and awaiting the response; read timeouts may or may not be retried.
Read timeouts don’t have an explicit guarantee of being retried because consumption of the response body may begin in a different location. Generally, we do not begin reading the body until we’ve reached our deserialization logic, which happens outside of the context of our HttpPipeline, and therefore outside the scope of the RequestRetryPolicy/RetryPolicy that would attempt to reprocess the request. Given this, for the time being it would be best to retain your external try/catch block. Scenarios where sending the request or receiving the response headers takes longer than expected would be handled by the SDK; the last read getting stuck would need to be caught externally.
I’ll be investigating solutions to this issue so that the SDK is able to handle all three timeout scenarios safely.
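A minimal sketch of the external try/catch recommended above, written as an extra method on the AzureList class from the question (the retry limit of 3 and the bare iterator re-creation are assumptions, not SDK guidance):
// Resume the listing from the last saved continuation token when a read timeout escapes the SDK.
public void iterateWithRetry() {
    int attempts = 0;
    while (true) {
        try {
            iterate(); // pages until continuationToken is null
            return;
        } catch (RuntimeException e) {
            // Reactor wraps the checked TimeoutException; unwrap it before deciding to retry
            if (!(reactor.core.Exceptions.unwrap(e) instanceof java.util.concurrent.TimeoutException)
                    || ++attempts > 3) {
                throw e;
            }
            // Re-create the page iterator; getIterator() passes the saved continuationToken
            // to listBlobs, so the listing resumes where it left off
            iterator = getIterator(/* prefix */ "");
        }
    }
}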
@somanshreddy The HTTP client timeouts are on a per-request basis, so they do not include retries. The API timeouts are per operation, so they do include retries.
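To make the distinction concrete, here is a hedged sketch: a per-request response timeout set on the HTTP client versus the per-operation timeout passed to listBlobs, which caps the whole call including retries. The responseTimeout builder method exists on recent azure-core-http-netty versions; treat its availability in your version, and the exact values, as assumptions.
import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;

// Per-request: each HTTP attempt gets its own 60s budget, so retries are not counted against it
HttpClient httpClient = new NettyAsyncHttpClientBuilder()
        .responseTimeout(Duration.ofSeconds(60))
        .build();

BlobServiceClient serviceClient = new BlobServiceClientBuilder()
        .credential(credential)
        .endpoint(endpoint)
        .httpClient(httpClient)
        .buildClient();

// Per-operation: this Duration caps the whole listBlobs call, including any retries inside it
container.listBlobs(options, continuationToken, Duration.ofSeconds(30L));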