google-cloud-dotnet: Exception while batch processing document with Google Cloud DocumentAI V1 - StatusCode="DeadlineExceeded"

I am trying to create a PoC for Google Cloud DocumentAI V1: https://cloud.google.com/dotnet/docs/reference/Google.Cloud.DocumentAI.V1/latest

I am using DocAI batch processing to convert .pdf files into text. I have created a console application with the code below, which works fine with a single document, but when I try to process multiple PDF documents it throws an exception.

Application Code:

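using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Google.Cloud.DocumentAI.V1;
using Google.Cloud.Storage.V1;
using Google.LongRunning;
// Assumed alias: the original post does not show how GCPStorage is defined
using GCPStorage = Google.Apis.Storage.v1.Data;

public static class DocAIBatchProcess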
{
    const string projectId = "PROJECTID"; 
    const string processorId = "PROCESSID";
    const string location = "us";
    const string gcsInputBucketName = "BUCKETNAME";
    const string gcsOutputBucketName = "gs://BUCKETNAME/OUTPUTFOLDER/";
    const string gcsOutputUriPrefix = "PREFIX";
    const string prefix = "INPUTFOLDER/";
    const string delimiter = "/";

public static async Task<bool> BatchProcessDocumentAsync(this IEnumerable<GCPStorage.Object> storageObjects)
        {
            Console.WriteLine($"\n");
            Console.WriteLine($"Processing Start : {DateTime.UtcNow}");
            
            try
            {
                Console.WriteLine("\n");
                Console.WriteLine("Processing documents started...");
                Console.WriteLine("-------------------------------");

                // Create client
                DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();

                GcsDocuments gcsDocuments = new GcsDocuments();

                foreach (var storageObject in storageObjects)
                {
                    // Skip the placeholder object that represents the input folder itself
                    if (storageObject.Name != prefix)
                    {
                        gcsDocuments.Documents.Add(new GcsDocument
                        {
                            // Use the configured input bucket instead of a hard-coded bucket name
                            GcsUri = $"gs://{gcsInputBucketName}/{storageObject.Name}",
                            MimeType = "application/pdf"
                        });
                    }
                }

                BatchDocumentsInputConfig inputConfig = new BatchDocumentsInputConfig();
                inputConfig.GcsDocuments = gcsDocuments;

                //Output Config
                GcsOutputConfig gcsOutputConfig = new GcsOutputConfig();
                gcsOutputConfig.GcsUri = gcsOutputBucketName;

                DocumentOutputConfig documentOutputConfig = new DocumentOutputConfig();
                documentOutputConfig.GcsOutputConfig = gcsOutputConfig;

                // Initialize request argument(s)
                BatchProcessRequest request = new BatchProcessRequest
                {
                    ProcessorName = ProcessorName.FromProjectLocationProcessor(projectId, location, processorId),
                    SkipHumanReview = false,
                    InputDocuments = inputConfig,
                    DocumentOutputConfig = documentOutputConfig,
                };

                // Make the request; this returns once the long-running operation has been created
                Operation<BatchProcessResponse, BatchProcessMetadata> response = await documentProcessorServiceClient.BatchProcessDocumentsAsync(request);

                // Keep the operation name so the operation can be looked up later
                string operationName = response.Name;
                Console.WriteLine($"Operation name : {operationName}");

                // Poll until the returned long-running operation is complete
                Operation<BatchProcessResponse, BatchProcessMetadata> completedResponse = await response.PollUntilCompletedAsync().ConfigureAwait(continueOnCapturedContext: false);

                // Retrieve the operation result (this throws if the operation failed)
                BatchProcessResponse result = completedResponse.Result;

                Console.WriteLine($"Processing End : {DateTime.UtcNow}");
                Console.WriteLine("Processing documents completed...");
                return completedResponse.IsCompleted;
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                Console.WriteLine($"Processing End : {DateTime.UtcNow}");
            }
            return false;
        }
}

DeadlineExceeded : “Deadline expired before operation could complete.”

I looked through the documentation but couldn’t find anything. Is this an issue with DocumentAI batch processing?

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16

Most upvoted comments

Well, it depends - sometimes things aren’t supported but might still work under particularly favorable conditions (which may be hard to trace). For example, the proxy might drop connections sooner than gRPC expects, so quick requests work fine but slower ones fail. That’s just an example, though.
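
One way to narrow this down is to find out which call actually hits the deadline: the initial RPC that creates the long-running operation, or one of the later polling calls. The following is only a rough sketch, not a confirmed fix. The helper name RunWithDiagnosticsAsync and all timeout values are hypothetical, and it assumes the Google.Api.Gax, Google.LongRunning and Grpc.Core packages that the DocumentAI client library already depends on.

```csharp
using System;
using System.Threading.Tasks;
using Google.Api.Gax;
using Google.Api.Gax.Grpc;
using Google.Cloud.DocumentAI.V1;
using Google.LongRunning;
using Grpc.Core;

public static class DeadlineDiagnostics
{
    // Hypothetical helper (not part of the original code): runs the batch request in two
    // separately guarded phases so the log shows which phase fails.
    public static async Task<bool> RunWithDiagnosticsAsync(
        DocumentProcessorServiceClient client, BatchProcessRequest request)
    {
        Operation<BatchProcessResponse, BatchProcessMetadata> operation;
        try
        {
            // Phase 1: the initial RPC only creates the long-running operation.
            // The 2-minute deadline is an illustrative value, not a recommendation.
            CallSettings startSettings = CallSettings.FromExpiration(
                Expiration.FromTimeout(TimeSpan.FromMinutes(2)));
            operation = await client.BatchProcessDocumentsAsync(request, startSettings);
            Console.WriteLine($"Operation started: {operation.Name}");
        }
        catch (RpcException ex) when (ex.StatusCode == StatusCode.DeadlineExceeded)
        {
            Console.WriteLine($"Deadline exceeded while starting the operation: {ex.Status.Detail}");
            return false;
        }

        try
        {
            // Phase 2: poll every 10 seconds for up to 30 minutes (illustrative values).
            PollSettings pollSettings = new PollSettings(
                Expiration.FromTimeout(TimeSpan.FromMinutes(30)),
                TimeSpan.FromSeconds(10));
            var completed = await operation.PollUntilCompletedAsync(pollSettings);
            return completed.IsCompleted && !completed.IsFaulted;
        }
        catch (RpcException ex) when (ex.StatusCode == StatusCode.DeadlineExceeded)
        {
            // An individual polling RPC hit its deadline.
            Console.WriteLine($"Deadline exceeded while polling: {ex.Status.Detail}");
            return false;
        }
        catch (TimeoutException ex)
        {
            // Assumption: the overall polling expiration surfaces as a TimeoutException.
            Console.WriteLine($"Operation did not finish within the polling window: {ex.Message}");
            return false;
        }
    }
}
```

If the failure always shows up in the first phase, the request that lists many documents is probably the slow call. If it only shows up while polling, a connection being dropped between polls becomes more plausible. Either result is a hint for further investigation rather than proof.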