google-cloud-dotnet: Exception while batch processing document with Google Cloud DocumentAI V1 - StatusCode="DeadlineExceeded"
I am trying to create a PoC for Google Cloud DocumentAI V1 https://cloud.google.com/dotnet/docs/reference/Google.Cloud.DocumentAI.V1/latest
I am using DocAI batch processing to convert .pdf files into text. I have created a console application with the code below. It works fine with a single document, but when I try to process multiple PDF documents it throws an exception.
Application Code:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Google.Cloud.DocumentAI.V1;
using Google.Cloud.Storage.V1;
using Google.LongRunning;
// Assumption: GCPStorage is an alias for the Cloud Storage data model namespace.
using GCPStorage = Google.Apis.Storage.v1.Data;

public static class DocAIBatchProcess
{
    const string projectId = "PROJECTID";
    const string processorId = "PROCESSID";
    const string location = "us";
    const string gcsInputBucketName = "BUCKETNAME";
    const string gcsOutputBucketName = "gs://BUCKETNAME/OUTPUTFOLDER/";
    const string gcsOutputUriPrefix = "PREFIX";
    const string prefix = "INPUTFOLDER/";
    const string delimiter = "/";

    public static async Task<bool> BatchProcessDocumentAsync(this IEnumerable<GCPStorage.Object> storageObjects)
    {
        Console.WriteLine($"\n");
        Console.WriteLine($"Processing Start : {DateTime.UtcNow}");
        try
        {
            Console.WriteLine("\n");
            Console.WriteLine("Processing documents started...");
            Console.WriteLine("-------------------------------");

            // Create client
            DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();
            GcsDocument gcsDocument = null;
            GcsDocuments gcsDocuments = new GcsDocuments();
            var storage = StorageClient.Create();

            // Add every object except the input folder placeholder itself
            foreach (var storageObject in storageObjects)
            {
                if (storageObject.Name != prefix)
                {
                    gcsDocument = new GcsDocument()
                    {
                        GcsUri = $"gs://{gcsInputBucketName}/{storageObject.Name}",
                        MimeType = "application/pdf"
                    };
                    gcsDocuments.Documents.Add(gcsDocument);
                }
            }

            // Input config
            BatchDocumentsInputConfig inputConfig = new BatchDocumentsInputConfig();
            inputConfig.GcsDocuments = gcsDocuments;

            // Output config
            GcsOutputConfig gcsOutputConfig = new GcsOutputConfig();
            gcsOutputConfig.GcsUri = gcsOutputBucketName;
            DocumentOutputConfig documentOutputConfig = new DocumentOutputConfig();
            documentOutputConfig.GcsOutputConfig = gcsOutputConfig;

            // Initialize request argument(s)
            BatchProcessRequest request = new BatchProcessRequest
            {
                ProcessorName = ProcessorName.FromProjectLocationProcessor(projectId, location, processorId),
                SkipHumanReview = false,
                InputDocuments = inputConfig,
                DocumentOutputConfig = documentOutputConfig,
            };

            // Make the request
            Operation<BatchProcessResponse, BatchProcessMetadata> response = await documentProcessorServiceClient.BatchProcessDocumentsAsync(request);

            // Poll until the returned long-running operation is complete
            Operation<BatchProcessResponse, BatchProcessMetadata> completedResponse = await response.PollUntilCompletedAsync().ConfigureAwait(continueOnCapturedContext: false);

            // Retrieve the operation result
            BatchProcessResponse result = completedResponse.Result;

            // Or get the name of the operation
            string operationName = response.Name;

            // Check if the retrieved long-running operation has completed
            if (completedResponse.IsCompleted)
            {
                // If it has completed, then access the result
                BatchProcessResponse retrievedResult = completedResponse.Result;
            }

            Console.WriteLine($"Processing End : {DateTime.UtcNow}");
            Console.WriteLine("Processing documents completed...");
            return completedResponse.IsCompleted;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
            Console.WriteLine($"Processing End : {DateTime.UtcNow}");
        }
        return false;
    }
}
Exception thrown (StatusCode="DeadlineExceeded"): “Deadline expired before operation could complete.”
I looked through the documentation but couldn’t find anything about this. Is this an issue with DocumentAI batch processing?
About this issue
- State: closed
- Created 2 years ago
- Comments: 16
Well it depends - sometimes things aren’t supported but might work in particularly favorable conditions (which may be hard to trace). For example, it could be that the proxy drops connections quicker than gRPC expects - so it might work fine for quick requests but fail for slower ones. That’s just an example though.
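One way to check whether only the slower requests fail is to time the initial call and the polling separately and log the gRPC status code of whichever one throws. The following is just a rough sketch, not a confirmed fix: `DeadlineDiagnostics`, `TimedAsync` and `RunAsync` are hypothetical names, and the 30-minute poll expiration and 30-second poll delay are assumed values rather than recommendations.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Google.Api.Gax;
using Google.Cloud.DocumentAI.V1;
using Grpc.Core;

public static class DeadlineDiagnostics
{
    // Hypothetical helper: runs one call, reports how long it took, and
    // logs the gRPC status code if it fails with an RpcException.
    private static async Task<T> TimedAsync<T>(string label, Func<Task<T>> call)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            T value = await call().ConfigureAwait(false);
            Console.WriteLine($"{label} succeeded after {stopwatch.Elapsed}");
            return value;
        }
        catch (RpcException ex)
        {
            Console.WriteLine($"{label} failed after {stopwatch.Elapsed}: {ex.StatusCode} - {ex.Status.Detail}");
            throw;
        }
    }

    public static async Task<bool> RunAsync(DocumentProcessorServiceClient client, BatchProcessRequest request)
    {
        // Step 1: the request that starts the long-running operation.
        var operation = await TimedAsync(
            "BatchProcessDocuments",
            () => client.BatchProcessDocumentsAsync(request));
        Console.WriteLine($"Operation name: {operation.Name}");

        // Step 2: polling, with explicit settings (assumed values) so the
        // overall wait and the interval between polls are visible in code.
        var pollSettings = new PollSettings(
            Expiration.FromTimeout(TimeSpan.FromMinutes(30)),
            TimeSpan.FromSeconds(30));
        var completed = await TimedAsync(
            "Polling",
            () => operation.PollUntilCompletedAsync(pollSettings));

        // A completed operation can still have failed on the server side.
        if (completed.IsFaulted)
        {
            Console.WriteLine($"Operation failed: {completed.Exception.Message}");
            return false;
        }
        return true;
    }
}
```

If the elapsed time before each failure is roughly constant, that would be consistent with something between the client and the service cutting the connection, but it would not prove it.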