google-cloud-dotnet: Inserting lots of data to Google BigQuery randomly throws an SSL Exception

šŸ‘‹šŸ» G’Day!

Problem

An SSL error is randomly thrown while importing a lot of data into Google BigQuery.

Details

I've got a (once-off) maintenance task which is trying to insert about 10 million rows into a Google BigQuery table. The code is nothing too crazy, because this is a once-off import.

Randomly (well, it feels random) an SSL error is thrown, which crashes my app.

Authentication failed because the remote party sent a TLS alert: 'DecryptError'.

I've added some retry logic (using Polly for .NET) and we're back on track.

I'm not really sure how to reproduce it, but I've provided some info here that might help. Sometimes it's happened after 30-40k rows have been pushed up; other times, after hundreds of thousands. With Polly, it retries and continues … until the next random failure … which Polly again retries OK, and we rinse and repeat.


Environment details

PS C:\Users\justi> dotnet --info
.NET SDK:
 Version:   7.0.304
 Commit:    7e794e2806

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.22621
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\7.0.304\

Host:
  Version:      7.0.7
  Architecture: x64
  Commit:       5b20af47d9

  • Package name and version: Google.Cloud.BigQuery.V2 v3.2.0

Here's some sample code I've got which is doing this:

// Configure the retry policy with Polly (requires the Polly NuGet package and `using Polly;`)
var retryPolicy = Policy.Handle<Exception>()
    .WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)), (exception, timeSpan, retryCount, context) =>
    {
        // Log the exception to the console
        Console.WriteLine($"Retry #{retryCount} after {timeSpan.TotalSeconds} seconds due to: {exception}");
    });

try
{
    // Execute the retry policy for the HTTP request
    var pollyResult = await retryPolicy.ExecuteAsync(async () =>
    {
        var result = await client.InsertRowsAsync(destinationTable, rows, null, cancellationToken);

        return result;
    });

    _logger.LogDebug("Finished sending all rows to Google BigQuery. Status: {bigQueryInsertStatus}", pollyResult.Status);

    pollyResult.ThrowOnAnyError();

}
catch (Exception exception)
{
    // Log the final exception to the console
    Console.WriteLine($"Error occurred after retries: {exception}");

    throw;
}

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 24

Most upvoted comments

Each client has its own HttpClient, which would create a separate set of network connections. I suspect you're running out of TCP ports or similar, and I strongly suspect that reusing a single client will clear up the problem. I might try making a change to the repro to create a new client for each batch and see if that reproduces the problem - then I could document it accordingly.
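If client-per-batch creation is the cause, a minimal sketch of the suggested approach is to create one BigQueryClient up front and reuse it for every batch. The batching loop, project ID, and the batches variable below are hypothetical placeholders, not code from the issue:

using Google.Cloud.BigQuery.V2;

// One client (and therefore one HttpClient / connection pool) for the whole import.
BigQueryClient client = await BigQueryClient.CreateAsync("my-project-id");

foreach (var batch in batches) // batches: hypothetical IEnumerable<IReadOnlyList<BigQueryInsertRow>>
{
    // Creating a new BigQueryClient inside this loop would allocate a new HttpClient
    // per batch, which can exhaust local TCP ports over millions of rows.
    await client.InsertRowsAsync(destinationTable, batch);
}

BigQueryClient instances are thread-safe and intended to be reused, so keeping a single instance for the whole import is fine.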

I've tried to reproduce this in https://github.com/jskeet/google-cloud-dotnet/tree/issue-10625 - I've not seen the problem yet.

My most recent attempt consisted of:

  • 10 million rows
  • Inserting 1000 rows per request
  • Each row consists of an integer ID, and a text field with 20 random ASCII characters

(The test took about 25 minutes from home.)
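Roughly, that kind of insert loop looks something like the sketch below. This is a simplified, hypothetical version rather than the exact code in the branch linked above; the project, dataset, and table names are made up, and the destination table is assumed to already exist with matching id/text columns:

using System;
using System.Collections.Generic;
using System.Text;
using Google.Cloud.BigQuery.V2;

const int TotalRows = 10_000_000;
const int BatchSize = 1_000;

var random = new Random();
BigQueryClient client = await BigQueryClient.CreateAsync("my-project-id");

for (int start = 0; start < TotalRows; start += BatchSize)
{
    var batch = new List<BigQueryInsertRow>(BatchSize);
    for (int i = 0; i < BatchSize; i++)
    {
        // Each row: an integer ID plus 20 random printable ASCII characters.
        var text = new StringBuilder(20);
        for (int c = 0; c < 20; c++)
        {
            text.Append((char) random.Next(33, 127));
        }
        batch.Add(new BigQueryInsertRow
        {
            { "id", start + i },
            { "text", text.ToString() }
        });
    }
    await client.InsertRowsAsync("my_dataset", "my_table", batch);
}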

If you'd be happy to run the repro code in your environment, the results would be interesting.

You're using Encoding.Unicode.GetByteCount() there, which will mean two bytes per character… I suspect they're mostly ASCII characters, so that'll be one byte per character in UTF-8. That suggests it's more like 50-100 bytes per row in terms of data - with some overhead for the number of fields. I'll edit my test accordingly, but I'd be surprised if that made a difference. Worth testing though.
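For example, for a mostly-ASCII string the two encodings report very different sizes:

using System;
using System.Text;

string text = "20 ASCII characters!";                     // 20 characters
Console.WriteLine(Encoding.Unicode.GetByteCount(text));   // 40 - UTF-16 uses 2 bytes per ASCII char
Console.WriteLine(Encoding.UTF8.GetByteCount(text));      // 20 - UTF-8 uses 1 byte per ASCII char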

Hi @PureKrome,

We've discussed this in the team, and we'd like to at least try to reproduce the problem for the sake of completeness, but we suspect we won't want to actually make any changes to how retries are performed… while it may be safe to automatically retry in your particular case, at the point where the network transport has problems we're in unusual territory, and in many cases retrying would be the wrong approach.

Could you let us know:

  • What your insert code looks like (just which methods you're using, and ideally a rough idea of how large each row is, in terms of fields and total data)
  • What your network is like - are you going through a proxy which might be causing problems, for example? (That seems to be the most likely cause of problems, to be honest.)
  • How often you see the errors in terms of elapsed time (due to some internal details, I wouldn't be entirely surprised to see issues once every hour, for example)