azure-sdk-for-go: Outbound calls timing out with `context deadline exceeded` when TCP connection is been drop
Bug Report
Opening this for visibility in case other folks are having this issue !
TLDR : https://github.com/golang/go/issues/59690
- What happened?
When using HTTP/2, which is defaulted everywhere now, if the keep-alive connection is been dropped somehow, the sdk clients cannot recover causing some timeout context deadline exceeded until the connection been closed by the host.
- What did you expect or want to happen ?
When using a HTTP/2 server, CosmosDB for example, the SDK transport should PING the server to identify is the connection is not stale.
- How can we reproduce it?
I’m able to reproduce this using CosmosDB sample code such as :
package main
import (
"context"
"crypto/tls"
"log"
"net"
"net/http"
"time"
logger "github.com/Azure/azure-sdk-for-go/sdk/azcore/log"
"github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
"github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos"
)
const (
cosmosDbEndpoint = "https://julienstestcosmos.documents.azure.com"
cosmosDbKey = "REDACTED"
dbName = "julienstest"
containerName = "Subscriptions"
pk = "c88b61d7-85aa-415f-8eda-4e73e5e811d5"
id = "c88b61d7-85aa-415f-8eda-4e73e5e811d5"
)
func main() {
ctx := context.Background()
opt := &azcosmos.ClientOptions{ClientOptions: policy.ClientOptions{
// we call cosmosDB more frequent and CosmosDB's latency is much less than ARM
// so we use difference retry policy than ARM
Retry: policy.RetryOptions{
// Create/Update/Delete option will get ItemAlreadyExists error in second retry if first one already arrive server.
StatusCodes: []int{
// azcore defaults, see RetryOptions doc
http.StatusRequestTimeout, // 408
http.StatusTooManyRequests, // 429
http.StatusInternalServerError, // 500
http.StatusBadGateway, // 502
http.StatusServiceUnavailable, // 503
http.StatusGatewayTimeout, // 504
// adding 403 to work around random 403 from cosmosdb
// see https://github.com/Azure/azure-sdk-for-go/issues/19785
http.StatusForbidden, // 403
},
MaxRetries: 3,
TryTimeout: 5 * time.Second, // The maximum latency in server side in slow EUAP regions are 2-3 seconds today
RetryDelay: 1 * time.Second,
MaxRetryDelay: 2 * time.Second, // Delay increases exponentially, delay from 1 second, 2 time will be 2 seconds
},
}}
logger.SetListener(func(cls logger.Event, msg string) {
log.Printf("%s: %s\n", cls, msg)
})
creds, err := azcosmos.NewKeyCredential(cosmosDbKey)
if err != nil {
log.Fatalf("failed to create credential: %v", err)
}
client, err := azcosmos.NewClientWithKey(cosmosDbEndpoint, creds, opt)
if err != nil {
log.Fatalf("failed to create NewClientWithKey: %v", err)
}
container, err := client.NewContainer(dbName, containerName)
if err != nil {
log.Fatalf("failed to create NewContainer: %v", err)
}
itemResponse, err := container.ReadItem(ctx, azcosmos.NewPartitionKeyString(id), id, nil)
if err != nil {
log.Fatalf("failed to ReadItem: %v", err)
}
// log.Print(itemResponse.ActivityID)
// Set up an infinite loop to place requests from
for {
_, err := container.ReadItem(ctx, azcosmos.NewPartitionKeyString(id), id, nil)
if err != nil {
log.Printf("Timed out with error: %v", err)
} else {
log.Printf("Obtained response with activity ID: %s", itemResponse.ActivityID)
}
time.Sleep(5 * time.Second)
}
}
Then run go run main.go
Check for the port been used to talk to cosmos netstat -anp | grep main
Drop the connection sudo iptables -I INPUT -p tcp --dport 54420 -j DROP where 54420 is the port returned by netstat`
This is what you should see
2023/08/10 04:48:48 Obtained response with activity ID: ffbe9497-7be3-4595-a17b-4d46667e0ab5
2023/08/10 04:48:50 Obtained response with activity ID: ffbe9497-7be3-4595-a17b-4d46667e0ab5
2023/08/10 04:49:17 Timed out with error: Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": context deadline exceeded
2023/08/10 04:49:45 Timed out with error: Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": context deadline exceeded
2023/08/10 04:50:12 Timed out with error: Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": context deadline exceeded
2023/08/10 04:50:39 Timed out with error: Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": context deadline exceeded
2023/08/10 04:51:06 Timed out with error: Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": context deadline exceeded
The connection should recover at some point (based on the host that will kill the keep-alive connection at some point, might take up to 15minutes)
After setting the HTTP/2 transport properly as following :
http2Transport, err := http2.ConfigureTransports(tr2)
if err == nil {
log.Println("HTTP/2 transport configured")
// ReadIdleTimeout is the timeout after which a health check using ping
// frame will be carried out if no frame is received on the connection.
// Note that a ping response will is considered a received frame, so if
// there is no other traffic on the connection, the health check will
// be performed every ReadIdleTimeout interval.
// If zero, no health check is performed.
http2Transport.ReadIdleTimeout = time.Second * 10
// PingTimeout is the timeout after which the connection will be closed
// if a response to Ping is not received.
// Defaults to 15s.
http2Transport.PingTimeout = time.Second * 2
// No error so we can use the new transport
tr = tr2
} else {
log.Printf("HTTP/2 transport configuration failed: %v", err)
}
Then running it with export GODEBUG=http2debug=2 & go run main.go
```bash
2023/08/10 05:10:39 Retry: =====> Try=1
2023/08/10 05:10:39 Request: ==> OUTGOING REQUEST (Try=1)
GET https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5
Authorization: REDACTED
User-Agent: azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)
X-Ms-Cosmos-Sdk-Supportedcapabilities: REDACTED
X-Ms-Date: REDACTED
X-Ms-Documentdb-Partitionkey: REDACTED
X-Ms-Version: REDACTED
2023/08/10 05:10:39 http2: Transport encoding header ":authority" = "julienstestcosmos.documents.azure.com"
2023/08/10 05:10:39 http2: Transport encoding header ":method" = "GET"
2023/08/10 05:10:39 http2: Transport encoding header ":path" = "/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5"
2023/08/10 05:10:39 http2: Transport encoding header ":scheme" = "https"
2023/08/10 05:10:39 http2: Transport encoding header "x-ms-date" = "Thu, 10 Aug 2023 05:10:39 GMT"
2023/08/10 05:10:39 http2: Transport encoding header "x-ms-version" = "2020-11-05"
2023/08/10 05:10:39 http2: Transport encoding header "x-ms-cosmos-sdk-supportedcapabilities" = "1"
2023/08/10 05:10:39 http2: Transport encoding header "user-agent" = "azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)"
2023/08/10 05:10:39 http2: Transport encoding header "x-ms-documentdb-partitionkey" = "[\"c88b61d7-85aa-415f-8eda-4e73e5e811d5\"]"
2023/08/10 05:10:39 http2: Transport encoding header "authorization" = "type%3Dmaster%26ver%3D1.0%26sig%3D6kKN4RGlxyMfV3Q%2FA6avRQSxTzW9j3ggfGLf5Yzpmz0%3D"
2023/08/10 05:10:39 http2: Transport encoding header "accept-encoding" = "gzip"
2023/08/10 05:10:39 http2: Framer 0xc00019d420: wrote HEADERS flags=END_STREAM|END_HEADERS stream=13 len=100
2023/08/10 05:10:44 RoundTrip failure: context deadline exceeded
2023/08/10 05:10:44 Response: ==> REQUEST/RESPONSE (Try=1/5.002407971s, OpTime=5.002449671s) -- REQUEST ERROR
GET https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5
Authorization: REDACTED
User-Agent: azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)
X-Ms-Cosmos-Sdk-Supportedcapabilities: REDACTED
X-Ms-Date: REDACTED
X-Ms-Documentdb-Partitionkey: REDACTED
X-Ms-Version: REDACTED
--------------------------------------------------------------------------------
ERROR:
Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": context deadline exceeded
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.(*logPolicy).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_logging.go:140
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*sharedKeyCredPolicy).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/shared_key_credential.go:120
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.(*retryPolicy).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_retry.go:134
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*headerPolicies).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_headers_policy.go:51
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.telemetryPolicy.Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_telemetry.go:66
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.includeResponsePolicy()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_include_response.go:19
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.PolicyFunc.Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:181
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.Pipeline.Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/pipeline.go:96
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*Client).executeAndEnsureSuccessResponse()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_client.go:428
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*Client).sendGetRequest()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_client.go:310
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*ContainerClient).ReadItem()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_container.go:348
main.main()
/home/juliens/go/src/batman/h2/main.go:116
runtime.main()
/snap/go/current/src/runtime/proc.go:250
runtime.goexit()
/snap/go/current/src/runtime/asm_amd64.s:1598
2023/08/10 05:10:44 Retry: error Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": context deadline exceeded
2023/08/10 05:10:44 Retry: End Try #1, Delay=973.05341ms
2023/08/10 05:10:44 http2: Transport sending health check
2023/08/10 05:10:44 http2: Framer 0xc00019d420: wrote RST_STREAM stream=13 len=4 ErrCode=CANCEL
2023/08/10 05:10:44 http2: Framer 0xc00019d420: wrote PING len=8 ping="+\x1a\x9c\xf3\xfc\xae{W"
2023/08/10 05:10:45 Retry: =====> Try=2
2023/08/10 05:10:45 Request: ==> OUTGOING REQUEST (Try=2)
GET https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5
Authorization: REDACTED
User-Agent: azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)
X-Ms-Cosmos-Sdk-Supportedcapabilities: REDACTED
X-Ms-Date: REDACTED
X-Ms-Documentdb-Partitionkey: REDACTED
X-Ms-Version: REDACTED
2023/08/10 05:10:45 http2: Transport encoding header ":authority" = "julienstestcosmos.documents.azure.com"
2023/08/10 05:10:45 http2: Transport encoding header ":method" = "GET"
2023/08/10 05:10:45 http2: Transport encoding header ":path" = "/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5"
2023/08/10 05:10:45 http2: Transport encoding header ":scheme" = "https"
2023/08/10 05:10:45 http2: Transport encoding header "x-ms-date" = "Thu, 10 Aug 2023 05:10:39 GMT"
2023/08/10 05:10:45 http2: Transport encoding header "authorization" = "type%3Dmaster%26ver%3D1.0%26sig%3D6kKN4RGlxyMfV3Q%2FA6avRQSxTzW9j3ggfGLf5Yzpmz0%3D"
2023/08/10 05:10:45 http2: Transport encoding header "x-ms-version" = "2020-11-05"
2023/08/10 05:10:45 http2: Transport encoding header "x-ms-cosmos-sdk-supportedcapabilities" = "1"
2023/08/10 05:10:45 http2: Transport encoding header "user-agent" = "azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)"
2023/08/10 05:10:45 http2: Transport encoding header "x-ms-documentdb-partitionkey" = "[\"c88b61d7-85aa-415f-8eda-4e73e5e811d5\"]"
2023/08/10 05:10:45 http2: Transport encoding header "accept-encoding" = "gzip"
2023/08/10 05:10:45 http2: Framer 0xc00019d420: wrote HEADERS flags=END_STREAM|END_HEADERS stream=15 len=11
2023/08/10 05:10:46 http2: Transport health check failure: context deadline exceeded
2023/08/10 05:10:46 http2: Framer 0xc00019d420: wrote RST_STREAM stream=15 len=4 ErrCode=CANCEL
2023/08/10 05:10:46 RoundTrip failure: http2: client connection lost
2023/08/10 05:10:46 http2: Transport readFrame error on conn 0xc000004000: (*net.OpError) read tcp 10.0.0.4:35104->40.78.226.1:443: use of closed network connection
2023/08/10 05:10:46 Response: ==> REQUEST/RESPONSE (Try=2/1.02727298s, OpTime=7.003358017s) -- REQUEST ERROR
GET https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5
Authorization: REDACTED
User-Agent: azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)
X-Ms-Cosmos-Sdk-Supportedcapabilities: REDACTED
X-Ms-Date: REDACTED
X-Ms-Documentdb-Partitionkey: REDACTED
X-Ms-Version: REDACTED
--------------------------------------------------------------------------------
ERROR:
Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": http2: client connection lost
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.(*logPolicy).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_logging.go:140
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*sharedKeyCredPolicy).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/shared_key_credential.go:120
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.(*retryPolicy).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_retry.go:134
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*headerPolicies).Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_headers_policy.go:51
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.telemetryPolicy.Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_telemetry.go:66
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime.includeResponsePolicy()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/runtime/policy_include_response.go:19
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.PolicyFunc.Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:181
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/request.go:84
github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.Pipeline.Do()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/azcore@v1.7.0/internal/exported/pipeline.go:96
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*Client).executeAndEnsureSuccessResponse()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_client.go:428
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*Client).sendGetRequest()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_client.go:310
github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos.(*ContainerClient).ReadItem()
/home/juliens/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/data/azcosmos@v0.3.5/cosmos_container.go:348
main.main()
/home/juliens/go/src/batman/h2/main.go:116
runtime.main()
/snap/go/current/src/runtime/proc.go:250
runtime.goexit()
/snap/go/current/src/runtime/asm_amd64.s:1598
2023/08/10 05:10:46 Retry: error Get "https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5": http2: client connection lost
2023/08/10 05:10:46 Retry: End Try #2, Delay=2s
2023/08/10 05:10:48 Retry: =====> Try=3
2023/08/10 05:10:48 Request: ==> OUTGOING REQUEST (Try=3)
GET https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5
Authorization: REDACTED
User-Agent: azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)
X-Ms-Cosmos-Sdk-Supportedcapabilities: REDACTED
X-Ms-Date: REDACTED
X-Ms-Documentdb-Partitionkey: REDACTED
X-Ms-Version: REDACTED
2023/08/10 05:10:48 http2: Transport failed to get client conn for julienstestcosmos.documents.azure.com:443: http2: no cached connection was available
2023/08/10 05:10:48 http2: Transport failed to get client conn for julienstestcosmos.documents.azure.com:443: http2: no cached connection was available
2023/08/10 05:10:48 http2: Transport creating client conn 0xc000004a80 to 40.78.226.1:443
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: wrote SETTINGS len=18, settings: ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=4194304, MAX_HEADER_LIST_SIZE=10485760
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: wrote WINDOW_UPDATE len=4 (conn) incr=1073741824
2023/08/10 05:10:48 http2: Transport encoding header ":authority" = "julienstestcosmos.documents.azure.com"
2023/08/10 05:10:48 http2: Transport encoding header ":method" = "GET"
2023/08/10 05:10:48 http2: Transport encoding header ":path" = "/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5"
2023/08/10 05:10:48 http2: Transport encoding header ":scheme" = "https"
2023/08/10 05:10:48 http2: Transport encoding header "x-ms-cosmos-sdk-supportedcapabilities" = "1"
2023/08/10 05:10:48 http2: Transport encoding header "user-agent" = "azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)"
2023/08/10 05:10:48 http2: Transport encoding header "x-ms-documentdb-partitionkey" = "[\"c88b61d7-85aa-415f-8eda-4e73e5e811d5\"]"
2023/08/10 05:10:48 http2: Transport encoding header "authorization" = "type%3Dmaster%26ver%3D1.0%26sig%3D6kKN4RGlxyMfV3Q%2FA6avRQSxTzW9j3ggfGLf5Yzpmz0%3D"
2023/08/10 05:10:48 http2: Transport encoding header "x-ms-date" = "Thu, 10 Aug 2023 05:10:39 GMT"
2023/08/10 05:10:48 http2: Transport encoding header "x-ms-version" = "2020-11-05"
2023/08/10 05:10:48 http2: Transport encoding header "accept-encoding" = "gzip"
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: wrote HEADERS flags=END_STREAM|END_HEADERS stream=1 len=328
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: read SETTINGS len=18, settings: MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=98304, MAX_HEADER_LIST_SIZE=102400
2023/08/10 05:10:48 http2: Transport received SETTINGS len=18, settings: MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=98304, MAX_HEADER_LIST_SIZE=102400
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: wrote SETTINGS flags=ACK len=0
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: read WINDOW_UPDATE len=4 (conn) incr=65537
2023/08/10 05:10:48 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=65537
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: read SETTINGS flags=ACK len=0
2023/08/10 05:10:48 http2: Transport received SETTINGS flags=ACK len=0
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: read HEADERS flags=END_HEADERS stream=1 len=1179
2023/08/10 05:10:48 http2: decoded hpack field header field ":status" = "200"
2023/08/10 05:10:48 http2: decoded hpack field header field "date" = "Thu, 10 Aug 2023 05:10:48 GMT"
2023/08/10 05:10:48 http2: decoded hpack field header field "content-length" = "354"
2023/08/10 05:10:48 http2: decoded hpack field header field "content-type" = "application/json"
2023/08/10 05:10:48 http2: decoded hpack field header field "server" = "Compute"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-gatewayversion" = "2.0.0"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-activity-id" = "b41d7a50-310e-429d-9239-0391396271ee"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-last-state-change-utc" = "Sat, 05 Aug 2023 10:33:26.920 GMT"
2023/08/10 05:10:48 http2: decoded hpack field header field "etag" = "\"0d02a39c-0000-0100-0000-60cc12330000\""
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-resource-quota" = "documentSize=51200;documentsSize=52428800;documentsCount=-1;collectionSize=52428800;"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-resource-usage" = "documentSize=1472;documentsSize=1128749;documentsCount=1439661;collectionSize=1508347;"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-schemaversion" = "1.16"
2023/08/10 05:10:48 http2: decoded hpack field header field "lsn" = "31645147"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-request-charge" = "1"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-alt-content-path" = "dbs/julienstest/colls/Subscriptions"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-content-path" = "39cKANngRhg="
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-quorum-acked-lsn" = "31645147"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-current-write-quorum" = "3"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-current-replica-set-size" = "4"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-documentdb-partitionkeyrangeid" = "0"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-xp-role" = "1"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-global-committed-lsn" = "31645147"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-number-of-read-regions" = "0"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-item-lsn" = "13"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-transport-request-id" = "108109"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-cosmos-llsn" = "31645147"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-cosmos-quorum-acked-llsn" = "31645147"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-cosmos-item-llsn" = "13"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-session-token" = "0:-1#31645147"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-request-duration-ms" = "0.332"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-serviceversion" = " version=2.14.0.0"
2023/08/10 05:10:48 http2: decoded hpack field header field "x-ms-cosmos-physical-partition-id" = "86290bdc-505e-40b0-895a-ad50cec73589"
2023/08/10 05:10:48 http2: Transport received HEADERS flags=END_HEADERS stream=1 len=1179
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: read DATA stream=1 len=354 data="{\"_attachments\":\"attachments\\/\",\"_etag\":\"\\\"0d02a39c-0000-0100-0000-60cc12330000\\\"\",\"_rid\":\"39cKANngRhgBAAAAAAAAAA==\",\"_self\":\"dbs\\/39cKAA==\\/colls\\/39cKANngRhg=\\/docs\\/39cKANngRhgBAAAAAAAAAA==\\/\",\"id\":\"c88b61d7-85aa-415f-8eda-4e73e5e811d5\",\"subscription\":{" (98 bytes omitted)
2023/08/10 05:10:48 http2: Transport received DATA stream=1 len=354 data="{\"_attachments\":\"attachments\\/\",\"_etag\":\"\\\"0d02a39c-0000-0100-0000-60cc12330000\\\"\",\"_rid\":\"39cKANngRhgBAAAAAAAAAA==\",\"_self\":\"dbs\\/39cKAA==\\/colls\\/39cKANngRhg=\\/docs\\/39cKANngRhgBAAAAAAAAAA==\\/\",\"id\":\"c88b61d7-85aa-415f-8eda-4e73e5e811d5\",\"subscription\":{" (98 bytes omitted)
2023/08/10 05:10:48 http2: Framer 0xc00019d6c0: read DATA flags=END_STREAM stream=1 len=0 data=""
2023/08/10 05:10:48 http2: Transport received DATA flags=END_STREAM stream=1 len=0 data=""
2023/08/10 05:10:48 Response: ==> REQUEST/RESPONSE (Try=3/19.033188ms, OpTime=9.023936561s) -- RESPONSE RECEIVED
GET https://julienstestcosmos.documents.azure.com/dbs/julienstest/colls/Subscriptions/docs/c88b61d7-85aa-415f-8eda-4e73e5e811d5
Authorization: REDACTED
User-Agent: azsdk-go-azcosmos/v0.3.5 (go1.20.7; linux)
X-Ms-Cosmos-Sdk-Supportedcapabilities: REDACTED
X-Ms-Date: REDACTED
X-Ms-Documentdb-Partitionkey: REDACTED
X-Ms-Version: REDACTED
--------------------------------------------------------------------------------
RESPONSE Status: 200 OK
Content-Length: 354
Content-Type: application/json
Date: Thu, 10 Aug 2023 05:10:48 GMT
Etag: "0d02a39c-0000-0100-0000-60cc12330000"
Lsn: REDACTED
Server: Compute
X-Ms-Activity-Id: REDACTED
X-Ms-Alt-Content-Path: REDACTED
X-Ms-Content-Path: REDACTED
X-Ms-Cosmos-Item-Llsn: REDACTED
X-Ms-Cosmos-Llsn: REDACTED
X-Ms-Cosmos-Physical-Partition-Id: REDACTED
X-Ms-Cosmos-Quorum-Acked-Llsn: REDACTED
X-Ms-Current-Replica-Set-Size: REDACTED
X-Ms-Current-Write-Quorum: REDACTED
X-Ms-Documentdb-Partitionkeyrangeid: REDACTED
X-Ms-Gatewayversion: REDACTED
X-Ms-Global-Committed-Lsn: REDACTED
X-Ms-Item-Lsn: REDACTED
X-Ms-Last-State-Change-Utc: REDACTED
X-Ms-Number-Of-Read-Regions: REDACTED
X-Ms-Quorum-Acked-Lsn: REDACTED
X-Ms-Request-Charge: REDACTED
X-Ms-Request-Duration-Ms: REDACTED
X-Ms-Resource-Quota: REDACTED
X-Ms-Resource-Usage: REDACTED
X-Ms-Schemaversion: REDACTED
X-Ms-Serviceversion: REDACTED
X-Ms-Session-Token: REDACTED
X-Ms-Transport-Request-Id: REDACTED
X-Ms-Xp-Role: REDACTED
2023/08/10 05:10:48 Retry: response 200
2023/08/10 05:10:48 Retry: exit due to non-retriable status code
2023/08/10 05:10:48 Obtained response with activity ID: 0a6c444d-f0a5-4987-b101-d7ae8c8f49ba
I’m able to see the connection been recover by the transport along with the runtime retrier policy 😃
At least for cosmos db client, should these setting be enabled by default ?
I did not test this for ARM RPs, is ARM supporting HTTP/2 ? Or maybe doing a better job to close connection ?
- Anything we should know about your environment.
Tested this on a vanilla Azure VM, but same issue is happening code hosted on AKS.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 24 (16 by maintainers)
Fix will be including in November’s release of
azcore@v1.9.0You’ll be surprised the number of servers I saw in the past not supporting ping frames lol.
But all Azure Endpoints should respect the RFC. This should be defaulted, if there is an issue with it, the teams owning the DP should fix it, not the client/caller.
https://httpwg.org/specs/rfc7540.html#PING
And to be clear, the root cause is a bug in the Go standard library.
https://github.com/golang/go/issues/59690
What we’re evaluating here is a workaround until the above has been fixed.
this is random for us and happening more often on busier region
If we want to make it configurable, we have a few options.
trueis problematic as the zero-value isfalse)Conceptually, it seems like this is something we’d always want to enable, and I have a hard time believing that we have Azure hosts that don’t support HTTP/2 ping.
I suppose a badly implemented http/2 server could not implement ping, but it should really not be in production hosting an azure service then? 😄
http2 spec defines 10 frames : https://httpwg.org/specs/rfc7540.html#rfc.section.11.2
I hope servers that claim supporting http/2 support the core 10 frames. There is nothing stating that they are optional
Why would that be the case? Note: an HTTP/2 ping frame != an ICMP ping.
Thanks @ealsur, you’re quite right, I read this too superficially when triaging it