azure-sdk-for-go: LRO: PATCH/PUT initial response of 200 is considered valid

👋

We’re using v17.4.0 of the HDInsight SDK (version 2013-05-01-preview). When an error occurs (such as selecting an incorrect size for a cluster), it appears that errors aren’t being surfaced correctly.

When an error is returned, the API responds with a 200 status code and a body that contains an error block, which (IMO) should be raised as an error. Here’s the full API response that we’re seeing:

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Expires: -1
ETag: "d4f89bcc-370c-461d-a53d-7a1731960010"
Vary: Accept-Encoding
x-ms-hdi-matched-rule: ClusterResourcesAndSubResources
x-ms-hdi-routed-to: RegionalRp
x-ms-hdi-clusteruri: https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/hdinsight-resources/providers/Microsoft.HDInsight/clusters/terraform-hdi?api-version=2015-03-01-preview
Azure-AsyncOperation: https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/hdinsight-resources/providers/Microsoft.HDInsight/clusters/terraform-hdi/azureasyncoperations/create?api-version=2015-03-01-preview
x-ms-request-id: 2a7b15d9-4097-4ebe-ab16-36f415261c89
x-ms-hdi-served-by: westeurope
Strict-Transport-Security: max-age=31536000; includeSubDomains
x-ms-ratelimit-remaining-subscription-writes: 1198
x-ms-correlation-request-id: e59a0066-ac46-446e-a8dc-9fb94c1fb716
x-ms-routing-request-id: UKWEST:20180704T090814Z:e59a0066-ac46-446e-a8dc-9fb94c1fb716
X-Content-Type-Options: nosniff
Date: Wed, 04 Jul 2018 09:08:13 GMT

{
  "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/hdinsight-resources/providers/Microsoft.HDInsight/clusters/terraform-hdi",
  "name": "terraform-hdi",
  "type": "Microsoft.HDInsight/clusters",
  "location": "West Europe",
  "etag": "d4f89bcc-370c-461d-a53d-7a1731960010",
  "tags": {},
  "properties": {
    "clusterVersion": "3.6.1000.65",
    "osType": "Linux",
    "clusterDefinition": {
      "blueprint": "https://blueprints.azurehdinsight.net/hadoop-3.6.1000.65.1806251927.json",
      "kind": "Hadoop",
      "componentVersion": { "Hadoop": "2.7" }
    },
    "computeProfile": {
      "roles": [
        {
          "name": "headnode",
          "targetInstanceCount": 2,
          "hardwareProfile": { "vmSize": "Medium" },
          "osProfile": { "linuxOperatingSystemProfile": { "username": "tombuildsstuff" } },
          "encryptDataDisks": false
        },
        {
          "name": "workernode",
          "targetInstanceCount": 4,
          "hardwareProfile": { "vmSize": "Medium" },
          "osProfile": { "linuxOperatingSystemProfile": { "username": "tombuildsstuff" } },
          "encryptDataDisks": false
        },
        {
          "name": "zookeepernode",
          "targetInstanceCount": 3,
          "hardwareProfile": { "vmSize": "Medium" },
          "osProfile": { "linuxOperatingSystemProfile": { "username": "tombuildsstuff" } },
          "encryptDataDisks": false
        }
      ]
    },
    "provisioningState": "Failed",
    "clusterState": "Error",
    "createdDate": "2018-07-04T09:00:00.567",
    "quotaInfo": { "coresUsed": 12 },
    "errors": [
      {
        "code": "InvalidDocumentErrorCode",
        "message": "DeploymentDocument 'CsmDocument_2_0' failed the validation. Error: 'VM size 'Medium' provided in the CSM document is invalid or not supported for role 'headnode''"
      }
    ],
    "tier": "standard"
  }
}

Would it be possible to expose this information through the err object returned from the Future? We’re seeing this behaviour from both the CreateOrUpdate method and the Future method, fwiw.

Thanks!
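
For anyone hitting the same thing, here is a rough sketch of the kind of check being asked for: decode the 200 response body and turn a Failed provisioning state into a Go error. The `checkProvisioning` helper and the two struct types are hypothetical names made up for this illustration, not part of the SDK, and the sample message is abbreviated from the response above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// clusterError mirrors one entry of the "errors" array in the response above.
type clusterError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// clusterResponse keeps only the fields needed to detect a failed deployment.
type clusterResponse struct {
	Properties struct {
		ProvisioningState string         `json:"provisioningState"`
		Errors            []clusterError `json:"errors"`
	} `json:"properties"`
}

// checkProvisioning is a hypothetical helper (not an SDK function). It returns
// a non-nil error when the body reports a provisioning state of "Failed".
func checkProvisioning(body []byte) error {
	var resp clusterResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("decoding cluster response: %w", err)
	}
	if !strings.EqualFold(resp.Properties.ProvisioningState, "Failed") {
		return nil
	}
	msgs := make([]string, 0, len(resp.Properties.Errors))
	for _, e := range resp.Properties.Errors {
		msgs = append(msgs, fmt.Sprintf("%s: %s", e.Code, e.Message))
	}
	if len(msgs) == 0 {
		return fmt.Errorf("provisioning failed with no error details")
	}
	return fmt.Errorf("provisioning failed: %s", strings.Join(msgs, "; "))
}

func main() {
	// Abbreviated version of the response body shown above.
	body := []byte(`{
	  "properties": {
	    "provisioningState": "Failed",
	    "errors": [
	      {"code": "InvalidDocumentErrorCode",
	       "message": "VM size 'Medium' is invalid or not supported for role 'headnode'"}
	    ]
	  }
	}`)
	if err := checkProvisioning(body); err != nil {
		fmt.Println(err)
	}
}
```

This only illustrates the desired behaviour on the caller's side; where such a check would live inside the SDK is up to the maintainers.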

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

OK, the initial PUT to create the cluster returns an HTTP 200 with a provisioning state of InProgress. It’s during polling that we get a response with a provisioning state of Failed. This does return an error object from future.WaitForCompletionRef(); however, the Code and Message fields are nil since the error object isn’t OData v4. At present the best we can do is to include the entire response body in the error object (stuffing it into the AdditionalInfo field), but this is not ideal: you have to read through a wall of JSON to understand the error. I’m following up with the RP to see if we can do better.
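
To make the fallback described above concrete, here is a minimal sketch, assuming an OData v4 style body wraps the details in a top-level "error" object with "code" and "message". The `serviceError` type and `parseServiceError` function are hypothetical stand-ins written for this example; they are not the SDK's actual types or its implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceError is a hypothetical, simplified error type for illustration.
type serviceError struct {
	Code           string
	Message        string
	AdditionalInfo map[string]interface{}
}

// parseServiceError first looks for an OData v4 style {"error": {...}} object.
// If none is present, Code and Message stay empty and the whole decoded body
// is kept in AdditionalInfo so that the details are not lost.
func parseServiceError(body []byte) serviceError {
	var envelope struct {
		Error *struct {
			Code    string `json:"code"`
			Message string `json:"message"`
		} `json:"error"`
	}
	if err := json.Unmarshal(body, &envelope); err == nil && envelope.Error != nil {
		return serviceError{Code: envelope.Error.Code, Message: envelope.Error.Message}
	}

	// Non-conformant body: fall back to carrying the raw document.
	var se serviceError
	var raw map[string]interface{}
	if err := json.Unmarshal(body, &raw); err == nil {
		se.AdditionalInfo = raw
	}
	return se
}

func main() {
	// Shape reported in this issue: errors live under properties.errors.
	nonConformant := []byte(`{"properties":{"provisioningState":"Failed",
	  "errors":[{"code":"InvalidDocumentErrorCode","message":"..."}]}}`)
	se := parseServiceError(nonConformant)
	fmt.Printf("code=%q message=%q additionalInfo=%v\n",
		se.Code, se.Message, se.AdditionalInfo != nil)
}
```

With the body from this issue, the sketch ends up with empty Code and Message and the full document in AdditionalInfo, which is the "wall of JSON" trade-off mentioned above.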

@tombuildsstuff sorry, I didn’t read your initial report closely enough. 😦 The non-conformant errors field is what’s tripping up the SDK. I’ll work on a fix so that a Failed provisioning state returns an error; however, the RP needs to fix their service to be compliant.
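
As a rough illustration of the behaviour that fix is aiming for (not the actual SDK change), the sketch below polls until a terminal state is reached and returns an error for Failed or Canceled. `waitForTerminalState` and the `poll` callback are hypothetical names for this example, and a real poller would wait between attempts and honour context cancellation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// waitForTerminalState calls poll until the operation reaches a terminal
// state or maxAttempts is exhausted. It is a hypothetical helper, not SDK code.
func waitForTerminalState(poll func() (string, error), maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		state, err := poll()
		if err != nil {
			return err
		}
		switch strings.ToLower(state) {
		case "succeeded":
			return nil
		case "failed", "canceled":
			// A terminal failure becomes an error even if the HTTP status was 200.
			return fmt.Errorf("operation ended in state %q", state)
		}
		// Any other state (for example InProgress) means: keep polling.
	}
	return errors.New("operation did not reach a terminal state")
}

func main() {
	// Simulated sequence of provisioning states, as in the comment above.
	states := []string{"InProgress", "InProgress", "Failed"}
	i := 0
	poll := func() (string, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s, nil
	}
	fmt.Println(waitForTerminalState(poll, 10))
}
```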