metering-operator: AWS Billing partitionSpec mismatch

I’m trying to get my AWS billing report loaded in, and am running into errors like this:

validation failure list:\nspec.partitions.partitionSpec in body must be of type array: \"object\"" app=metering component=reportDataSourceWorker error="HiveTable.metering.openshift.io \"reportdatasource-metering-aws-billing\" is invalid: []: Invalid value: map[string]

followed by a dump of the HiveTable it’s trying to apply. I’ve copied the relevant part below, with the bucket name redacted:

"databaseName": "metering",
		"external": true,
		"location": "s3a://my-redacted-billing-reports-bucket/cur-report/",
		"managePartitions": true,
		"partitionedBy": []interface {} {
			map[string]interface {} {
				"name": "billing_period_start",
				"type": "string"
			},
			map[string]interface {} {
				"name": "billing_period_end",
				"type": "string"
			}
		},
		"partitions": []interface {} {
			map[string]interface {} {
				"location": "s3a://my-redacted-billing-reports-bucket/cur-report/cur-report/20191001-20191101/d0abd31d-5d9b-4af1-80e9-8596bcc7b6e3/",
				"partitionSpec": map[string]interface {} {
					"end": "20191101",
					"start": "20191001"
				}
			}
		},
		"rowFormat": "\\nSERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'\\nWITH SERDEPROPERTIES (\\n    \\" serialization.format \\ " = \\",

Looking through the Go source code, it seemed like partitionSpec really should be an object, not an array. So I modified hive.crd.yaml from this…

            partitions:
              type: array
              description: |
                A list of partitions that this Hive table should contain.
                Note: this is an optional field.
              items:
                type: object
                required:
                - partitionSpec
                - location
                properties:
                  partitionSpec:
                    type: array
                    description: |
                      PartitionSpec is a map containing string keys and values, where each key
                      is expected to be the name of a partition column, and the value is the
                      value of the partition column.
                  location:
                    type: string
                    description: |
                      Location specifies where the data for this partition is stored.
                      This should be a sub-directory of the "location" field.
                    minLength: 1
                    format: uri

to this…

            partitions:
              type: array
              description: |
                A list of partitions that this Hive table should contain.
                Note: this is an optional field.
              items:
                type: object
                required:
                - partitionSpec
                - location
                properties:
                  partitionSpec:
                    type: object
                    description: |
                      PartitionSpec is a map containing string keys and values, where each key
                      is expected to be the name of a partition column, and the value is the
                      value of the partition column.
                  location:
                    type: string
                    description: |
                      Location specifies where the data for this partition is stored.
                      This should be a sub-directory of the "location" field.
                    minLength: 1
                    format: uri

That allowed the HiveTable update to actually go through, but then I started getting errors like this:

time="2019-10-23T14:04:24Z" level=error msg="error syncing HiveTable reportdatasource-metering-aws-billing" app=metering component=hiveTableWorker error="failed to add partition `billing_period_start`=``, `billing_period_end`=`` location s3a://my-redacted-billing-reports-bucket/cur-report/cur-report/20191001-20191101/d0abd31d-5d9b-4af1-80e9-8596bcc7b6e3/ to Hive table \"datasource_metering_aws_billing\": hive: query failed. errmsg=Error while compiling statement: FAILED: ParseException line 1:96 cannot recognize input near '' ',' 'billing_period_end' in constant" hiveTable=reportdatasource-metering-aws-billing logID=hsAWCI9NrM namespace=metering

I couldn’t reconcile this behavior with the Go code. aws_usage_hive.go seems to hard-code the names as “start” and “end”:

		p := metering.HiveTablePartition{
			Location: location,
			PartitionSpec: hive.PartitionSpec{
				"start": start,
				"end":   end,
			},
		}

But db.go seems to look for keys matching the column names, which I guess are billing_period_start and billing_period_end in this case:

func FmtPartitionSpec(partitionColumns []hive.Column, partSpec hive.PartitionSpec) string {
	var partitionVals []string
	for _, col := range partitionColumns {
		val := partSpec[col.Name]
		// Quote strings
		if strings.ToLower(col.Type) == "string" {
			val = "`" + val + "`"
		}
		partitionVals = append(partitionVals, fmt.Sprintf("`%s`=%s", col.Name, val))
	}
	return strings.Join(partitionVals, ", ")
}
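
Putting the two snippets together reproduces the empty values in the ParseException. A minimal standalone sketch, with the column and key names copied from the dump and logs above:

	package main

	import "fmt"

	func main() {
		// What aws_usage_hive.go builds: keys hard-coded as "start"/"end".
		partSpec := map[string]string{"start": "20191001", "end": "20191101"}

		// What the table is actually partitioned by, per the HiveTable dump.
		partitionColumns := []string{"billing_period_start", "billing_period_end"}

		// FmtPartitionSpec looks up each column name in the spec; the keys
		// don't match, so every lookup yields "", producing the empty,
		// backtick-quoted values Hive chokes on.
		for i, col := range partitionColumns {
			if i > 0 {
				fmt.Print(", ")
			}
			fmt.Printf("`%s`=`%s`", col, partSpec[col])
		}
		fmt.Println()
		// Output: `billing_period_start`=``, `billing_period_end`=``
	}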

What’s going on here? It seems like there’s a mismatch between the HiveTable spec built by aws_usage_hive.go, the hive CRD, and db.go.
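
If I’m reading that right, the smallest fix on the Go side (a sketch, not a tested patch) would be to key the spec in aws_usage_hive.go by the actual partition column names:

		p := metering.HiveTablePartition{
			Location: location,
			PartitionSpec: hive.PartitionSpec{
				"billing_period_start": start,
				"billing_period_end":   end,
			},
		}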


Most upvoted comments

Yep, these are the same changes I was looking at, minus the quote changes, but those seem necessary too since we used the wrong quotes. That last one is probably caused by https://github.com/operator-framework/operator-metering/issues/993, which looks like it’s the other one you filed. I may try to combine these PRs and bugs to make it easier to fix.
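
For reference, the “wrong quotes” presumably refers to db.go backtick-quoting partition values: in HiveQL, backticks quote identifiers while string literals take single quotes. A sketch of what the quoting line in FmtPartitionSpec would become:

		// Sketch of the quote fix: quote string values as Hive string
		// literals rather than as identifiers.
		if strings.ToLower(col.Type) == "string" {
			val = "'" + val + "'"
		}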