metering-operator: AWS Billing partitionSpec mismatch
I’m trying to get my AWS billing report loaded in, and am running into errors like this:
app=metering component=reportDataSourceWorker error="HiveTable.metering.openshift.io \"reportdatasource-metering-aws-billing\" is invalid: []: Invalid value: map[string]: validation failure list:\nspec.partitions.partitionSpec in body must be of type array: \"object\""
followed by a dump of the HiveTable being applied. I’ve copied the relevant part below, with the table name redacted:
"databaseName": "metering",
"external": true,
"location": "s3a://my-redacted-billing-reports-bucket/cur-report/",
"managePartitions": true,
"partitionedBy": []interface {}{
    map[string]interface {}{
        "name": "billing_period_start",
        "type": "string"
    },
    map[string]interface {}{
        "name": "billing_period_end",
        "type": "string"
    }
},
"partitions": []interface {}{
    map[string]interface {}{
        "location": "s3a://my-redacted-billing-reports-bucket/cur-report/cur-report/20191001-20191101/d0abd31d-5d9b-4af1-80e9-8596bcc7b6e3/",
        "partitionSpec": map[string]interface {}{
            "end": "20191101",
            "start": "20191001"
        }
    }
},
"rowFormat": "\nSERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'\nWITH SERDEPROPERTIES (\n  \"serialization.format\" = \"",
Looking through the Go source code, it seemed like partitionSpec really should be an object, not an array — the CRD’s own description even calls it a map. So I modified hive.crd.yaml from this…
partitions:
type: array
description: |
A list of partitions that this Hive table should contain.
Note: this is an optional field.
items:
type: object
required:
- partitionSpec
- location
properties:
partitionSpec:
type: array
description: |
PartitionSpec is a map containing string keys and values, where each key
is expected to be the name of a partition column, and the value is the
value of the partition column.
location:
type: string
description: |
Location specifies where the data for this partition is stored.
This should be a sub-directory of the "location" field.
minLength: 1
format: uri
to this…
partitions:
type: array
description: |
A list of partitions that this Hive table should contain.
Note: this is an optional field.
items:
type: object
required:
- partitionSpec
- location
properties:
partitionSpec:
type: object
description: |
PartitionSpec is a map containing string keys and values, where each key
is expected to be the name of a partition column, and the value is the
value of the partition column.
location:
type: string
description: |
Location specifies where the data for this partition is stored.
This should be a sub-directory of the "location" field.
minLength: 1
format: uri
That allowed the HiveTable update to actually go through, but then I started getting errors like this:
time="2019-10-23T14:04:24Z" level=error msg="error syncing HiveTable reportdatasource-metering-aws-billing" app=metering component=hiveTableWorker error="failed to add partition `billing_period_start`=``, `billing_period_end`=`` location s3a://my-redacted-billing-reports-bucket/cur-report/cur-report/20191001-20191101/d0abd31d-5d9b-4af1-80e9-8596bcc7b6e3/ to Hive table \"datasource_metering_aws_billing\": hive: query failed. errmsg=Error while compiling statement: FAILED: ParseException line 1:96 cannot recognize input near '' ',' 'billing_period_end' in constant" hiveTable=reportdatasource-metering-aws-billing logID=hsAWCI9NrM namespace=metering
I couldn’t reconcile this behavior with the Go code. aws_usage_hive.go seems to hard-code the names as “start” and “end”:
p := metering.HiveTablePartition{
Location: location,
PartitionSpec: hive.PartitionSpec{
"start": start,
"end": end,
},
}
But db.go seems to look for keys matching the column names, which I guess are billing_period_start and billing_period_end in this case:
func FmtPartitionSpec(partitionColumns []hive.Column, partSpec hive.PartitionSpec) string {
	var partitionVals []string
	for _, col := range partitionColumns {
		val := partSpec[col.Name]
		// Quote strings (note: with backticks, not single quotes)
		if strings.ToLower(col.Type) == "string" {
			val = "`" + val + "`"
		}
		partitionVals = append(partitionVals, fmt.Sprintf("`%s`=%s", col.Name, val))
	}
	return strings.Join(partitionVals, ", ")
}
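The mismatch can be reproduced in isolation. Below is a hypothetical standalone sketch (Column and PartitionSpec are simplified stand-ins for the hive package types) showing that when the spec is keyed "start"/"end" but the columns are named billing_period_start/billing_period_end, every lookup misses and the function emits exactly the empty-valued clause from the ParseException above.

```go
package main

import (
	"fmt"
	"strings"
)

// Column and PartitionSpec are simplified stand-ins for the hive package types.
type Column struct {
	Name string
	Type string
}

type PartitionSpec map[string]string

// FmtPartitionSpec mirrors the db.go logic quoted above,
// including the backtick quoting.
func FmtPartitionSpec(partitionColumns []Column, partSpec PartitionSpec) string {
	var partitionVals []string
	for _, col := range partitionColumns {
		val := partSpec[col.Name] // missing key yields ""
		if strings.ToLower(col.Type) == "string" {
			val = "`" + val + "`"
		}
		partitionVals = append(partitionVals, fmt.Sprintf("`%s`=%s", col.Name, val))
	}
	return strings.Join(partitionVals, ", ")
}

func main() {
	// The table is partitioned by billing_period_start/billing_period_end...
	cols := []Column{
		{Name: "billing_period_start", Type: "string"},
		{Name: "billing_period_end", Type: "string"},
	}
	// ...but aws_usage_hive.go keys the spec as "start"/"end".
	spec := PartitionSpec{"start": "20191001", "end": "20191101"}

	// Both lookups miss, producing the malformed clause from the error log.
	fmt.Println(FmtPartitionSpec(cols, spec))
	// prints: `billing_period_start`=``, `billing_period_end`=``
}
```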
What’s going on here? There seems to be a three-way mismatch between the HiveTable spec produced by aws_usage_hive.go, the Hive CRD schema, and the partition handling in db.go.
About this issue
- State: open
- Created 5 years ago
- Comments: 21 (14 by maintainers)
Yep, these are the same changes I was looking at, minus the quote changes, but those seem necessary too since we used the wrong quotes. That last one is probably caused by https://github.com/operator-framework/operator-metering/issues/993, which looks like the other one you filed. I may try to combine these PRs and bugs to make them easier to fix.