terraform-provider-aws: aws_elasticsearch_domain fails on initial apply due to aws_cloudwatch_log_resource_policy
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform CLI and Terraform AWS Provider Version
Terraform v0.12.24
+ provider.aws v3.0.0
+ provider.external v1.2.0
+ provider.vault v2.12.2
Affected Resource(s)
- aws_elasticsearch_domain
- aws_cloudwatch_log_resource_policy
Terraform Configuration Files
data "aws_caller_identity" "current" {}
resource "aws_elasticsearch_domain" "es" {
domain_name = var.domain_name
elasticsearch_version = var.elasticsearch_version
advanced_options = var.advanced_options
ebs_options {
ebs_enabled = var.ebs_volume_size > 0 ? true : false
volume_size = var.ebs_volume_size
volume_type = var.ebs_volume_type
iops = var.ebs_volume_type == "IOPS" ? var.ebs_iops : null
}
encrypt_at_rest {
enabled = var.encrypt_at_rest_enabled
kms_key_id = var.encrypt_at_rest_kms_key_id == "" ? module.kms.arn : var.encrypt_at_rest_kms_key_id
}
cluster_config {
instance_count = var.instance_count
instance_type = var.instance_type
dedicated_master_enabled = var.dedicated_master_enabled
dedicated_master_count = var.dedicated_master_enabled ? var.dedicated_master_count : null
dedicated_master_type = var.dedicated_master_enabled ? var.dedicated_master_type : null
zone_awareness_enabled = var.zone_awareness_enabled
zone_awareness_config {
availability_zone_count = var.zone_awareness_enabled ? var.availability_zone_count : null
}
}
node_to_node_encryption {
enabled = var.node_to_node_encryption_enabled
}
vpc_options {
security_group_ids = concat(var.security_group_ids, [aws_security_group.elasticsearch_sg.id])
subnet_ids = length(var.subnet_ids) > 1 ? slice(var.subnet_ids, 0, var.availability_zone_count) : var.subnet_ids
}
snapshot_options {
automated_snapshot_start_hour = var.automated_snapshot_start_hour
}
domain_endpoint_options {
enforce_https = var.enforce_https
tls_security_policy = var.tls_security_policy
}
dynamic "cognito_options" {
for_each = var.cognito_options
content {
enabled = cognito_options.value.enabled
user_pool_id = cognito_options.value.user_pool_id
identity_pool_id = cognito_options.value.identity_pool_id
role_arn = cognito_options.value.role_arn
}
}
dynamic "log_publishing_options" {
for_each = { for k, v in var.log_publishing_options : k => v if lookup(v, "enabled") == true }
content {
enabled = log_publishing_options.value.enabled
log_type = log_publishing_options.value.log_type
cloudwatch_log_group_arn = aws_cloudwatch_log_group.es_logs[log_publishing_options.key].arn
}
}
tags = merge(
var.tags,
{
Name = var.domain_name,
service = var.service,
team = var.team,
phi = var.phi
},
)
depends_on = [aws_iam_service_linked_role.es]
}
resource "aws_cloudwatch_log_resource_policy" "aes_cloudwatch_log_resource_policy" {
count = length({ for k, v in var.log_publishing_options : k => v if lookup(v, "enabled") == true }) > 0 ? 1 : 0
policy_name = "${title(replace(var.domain_name, "-", ""))}-CloudwatchResourcePolicy"
policy_document = data.aws_iam_policy_document.cloudwatch.json
}
data "aws_iam_policy_document" "cloudwatch" {
statement {
actions = [
"logs:PutLogEvents",
"logs:PutLogEventsBatch",
"logs:CreateLogStream",
]
effect = "Allow"
principals {
type = "Service"
identifiers = ["es.amazonaws.com"]
}
resources = [
# for k, v in aws_cloudwatch_log_group.es_logs : "${v.arn}:*" This never works
for k, v in var.log_publishing_options : "arn:aws:logs:us-east-1:${data.aws_caller_identity.current.account_id}:log-group:/aws/aes/${var.domain_name}/${k}:*" # this almost never works, but seems to have worked once
# "arn:aws:logs:us-east-1:${data.aws_caller_identity.current.account_id}:log-group:*" This works 100% of the time based on my tests
]
}
}
resource "aws_cloudwatch_log_group" "es_logs" {
for_each = { for k, v in var.log_publishing_options : k => v if lookup(v, "enabled", false) == true }
name = "/aws/aes/${var.domain_name}/${each.key}"
retention_in_days = lookup(each.value, "retention_in_days", 14)
tags = merge(
var.tags,
{
Name = "/aws/aes/${var.domain_name}/${each.key}"
service = var.service,
team = var.team,
phi = var.phi
},
)
}
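The variable definitions are not included above; the configuration appears to assume var.log_publishing_options is shaped roughly like the sketch below. The key name, log type, and defaults are illustrative guesses, not taken from the original module.

```hcl
# Hypothetical variable shape inferred from the lookups above; not part of the
# original report. Each key becomes the log group suffix under /aws/aes/<domain>/.
variable "log_publishing_options" {
  type = any
  default = {
    search = {
      enabled           = true
      log_type          = "SEARCH_SLOW_LOGS"
      retention_in_days = 14
    }
  }
}
```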
Debug Output
Expected Behavior
The module should run to completion and create all resources, including the Elasticsearch domain, on the first apply.
Actual Behavior
On first apply, Terraform exits with error:
Error: Error creating ElasticSearch domain: ValidationException: The Resource Access Policy specified for the CloudWatch Logs log group /aws/aes/example-domain/search does not grant sufficient permissions for Amazon Elasticsearch Service to create a log stream. Please check the Resource Access Policy.
However, if you then run terraform apply again, it completes without issue.
In addition, the domain creates successfully if you open up the Resource Access Policy: a broader resource such as "arn:aws:logs:us-east-1:${data.aws_caller_identity.current.account_id}:log-group:*" works, but we should be able to lock the policy down more tightly than that.
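One workaround that targets the ordering rather than the policy scope, offered here only as a sketch and not a confirmed fix: give the domain an explicit depends_on on the resource policy. The domain only references the log groups, so Terraform has no implicit dependency edge to aws_cloudwatch_log_resource_policy and may create the domain before the policy exists.

```hcl
# Sketch only: extend the existing depends_on so the domain is not created
# until the CloudWatch Logs resource policy has been put in place.
resource "aws_elasticsearch_domain" "es" {
  # ... all arguments exactly as in the configuration above ...

  depends_on = [
    aws_iam_service_linked_role.es,
    aws_cloudwatch_log_resource_policy.aes_cloudwatch_log_resource_policy,
  ]
}
```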
Steps to Reproduce
1. terraform apply: the error above occurs.
2. terraform apply again: the run completes and creates functioning resources.
This makes it extremely difficult to run in CI.
Important Factoids
References
- #6606 shows a solution, but it requires opening the policy up to all of CloudWatch Logs instead of just the log groups created for this particular resource.
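If opening the policy to every log group in the account is unacceptable, a possible middle ground (untested here, so treat it as a sketch) is to keep a wildcard but scope it to this domain's log-group prefix rather than the whole account:

```hcl
# Sketch: wildcard limited to the log groups this module creates for the domain.
# Whether this is accepted on the first apply still needs to be verified.
resources = [
  "arn:aws:logs:us-east-1:${data.aws_caller_identity.current.account_id}:log-group:/aws/aes/${var.domain_name}/*",
]
```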
Has anyone got a fix here? I am facing the same issue, and if I use "arn:aws:logs:*" it works, so I don't know what's happening here.
This is still happening, and using arn:aws:logs:* seems to work alright, but I can't figure out why. I've tried different dependencies and local waits; nothing helps.
Any updates regarding this issue?