aws-parallelcluster: ParallelCluster 2.1.1 with RAID 0 config on CentOS 7 fails during cluster creation
Environment:
- AWS ParallelCluster 2.1.1
- OS: CentOS 7
- Scheduler: SGE
- Master instance type: m5.large
- Compute instance type: m5.xlarge
Bug description and how to reproduce: Deploying a ParallelCluster 2.1.1 cluster with a RAID 0 configuration fails with this error:
Beginning cluster creation for cluster: cluster1
Creating stack named: parallelcluster-cluster1
Status: parallelcluster-cluster1 - ROLLBACK_IN_PROGRESS
Cluster creation failed. Failed events:
- AWS::EC2::Instance MasterServer Received FAILURE signal with UniqueId i-0ecca142dxxxxx
I thought the failure might be caused by using encrypted EBS volumes with a custom KMS key, but after commenting out both the encrypted and ebs_kms_key_id settings I still get the same failure.
Additional context: configuration file, with credentials and personal data removed.
[global]
update_check = true
sanity_check = true
cluster_template = default
[aws]
aws_region_name = us-west-2
[cluster default]
vpc_settings = vpc-0094xxxxx
key_name = cdns-cluster
base_os = centos7
compute_instance_type = m5.2xlarge
master_instance_type = m5.large
#compute_root_volume_size = 20
#master_root_volume_size = 20
initial_queue_size = 0
tags = {"BU" : "IT", "Sub_BU" : "IT"}
raid_settings = rs
#extra_json = { "cluster" : { "ganglia_enabled" : "yes" } }
[vpc vpc-0094xxxxx]
vpc_id = vpc-0094xxxxx
master_subnet_id = subnet-06cxxxxxx
use_public_ips = false
ssh_from = 172.16.0.0/12
[raid rs]
shared_dir = raid
raid_type = 0
num_of_raid_volumes = 2
volume_size = 100
encrypted = true
ebs_kms_key_id = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
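The comments in this thread suggest the failure appears when raid_settings is combined with an m5 master instance, and that m4/c4 instances avoid it. As a minimal sketch under that assumption, the following Python check flags such a config before running pcluster create. The function name raid_on_affected_master and the AFFECTED_FAMILIES set are hypothetical and taken only from what is reported here, not from an official compatibility list.

```python
import configparser

# Assumption drawn from this thread only: m5 masters fail with RAID in 2.1.1,
# while m4/c4 are the suggested workaround. Not an official list.
AFFECTED_FAMILIES = {"m5"}


def raid_on_affected_master(config_text, affected=AFFECTED_FAMILIES):
    """Return True if the selected cluster template uses raid_settings
    together with a master instance type from an affected family."""
    # Disable interpolation so values are read exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    template = parser.get("global", "cluster_template", fallback="default")
    cluster = parser["cluster " + template]
    family = cluster.get("master_instance_type", fallback="").split(".")[0]
    return "raid_settings" in cluster and family in affected


# Trimmed copy of the configuration above, for illustration.
SAMPLE = """\
[global]
cluster_template = default
[cluster default]
base_os = centos7
master_instance_type = m5.large
raid_settings = rs
[raid rs]
raid_type = 0
num_of_raid_volumes = 2
"""

if __name__ == "__main__":
    print(raid_on_affected_master(SAMPLE))
    print(raid_on_affected_master(SAMPLE.replace("m5.large", "m4.large")))
```

With the sample above, the check is true for m5.large and false once the master is switched to m4.large.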
When I created the cluster with the --norollback option, I could see that the master has a 20 GB disk mounted and exported under /shared. I also noticed that the two disks for the RAID 0 configuration are not attached to the master.
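One way to confirm the observation above on a master launched with --norollback is to count the whole disks of the expected size. This is a rough sketch, assuming the output of `lsblk -dnb -o NAME,SIZE,TYPE` is passed in as text. The sample output, device names, and helper names are hypothetical and were not taken from the attached logs.

```python
GIB = 1024 ** 3


def whole_disks(lsblk_output):
    """Parse `lsblk -dnb -o NAME,SIZE,TYPE` text into (name, size_bytes) pairs."""
    disks = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2] == "disk":
            disks.append((fields[0], int(fields[1])))
    return disks


def missing_raid_volumes(lsblk_output, num_volumes, volume_size_gib):
    """Return how many expected RAID member volumes are not visible."""
    want = volume_size_gib * GIB
    found = sum(1 for _, size in whole_disks(lsblk_output) if size == want)
    return max(0, num_volumes - found)


# Hypothetical output on the failed master: a root disk and the 20 GB
# shared volume, but neither of the two 100 GB RAID members.
SAMPLE_FAILED = (
    "nvme0n1 26843545600 disk\n"
    "nvme1n1 21474836480 disk\n"
)

if __name__ == "__main__":
    # num_of_raid_volumes = 2 and volume_size = 100 come from the [raid rs] section.
    print(missing_raid_volumes(SAMPLE_FAILED, 2, 100))
```

Matching on exact size is a simplification: it would miscount if another attached volume happened to be the same size as the RAID members.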
Attachments: cfn-init.log cloud-init.log
About this issue
- State: closed
- Created 5 years ago
- Comments: 18 (10 by maintainers)
Commits related to this issue
- Fix block device conversion The block device returned by the parallelcluster-ebsnvme-id script must be in format suitable for udev rules This fix https://github.com/aws/aws-parallelcluster/issues/82... — committed to lukeseawalker/aws-parallelcluster-cookbook by lukeseawalker 5 years ago
- Fix block device conversion The block device returned by the parallelcluster-ebsnvme-id script must be in format suitable for udev rules E.g. - without -u flag parallelcluster-ebsnvme-id -b /dev/nvm... — committed to lukeseawalker/aws-parallelcluster-cookbook by lukeseawalker 5 years ago
- Fix block device conversion The block device returned by the parallelcluster-ebsnvme-id script must be in format suitable for udev rules E.g. - without -u flag parallelcluster-ebsnvme-id -b /dev/nvm... — committed to aws/aws-parallelcluster-cookbook by lukeseawalker 5 years ago
Yes, same error. Please use m4/c4 instance types until the next release of ParallelCluster.
I hit exactly the same issue when attaching two EBS volumes. I think aws/aws-parallelcluster-cookbook#253 will fix the problem. I am posting it here for the record.
pcluster version: 2.1.1 Full log: cfn-init.log
Major error message:
Configuration file:
No error with only one EBS volume. No error when using m4.large instead of m5.large as the master node, as pointed out by https://github.com/aws/aws-parallelcluster/issues/823#issuecomment-452850726.