noobaa-core: Intermittent "We encountered an internal error. Please try again"
Environment info
This is from HPO-core defect https://github.ibm.com/IBMSpectrumScale/hpo-core/issues/555. While running a cosbench test I get occasional errors; sometimes the run passes and sometimes it does not.
Errors occurred somewhere between these times: start of run Wed Mar 2 15:30:03 EST 2022, end of run Wed Mar 2 15:34:19 EST 2022.
A section from the cosbench log which shows the failure:
```
[root@c83f1-dan4 ~]# From the Cosbench log
==================================================
stage: s1-prepare_GB
==================================================
----------------------------------
mission: MA4C534C68B, driver: app6
----------------------------------
2022-03-02 15:30:02,008 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/MA4C534C68B.log
2022-03-02 15:30:02,235 [INFO] [NoneStorage] - performing PUT at /s5001b1/s5001o1_GB
2022-03-02 15:30:02,236 [INFO] [NoneStorage] - performing PUT at /s5001b1/s5001o3_GB
2022-03-02 15:30:02,236 [INFO] [NoneStorage] - performing PUT at /s5001b1/s5001o5_GB
2022-03-02 15:30:02,236 [INFO] [NoneStorage] - performing PUT at /s5001b1/s5001o2_GB
2022-03-02 15:30:02,236 [INFO] [NoneStorage] - performing PUT at /s5001b1/s5001o4_GB
2022-03-02 15:30:05,808 [INFO] [NoneStorage] - performing PUT at /s5001b2/s5001o4_GB
2022-03-02 15:30:06,344 [INFO] [NoneStorage] - performing PUT at /s5001b2/s5001o1_GB
2022-03-02 15:30:07,363 [WARN] [S3Storage] - below exception encountered when creating object s5001o2_GB at s5001b1: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
2022-03-02 15:30:07,365 [INFO] [NoneStorage] - performing PUT at /s5001b2/s5001o2_GB
2022-03-02 15:30:10,220 [INFO] [NoneStorage] - performing PUT at /s5001b3/s5001o1_GB
2022-03-02 15:30:10,269 [INFO] [NoneStorage] - performing PUT at /s5001b2/s5001o3_GB
2022-03-02 15:30:11,647 [INFO] [NoneStorage] - performing PUT at /s5001b2/s5001o5_GB
2022-03-02 15:30:14,219 [INFO] [NoneStorage] - performing PUT at /s5001b3/s5001o3_GB
2022-03-02 15:30:16,229 [INFO] [NoneStorage] - performing PUT at /s5001b3/s5001o4_GB
2022-03-02 15:30:18,461 [INFO] [NoneStorage] - performing PUT at /s5001b3/s5001o2_GB
2022-03-02 15:30:19,597 [INFO] [NoneStorage] - performing PUT at /s5001b4/s5001o4_GB
2022-03-02 15:30:19,948 [WARN] [S3Storage] - below exception encountered when creating object s5001o3_GB at s5001b3: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
2022-03-02 15:30:19,948 [INFO] [NoneStorage] - performing PUT at /s5001b4/s5001o3_GB
2022-03-02 15:30:21,821 [INFO] [NoneStorage] - performing PUT at /s5001b4/s5001o1_GB
2022-03-02 15:30:23,736 [INFO] [NoneStorage] - performing PUT at /s5001b3/s5001o5_GB
2022-03-02 15:30:31,439 [INFO] [NoneStorage] - performing PUT at /s5001b4/s5001o2_GB
2022-03-02 15:30:39,427 [INFO] [NoneStorage] - performing PUT at /s5001b4/s5001o5_GB
```
Picking out this error to follow:
```
2022-03-02 15:30:07,363 [WARN] [S3Storage] - below exception encountered when creating object s5001o2_GB at s5001b1: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
```
Looking for that error in the noobaa log:
```
[root@c83f1-infa internal]# oc get pod -o wide | grep -i endpoint | grep dan3 | awk '{print $1}' | xargs -ti oc logs {} | grep 20:30:07 | grep s5001o2
oc logs noobaa-endpoint-749785777b-2fzxd
oc logs noobaa-endpoint-749785777b-2g7s4
oc logs noobaa-endpoint-749785777b-6jp42
oc logs noobaa-endpoint-749785777b-6wgcc
oc logs noobaa-endpoint-749785777b-7sx8t
oc logs noobaa-endpoint-749785777b-btkhh
oc logs noobaa-endpoint-749785777b-gsrx9
oc logs noobaa-endpoint-749785777b-spp8b
oc logs noobaa-endpoint-749785777b-wf4p4
oc logs noobaa-endpoint-749785777b-xsqxc
Mar-2 20:30:07.379 [Endpoint/14] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/s5001b1/s5001o2_GB</Resource><RequestId>l0a0j66s-amgalv-n9a</RequestId></Error> PUT /s5001b1/s5001o2_GB {"host":"metallb","authorization":"AWS lDT36NrV8XNBMg0p8ESh:DNkEsP3hIfzKYQSH5sqT2iKjKW8=","user-agent":"aws-sdk-java/1.10.76 Linux/4.18.0-240.el8.x86_64 OpenJDK_64-Bit_Server_VM/25.265-b01/1.8.0_265","amz-sdk-invocation-id":"f7e25c94-f1b5-436a-bc64-97c730ead3fa","amz-sdk-retry":"0/0/","date":"Wed, 02 Mar 2022 20:30:02 GMT","content-type":"application/octet-stream","content-length":"3000000000","connection":"Keep-Alive"} TypeError: callback is not a function
Mar-2 20:30:07.427 [Endpoint/14] [L0] core.endpoint.s3.ops.s3_put_object:: PUT OBJECT s5001b2 s5001o2_GB
```
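The "TypeError: callback is not a function" on the failing request is the key hint here. Purely as a generic illustration (hypothetical names, not the actual noobaa code path), this is the shape of bug that produces that message: an optional callback argument is invoked unconditionally, the throw propagates out of the PUT handler, and the S3 layer can only report a generic InternalError:

```ts
// Generic illustration with hypothetical names; not noobaa source code.
function writeChunk(chunk: Buffer, callback?: (err?: Error) => void): void {
  // ... hand the chunk to the underlying writer ...
  callback!(); // throws "TypeError: callback is not a function" when omitted
}

try {
  writeChunk(Buffer.alloc(16)); // caller forgot to pass the callback
} catch (err) {
  // In an S3 endpoint, an unexpected throw like this typically surfaces to
  // the client as a generic InternalError response.
  console.error(err);
}
```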
```
[root@c83f1-infa internal]# noobaa status
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.9_nsfsfixes-20220215
INFO[0000] operator-image: quay.io/rhceph-dev/odf4-mcg-rhel8-operator@sha256:773dbaded46fa0a024c28a92f98f4ad64d370011ed1405f2b16f39b9258eb6b2
INFO[0000] noobaa-db-image: quay.io/rhceph-dev/rhel8-postgresql-12@sha256:da0b8d525b173ef472ff4c71fae60b396f518860d6313c4f3287b844aab6d622
INFO[0000] Namespace: openshift-storage
INFO[0000]
INFO[0000] CRD Status:
INFO[0000] ✅ Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "namespacestores.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "noobaaaccounts.noobaa.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0000] ✅ Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
INFO[0000]
INFO[0000] Operator Status:
INFO[0000] ✅ Exists: Namespace "openshift-storage"
INFO[0000] ✅ Exists: ServiceAccount "noobaa"
INFO[0000] ✅ Exists: ServiceAccount "noobaa-endpoint"
INFO[0000] ✅ Exists: Role "mcg-operator.v4.9.3-noobaa-7fdbb75fd7"
INFO[0000] ✅ Exists: Role "mcg-operator.v4.9.3-noobaa-endpoint-65854bfccb"
INFO[0000] ✅ Exists: RoleBinding "mcg-operator.v4.9.3-noobaa-7fdbb75fd7"
INFO[0000] ✅ Exists: RoleBinding "mcg-operator.v4.9.3-noobaa-endpoint-65854bfccb"
INFO[0000] ✅ Exists: ClusterRole "mcg-operator.v4.9.3-644558fccd"
INFO[0000] ✅ Exists: ClusterRoleBinding "mcg-operator.v4.9.3-644558fccd"
INFO[0000] ⬛ (Optional) Not Found: ValidatingWebhookConfiguration "admission-validation-webhook"
INFO[0000] ⬛ (Optional) Not Found: Secret "admission-webhook-secret"
INFO[0000] ⬛ (Optional) Not Found: Service "admission-webhook-service"
INFO[0000] ✅ Exists: Deployment "noobaa-operator"
INFO[0000]
INFO[0000] System Wait Ready:
INFO[0000] ✅ System Phase is "Ready".
INFO[0000]
INFO[0000]
INFO[0000] System Status:
INFO[0000] ✅ Exists: NooBaa "noobaa"
INFO[0000] ✅ Exists: StatefulSet "noobaa-core"
INFO[0000] ✅ Exists: ConfigMap "noobaa-config"
INFO[0000] ✅ Exists: Service "noobaa-mgmt"
INFO[0000] ✅ Exists: Service "s3"
INFO[0000] ✅ Exists: Secret "noobaa-db"
INFO[0000] ✅ Exists: ConfigMap "noobaa-postgres-config"
INFO[0000] ✅ Exists: ConfigMap "noobaa-postgres-initdb-sh"
INFO[0000] ✅ Exists: StatefulSet "noobaa-db-pg"
INFO[0000] ✅ Exists: Service "noobaa-db-pg"
INFO[0000] ✅ Exists: Secret "noobaa-server"
INFO[0000] ✅ Exists: Secret "noobaa-operator"
INFO[0000] ✅ Exists: Secret "noobaa-endpoints"
INFO[0000] ✅ Exists: Secret "noobaa-admin"
INFO[0000] ✅ Exists: StorageClass "openshift-storage.noobaa.io"
INFO[0000] ✅ Exists: BucketClass "noobaa-default-bucket-class"
INFO[0000] ✅ Exists: Deployment "noobaa-endpoint"
INFO[0000] ✅ Exists: HorizontalPodAutoscaler "noobaa-endpoint"
INFO[0000] ✅ (Optional) Exists: BackingStore "noobaa-default-backing-store"
INFO[0001] ⬛ (Optional) Not Found: CredentialsRequest "noobaa-aws-cloud-creds"
INFO[0001] ⬛ (Optional) Not Found: CredentialsRequest "noobaa-azure-cloud-creds"
INFO[0001] ⬛ (Optional) Not Found: Secret "noobaa-azure-container-creds"
INFO[0001] ⬛ (Optional) Not Found: Secret "noobaa-gcp-bucket-creds"
INFO[0001] ⬛ (Optional) Not Found: CredentialsRequest "noobaa-gcp-cloud-creds"
INFO[0001] ✅ (Optional) Exists: PrometheusRule "noobaa-prometheus-rules"
INFO[0001] ✅ (Optional) Exists: ServiceMonitor "noobaa-mgmt-service-monitor"
INFO[0001] ✅ (Optional) Exists: ServiceMonitor "s3-service-monitor"
INFO[0001] ✅ (Optional) Exists: Route "noobaa-mgmt"
INFO[0001] ✅ (Optional) Exists: Route "s3"
INFO[0001] ✅ Exists: PersistentVolumeClaim "db-noobaa-db-pg-0"
INFO[0001] ✅ System Phase is "Ready"
INFO[0001] ✅ Exists: "noobaa-admin"

#------------------#
#- Mgmt Addresses -#
#------------------#

ExternalDNS : [https://noobaa-mgmt-openshift-storage.apps.ocp4.pokprv.stglabs.ibm.com]
ExternalIP  : []
NodePorts   : [https://10.28.20.46:32659]
InternalDNS : [https://noobaa-mgmt.openshift-storage.svc:443]
InternalIP  : [https://172.30.179.247:443]
PodPorts    : [https://10.128.3.154:8443]

#--------------------#
#- Mgmt Credentials -#
#--------------------#

email    : admin@noobaa.io
password : BWb0l3hATSA1+BUyJrShrw==

#----------------#
#- S3 Addresses -#
#----------------#

ExternalDNS : [https://s3-openshift-storage.apps.ocp4.pokprv.stglabs.ibm.com]
ExternalIP  : []
NodePorts   : [https://10.28.20.47:30905 https://10.28.20.46:30905 https://10.28.20.47:30905 https://10.28.20.45:30905 https://10.28.20.45:30905 https://10.28.20.45:30905 https://10.28.20.47:30905 https://10.28.20.45:30905 https://10.28.20.46:30905 https://10.28.20.46:30905 https://10.28.20.47:30905 https://10.28.20.45:30905 https://10.28.20.46:30905 https://10.28.20.45:30905 https://10.28.20.46:30905 https://10.28.20.47:30905 https://10.28.20.46:30905 https://10.28.20.45:30905 https://10.28.20.47:30905 https://10.28.20.45:30905 https://10.28.20.46:30905 https://10.28.20.47:30905 https://10.28.20.47:30905 https://10.28.20.46:30905 https://10.28.20.45:30905 https://10.28.20.45:30905 https://10.28.20.47:30905 https://10.28.20.47:30905 https://10.28.20.46:30905 https://10.28.20.46:30905]
InternalDNS : [https://s3.openshift-storage.svc:443]
InternalIP  : [https://172.30.167.87:443]
PodPorts    : [https://10.128.1.89:6443 https://10.128.2.173:6443 https://10.128.1.90:6443 https://10.128.4.152:6443 https://10.128.4.154:6443 https://10.128.4.150:6443 https://10.128.1.92:6443 https://10.128.4.147:6443 https://10.128.2.178:6443 https://10.128.2.176:6443 https://10.128.1.93:6443 https://10.128.4.153:6443 https://10.128.2.182:6443 https://10.128.4.148:6443 https://10.128.2.175:6443 https://10.128.1.87:6443 https://10.128.2.177:6443 https://10.128.4.149:6443 https://10.128.1.88:6443 https://10.128.4.155:6443 https://10.128.2.174:6443 https://10.128.1.95:6443 https://10.128.1.86:6443 https://10.128.2.181:6443 https://10.128.4.151:6443 https://10.128.4.146:6443 https://10.128.1.94:6443 https://10.128.1.91:6443 https://10.128.2.179:6443 https://10.128.2.180:6443]

#------------------#
#- S3 Credentials -#
#------------------#

AWS_ACCESS_KEY_ID     : xbtyDghqOmoE3RDoWULe
AWS_SECRET_ACCESS_KEY : q9BE7S7KTVZX8N/A5LB9sh9Kv1dSOB/d1P2QjEwg

#------------------#
#- Backing Stores -#
#------------------#

NAME                           TYPE      TARGET-BUCKET   PHASE   AGE
noobaa-default-backing-store   pv-pool                   Ready   6d1h28m2s

#--------------------#
#- Namespace Stores -#
#--------------------#

NAME                      TYPE   TARGET-BUCKET   PHASE   AGE
noobaa-s3res-3777249441   nsfs                   Ready   6d1h27m56s

#------------------#
#- Bucket Classes -#
#------------------#

NAME                          PLACEMENT                                                         NAMESPACE-POLICY   QUOTA   PHASE   AGE
noobaa-default-bucket-class   {"tiers":[{"backingStores":["noobaa-default-backing-store"]}]}   null               null    Ready   6d1h28m2s

#-------------------#
#- NooBaa Accounts -#
#-------------------#

No noobaa accounts found.

#-----------------#
#- Bucket Claims -#
#-----------------#

No OBCs found.
```
Must gathers are in box note https://ibm.ent.box.com/folder/145794528783?s=uueh7fp424vxs2bt4ndrnvh7uusgu6tocd

Could not get `oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8` because of this error:
```
[must-gather-j669f] POD 2022-03-02T21:43:59.287340940Z collecting dump cephblockpools
[must-gather-j669f] POD 2022-03-02T21:43:59.508140313Z collecting dump cephclusters
[must-gather-j669f] POD 2022-03-02T21:43:59.738039077Z collecting dump cephfilesystems
[must-gather-j669f] POD 2022-03-02T21:43:59.978852540Z collecting dump cephobjectstores
[must-gather-j669f] POD 2022-03-02T21:44:00.207898724Z collecting dump cephobjectstoreusers
[must-gather-j669f] POD 2022-03-02T21:44:01.724486444Z Error from server (NotFound): pods "must-gather-j669f-helper" not found
[must-gather-j669f] POD 2022-03-02T21:44:01.738185148Z waiting for helper pod to come up in openshift-storage namespace. Retrying 1
[must-gather-j669f] POD 2022-03-02T21:44:06.926669468Z Error from server (NotFound): pods "must-gather-j669f-helper" not found
[must-gather-j669f] POD 2022-03-02T21:44:06.961050391Z waiting for helper pod to come up in openshift-storage namespace. Retrying 2
[must-gather-j669f] POD 2022-03-02T21:44:12.175332535Z Error from server (NotFound): pods "must-gather-j66
```
Your environment
```
[root@c83f1-infa internal]# oc get csv
NAME                  DISPLAY                       VERSION   REPLACES              PHASE
mcg-operator.v4.9.3   NooBaa Operator               4.9.3     mcg-operator.v4.9.2   Succeeded
ocs-operator.v4.9.3   OpenShift Container Storage   4.9.3     ocs-operator.v4.9.2   Succeeded
odf-operator.v4.9.3   OpenShift Data Foundation     4.9.3     odf-operator.v4.9.2   Succeeded
[root@c83f1-infa internal]#
```
- POK Bare Metal, Real Fast cluster
- 30 endpoint pods
- MD5 disabled
- Default CPU/memory on endpoint pods
- 1 account being tested even though there are 100 accounts; there are 10 buckets for this account
- 128 workers in Cosbench
- 4 app nodes, but it looks like only 2 were used
Steps to reproduce
Cosbench xml:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrangesizeobj" description="GB range read and write workload">
  <storage type="s3" config="accesskey=lDT36NrV8XNBMg0p8ESh;secretkey=TZDzPCqqbBhjs1nzQX0Y0fK27F2ANCTP81iQjjqj;endpoint=http://metallb:80;path_style_access=true" />
  <workflow>
    <workstage name="prepare_GB">
      <work name="prepare_GB" workers="128" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5001b;oprefix=s5001o;osuffix=_GB;containers=r(1,4);objects=r(1,5);sizes=r(1,3)GB"/>
    </workstage>
    <workstage name="rangewritereaddelete">
      <work name="GBrangewritereaddelete" workers="128" interval="10" rampup="0" runtime="1800">
        <operation type="write" ratio="20" config="cprefix=s5001b;oprefix=s5001o;osuffix=_GB;containers=r(1,4);objects=r(7,9);sizes=r(1,3)GB"/>
        <operation type="read" ratio="60" config="cprefix=s5001b;oprefix=s5001o;osuffix=_GB;containers=r(1,4);objects=r(1,3)"/>
        <operation type="delete" ratio="20" config="cprefix=s5001b;oprefix=s5001o;osuffix=_GB;containers=r(1,4);objects=r(4,6)"/>
      </work>
    </workstage>
    <workstage name="cleanup_GB">
      <work name="cleanup_GB" workers="128" interval="10" rampup="0" type="cleanup" ratio="100" config="cprefix=s5001b;oprefix=s5001o;osuffix=_GB;containers=r(1,4);objects=r(1,9)"/>
    </workstage>
  </workflow>
</workload>
```
Expected behavior
Should be able to write objects consistently
Actual behavior
Intermittent "We encountered an internal error. Please try again." (S3 InternalError) responses on object PUTs, as shown in the logs above.
More information - Screenshots / Logs / Other output
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 30 (12 by maintainers)
I’ve confirmed that this defect is indeed fixed by #6896.

Verification steps:
- Launch cosbench
- Check noobaa endpoint logs
- Look at cosbench logs
I tested 2 more times just to be sure and did not see the signature of this defect.
I do not know the process for closing defects in this team. Do I wait until the fix is in an ODF build, or is patch verification sufficient?
@romayalon @MonicaLemay wow, what great work on this issue! I would add it to the pantheon of issues as a "higgs-bugson" from the good old list of unusual software bugs.
@nimrod-becker and @akmithal will be able to advise on closing.
Hi Monica, I think it really helped. The file system error is still EINVAL, and according to https://linux.die.net/man/2/writev, writev() fails with EINVAL when the vector count iovcnt is less than zero or greater than the permitted maximum.
And from logs:
buffers_len=1315. I think we exceeded the maximum, because according to the man page, on Linux the limit advertised by these mechanisms (IOV_MAX) is 1024.
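To make the failure mode concrete, here is a minimal Node.js sketch, under the assumption that capping the number of buffers handed to a single writev call at IOV_MAX is the kind of workaround this points to (`writevChunked` is a hypothetical helper, not the actual noobaa fix):

```ts
import * as fs from 'fs';

// writev(2) on Linux advertises a per-call limit of 1024 iovecs (IOV_MAX);
// passing more (e.g. buffers_len=1315 above) fails with EINVAL.
const IOV_MAX = 1024;

// Hypothetical helper: write a long buffer list in chunks of at most IOV_MAX.
// Partial writes inside a chunk are not handled in this sketch.
async function writevChunked(fd: number, buffers: Buffer[]): Promise<void> {
  for (let i = 0; i < buffers.length; i += IOV_MAX) {
    const chunk = buffers.slice(i, i + IOV_MAX);
    await new Promise<void>((resolve, reject) =>
      fs.writev(fd, chunk, (err) => (err ? reject(err) : resolve()))
    );
  }
}
```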