azure-sdk-for-java: [BUG] AzureResourceManager throws NullPointerException in context of ContainerGroup
Describe the bug When using many Azure container instances with private IP addresses, the AzureResourceManager occasionally throws a NullPointerException in the context of ContainerGroup.
Exception or Stack Trace
2022-02-11 09:01:42.674+0000 [id=74] WARNING c.m.j.c.aci.AciCloud#lambda$provision$1: AciCloud: Provision agent test-private-n4s69 failed: java.lang.NullPointerException
java.lang.NullPointerException
at com.azure.resourcemanager.containerinstance.implementation.ContainerGroupImpl.initializeChildrenFromInner(ContainerGroupImpl.java:217)
at com.azure.resourcemanager.resources.fluentcore.arm.models.implementation.GroupableParentResourceImpl.<init>(GroupableParentResourceImpl.java:32)
at com.azure.resourcemanager.containerinstance.implementation.ContainerGroupImpl.<init>(ContainerGroupImpl.java:73)
at com.azure.resourcemanager.containerinstance.implementation.ContainerGroupsImpl.wrapModel(ContainerGroupsImpl.java:43)
at com.azure.resourcemanager.containerinstance.implementation.ContainerGroupsImpl.wrapModel(ContainerGroupsImpl.java:24)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:113)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:151)
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onSubscribe(MonoFlatMap.java:238)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
To Reproduce Steps to reproduce the behavior: see issue https://github.com/jenkinsci/azure-container-agents-plugin/issues/101
Code Snippet see
- https://github.com/jenkinsci/azure-container-agents-plugin/blob/a22f04771da02f714abab78022efdb38c8bc1fe3/src/main/java/com/microsoft/jenkins/containeragents/aci/AciCloud.java#L201
- https://github.com/jenkinsci/azure-container-agents-plugin/blob/a22f04771da02f714abab78022efdb38c8bc1fe3/src/main/java/com/microsoft/jenkins/containeragents/aci/AciCloud.java#L177
Expected behavior No NullPointerException is thrown.
Setup (please complete the following information):
- OS: Ubuntu 18.04
- Library/Libraries: com.azure.resourcemanager:azure-resourcemanager-containerinstance:2.9.0
- Java version: 11
- Frameworks: Jenkins Plugin Azure Container Agent
Information Checklist Kindly make sure that you have added all the following information above and check off the required fields; otherwise we will treat the issue as an incomplete report
- Bug Description Added
- Repro Steps Added
- Setup information Added
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (10 by maintainers)
@XiaofeiCao Thank you for mentioning me. I opened a PR in https://github.com/jenkinsci/azure-sdk-plugin/. After releasing this plugin, I can test it in the original plugin.
@sparsick Thanks, the log is great! I can tell from your log that the templates are actually the same. I’ll add some protections for this situation. You can expect it to be fixed in the next sdk release.
In the meantime, if you want, you can add a `protocol` to the container port in your template. I believe this will also resolve this issue.
@XiaofeiCao I generated a new log output where the deployment template is also logged.
The whole log output when starting 4 agents at the same time based on the same deployment template.
@timja I hear you. I doubt that empty ports will result in NPE…
However, I tried @sparsick's armTemplate and found `protocol` to be empty in the created `ContainerGroup`… I also ran a few experiments, and it seems that the deployment service ignores the `protocol` property when the `ContainerGroup` has a `Container` that uses the same port as the one in `IpAddress` but without `protocol`, as in the provided template sample. Not sure why they did that.
That said, for a quick fix, you can add a protocol here so that the deployment service will not ignore it.
Or you can wait for the next SDK release, which will have the NPE fixed for this situation.
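For anyone who wants to apply the quick fix programmatically rather than by editing the template file, here is a minimal sketch of the idea. It assumes the port entries of the template have already been parsed into plain maps with `port` and `protocol` keys; the class `PortProtocolDefaults` and the method `fillMissingProtocol` are hypothetical names for illustration and are not part of the SDK or the plugin.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of azure-sdk-for-java or the Jenkins plugin.
final class PortProtocolDefaults {

    /**
     * Sets "protocol" on every port entry that has none (null or empty) and
     * returns how many entries were changed. Entries that already carry a
     * protocol are left untouched.
     */
    static int fillMissingProtocol(List<Map<String, Object>> ports, String defaultProtocol) {
        int changed = 0;
        for (Map<String, Object> port : ports) {
            Object protocol = port.get("protocol");
            if (protocol == null || protocol.toString().isEmpty()) {
                port.put("protocol", defaultProtocol);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> ports = new ArrayList<>();

        // Assumed shape of a container port entry without a protocol.
        Map<String, Object> withoutProtocol = new HashMap<>();
        withoutProtocol.put("port", 8080);
        ports.add(withoutProtocol);

        // An entry that already declares a protocol must be preserved.
        Map<String, Object> withProtocol = new HashMap<>();
        withProtocol.put("port", 50000);
        withProtocol.put("protocol", "UDP");
        ports.add(withProtocol);

        int changed = fillMissingProtocol(ports, "TCP");
        System.out.println(changed + " port(s) updated: " + ports);
    }
}
```

Mutating the parsed entries in place keeps the sketch short; whether `TCP` is the right default depends on what the agent container actually listens on.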
@sparsick feel free to just add that to the plugin in general, it can be useful for debugging, see pattern in the virtual machines plugin: https://github.com/jenkinsci/azure-vm-agents-plugin/blob/master/src/main/java/com/microsoft/azure/vmagent/AzureVMManagementServiceDelegate.java#L595-L599
This is the template file: https://github.com/jenkinsci/azure-container-agents-plugin/blob/master/src/main/resources/com/microsoft/jenkins/containeragents/aci/deployTemplate.json
Seems the values are set here: https://github.com/jenkinsci/azure-container-agents-plugin/blob/f18b5bf95fe69e0bc24441065ebb214473bde8f0/src/main/java/com/microsoft/jenkins/containeragents/builders/AciDeploymentTemplateBuilder.java#L134-L148
I wonder if the port is empty in some cases
@sparsick Thanks for your template sample and yes it’s actually setting the protocol…
Since you mentioned that the NPE occurs from time to time and the templates are constructed dynamically, could it be that some of them are missing `protocol`?
To be sure whether this is the case, a log of the actual ARM template used here before the NPE would be of great help!
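One way to capture the template before each deployment is sketched below, using only `java.util.logging`. The class `TemplateLogger` and its methods are hypothetical names, and where exactly the plugin would call this is an assumption; the sketch only shows the logging step itself.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not taken from the plugin source.
final class TemplateLogger {

    private static final Logger LOGGER = Logger.getLogger(TemplateLogger.class.getName());

    /** Builds the log message so it can be inspected independently of the logger. */
    static String formatForLog(String deploymentName, String templateJson) {
        return String.format("Deployment %s uses ARM template:%n%s", deploymentName, templateJson);
    }

    /** Logs the rendered template at FINE so it only appears when debugging is enabled. */
    static void logTemplate(String deploymentName, String templateJson) {
        if (LOGGER.isLoggable(Level.FINE)) {
            LOGGER.log(Level.FINE, formatForLog(deploymentName, templateJson));
        }
    }

    public static void main(String[] args) {
        // Placeholder deployment name and template body, for illustration only.
        logTemplate("example-deployment", "{ \"resources\": [] }");
        System.out.println(formatForLog("example-deployment", "{ \"resources\": [] }"));
    }
}
```

Logging at `FINE` rather than `INFO` is a deliberate choice here: a rendered template may contain sensitive values, so it seems safer to emit it only when a debug logger is explicitly enabled.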