service-fabric: Copy-ServiceFabricApplicationPackage to image store never finishes

I’m unable to deploy a Service Fabric application to an Azure Service Fabric cluster; specifically, I cannot copy the deployment package to the image store using the PowerShell scripts from the Service Fabric SDK.

Azure Service Fabric cluster: new single-node cluster (VM is Ubuntu 16.04), Service Fabric version 6.4.639.1. Nothing has been deployed to it yet. Both the Azure blade and Service Fabric Explorer show the node as green/ready with no ongoing upgrades.

Client machine: Microsoft Azure Service Fabric SDK 3.3.654. I’m using PowerShell to deploy the app via C:\Program Files\Microsoft SDKs\Service Fabric\Tools\PSModule\ServiceFabricSDK\Publish-NewServiceFabricApplication.ps1.

Copy-ServiceFabricApplicationPackage (L:247) hangs and never proceeds beyond a few kilobytes transferred. I’ve added -ShowProgress -ShowProgressIntervalMilliseconds 3000 to see more details. It shows that it transferred some bytes (usually 2-4K) and then stops.
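
For reference, this is roughly the call that ends up being made; the endpoint and paths below are placeholders rather than my exact values, and an unsecured connection is shown for brevity:

# Connect to the cluster management endpoint (placeholder address)
Connect-ServiceFabricCluster -ConnectionEndpoint "mycluster.westeurope.cloudapp.azure.com:19000"

# Upload the package to the image store, reporting progress every 3 seconds
Copy-ServiceFabricApplicationPackage `
    -ApplicationPackagePath "C:\src\MyApp\pkg\Release" `
    -ApplicationPackagePathInImageStore "MyApp" `
    -ImageStoreConnectionString "fabric:ImageStore" `
    -ShowProgress `
    -ShowProgressIntervalMilliseconds 3000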

So far I’ve tried (on client machine):

  • Turning off firewall completely.
  • Removing the local development cluster and stopping FabricHostSvc, because of the suggestions in microsoft/service-fabric-issues#813.

I don’t know how to SSH into the VM yet, but once I find out I will add any relevant details. I’ve also quickly browsed through the cluster logs that are shipped to Azure Table Storage, but so far I haven’t found anything interesting or pertaining to the image store.

Most upvoted comments

# Compress the application package in place without copying it to the image store
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath $path -CompressPackage -SkipCopy

# Upload the now-compressed package to the image store under the name MyAppV1
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath $path -ApplicationPackagePathInImageStore MyAppV1

I’m using PowerShell directly to package and compress the app, as above.

It’s been stuck on the second statement for about 20 minutes now, trying to upload a 175 MB package.

Any ideas how I can check the progress of the upload?

I’m pretty confident this is not environmental, as it does the same on both of my dev environments. I’ve even tried using a different internet connection in case it was my broadband supplier, but with no luck.

I can still administer the cluster just fine through PowerShell and use my SF applications that are already running on the cluster.

It just seems to refuse to upload an application package at seemingly random times. Very, very frustrating.

@dkkapur hey Deep, can this issue get some attention please?

Not being able to copy an application package to the cluster reliably is falling at the first hurdle. Potential Service Fabric customers are likely to lose confidence in the tech and abandon it, which is a real shame!

@nates321 @manu-amiel There is another way to copy an application package to the cluster, as documented here, but the documentation is weak, so you also need the extra bit of info from this comment.

@dkkapur I actually prefer this pull model; would it make sense to look at the suitability of making it the primary advocated method of copying application packages to a cluster? Sure, it’s more steps than the client directly copying the application package, but a pull model feels better suited.
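
For anyone who lands here, a minimal sketch of that pull model, assuming the package has already been packed into an .sfpkg and uploaded to externally reachable storage; the URL, type name, and version below are placeholders:

# Provision the application type directly from external storage (placeholder values)
Register-ServiceFabricApplicationType `
    -ApplicationPackageDownloadUri "https://mystorage.blob.core.windows.net/packages/MyApp.sfpkg" `
    -ApplicationTypeName "MyAppType" `
    -ApplicationTypeVersion "1.0.0" `
    -Async

# Poll until the cluster reports the type as Available
Get-ServiceFabricApplicationType -ApplicationTypeName "MyAppType"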

I want to bump this. When trying to deploy applications to a cluster (we have a couple of clusters where this happens), this command fails around a third of the time for one of the packages. Normally it takes around 1 to 2 minutes for this package to upload, but when it fails it times out after the 10-minute timeout we set. Also, sometimes the command doesn’t actually respect the timeout we set and runs until our VSTS build timeout is reached.
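
For context, this is roughly how we invoke it from the build; the paths are placeholders and the 600-second value is the 10-minute timeout mentioned above:

# Upload the package with an explicit 10-minute timeout (placeholder paths)
Copy-ServiceFabricApplicationPackage `
    -ApplicationPackagePath "C:\agent\_work\1\a\MyApp\pkg" `
    -ApplicationPackagePathInImageStore "MyApp" `
    -CompressPackage `
    -TimeoutSec 600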

Some useful information I’ve gathered from Azure support. In the end it was all inconclusive, but it might help you anyway.

From the analysis done by someone in the SFC product group:

The trace shows the error FABRIC_E_GATEWAY_NOT_REACHABLE around the time reported in the incident. This error could be caused by high load in the IPC transport between fabricgateway and fabric.

I was not able to get clear info on what exactly is meant by “fabricgateway”, because I thought everything was hosted on our Azure VMs, and if we are not doing anything and the cluster is empty, how could there be “high load in IPC”? Perhaps there are still some shared infrastructure components…

To reduce the load on the same transport, 6.5 RTO (completed but not yet released) will have an improvement that routes file transfer to another channel. The next version would address this issue.

So the PG is doing something about this in the next release, which is scheduled (unconfirmed) for the end of May.

Also, when reporting the issue, it is important to try to capture network traffic from your side using tools like Network Monitor or Wireshark. This is something you have to plan for ahead of time and be prepared to do when the issue arises.
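
If you don’t have Network Monitor or Wireshark set up when it happens, one generic option (not something support specifically asked for) is a netsh trace, which is built into Windows; the file path below is a placeholder:

# Start a capture before kicking off the upload (placeholder path)
netsh trace start capture=yes tracefile=C:\traces\sf-upload.etl maxsize=512 overwrite=yes

# ...reproduce the hanging Copy-ServiceFabricApplicationPackage call...

# Stop the capture once the upload has stalled; the .etl can be opened in
# Network Monitor or converted (e.g. with etl2pcapng) for Wireshark
netsh trace stop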

In the end we decided to abandon SFC apps/hosting, so I will not be following this issue anymore.