bacalhau: Bacalhau run stuck on 'Finding node(s) for the job'

Context

When attempting to run a Docker image on Bacalhau, it gets stuck on the ‘Finding node(s) for the job’ step. Is this a temporary outage, or is there a different issue?

Steps to Reproduce

bacalhau docker run -v bafybeiblcnj6z4pkqmfxi7jxjvkaxue2kw5xxsfhdzwyjfe23vnhvukr7y:/project/inputs jsolly/segmentation_testbed

(Screenshot attached.) It’s been going for over 20 mins now. I will let it go overnight and see what happens 🤷‍♂️
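While waiting, a hedged way to poke at the stuck job (a sketch, assuming the v0.3.x CLI still exposes the list and describe subcommands; the job ID below is a placeholder):

# List recent jobs submitted from this client
bacalhau list

# Show detailed state and events for a specific job
bacalhau describe <job-id>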

Thoughts

I haven’t tried other images. I am on an Arm64 machine, but I built the image using:

docker buildx build --platform linux/amd64 -t segmentation_testbed .

and I confirmed the architecture in DockerHub.
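For reference, the pushed manifest’s platform can also be checked from the CLI; a minimal sketch, assuming buildx is installed and that the tag below matches what was pushed:

# Print the manifest for the pushed image, including OS/architecture
docker buildx imagetools inspect jsolly/segmentation_testbed:latest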

Environment

Client Version: v0.3.22
Server Version: v0.3.22
Link to repository -> Segmentation Testbed

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 17 (4 by maintainers)

Most upvoted comments

The graph size is correlated with, but not exactly equal to, the size of the data (you could have a lot of tiny blocks). In any case, this isn’t an issue with the new API at all, and we no longer report “pinning status” in the new API (it has minimal benefit to report as a hosted service provider, and is really unscalable to report).

But I was mistaken that that was the issue - for the upload corresponding to bafybeihu2bvhibtb4nv6mhxdkeu5ubwu5wz74n5iyfzgao4gbrzkivxr7y, it looks like that upload just failed (we only have ~2MB associated with that CID). If you just re-upload, it should work fine.

For the quoted error:

“And when trying ipfs get bafybeiblcnj6z4pkqmfxi7jxjvkaxue2kw5xxsfhdzwyjfe23vnhvukr7y I ran into an error”

I think this is a Kubo/Bacalhau thing - unfortunately, I don’t think we or any other IPFS provider / pinning service can help (maybe @wesfloyd has an idea).
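For anyone else debugging this, one hedged first check is whether any peers are advertising the CID at all (a sketch, assuming a local Kubo node; newer Kubo versions spell this ipfs routing findprovs):

# Ask the DHT which peers claim to provide the CID
ipfs dht findprovs bafybeiblcnj6z4pkqmfxi7jxjvkaxue2kw5xxsfhdzwyjfe23vnhvukr7y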

In any case, if you do try out other hosted IPFS providers, I would be curious about your experience there - we’ve designed web3.storage specifically to avoid the scale problems you see with other pinning providers (there’s a blog post on this here: https://blog.web3.storage/posts/web3-storage-architecture). Of course, there can still be annoying UX kinks as we try to move the community forward to a more scalable/performant place (talk less about pinning, more about CAR files), but I think we’re on the right track.
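As a small illustration of the CAR-file workflow mentioned above (a sketch, assuming the data is available on a local Kubo node):

# Export the DAG rooted at the CID into a CAR file
ipfs dag export bafybeiblcnj6z4pkqmfxi7jxjvkaxue2kw5xxsfhdzwyjfe23vnhvukr7y > inputs.car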

There seems to be an issue with our latest CLI where it defaults to our development endpoint instead of production. Setting export BACALHAU_ENVIRONMENT=production before running the job should solve the issue for now. I am working on a better fix.
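Putting the workaround together with the original command, the rerun would look roughly like this (same CID and image as above):

# Point the CLI at the production network instead of the development endpoint
export BACALHAU_ENVIRONMENT=production

# Re-submit the job
bacalhau docker run -v bafybeiblcnj6z4pkqmfxi7jxjvkaxue2kw5xxsfhdzwyjfe23vnhvukr7y:/project/inputs jsolly/segmentation_testbed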