bacalhau: Bacalhau run stuck on 'Finding node(s) for the job'
Context
When I try to run a Docker image on Bacalhau, the job gets stuck at the ‘Finding node(s)’ step. Is this a temporary outage or is there a different issue?
Steps to Reproduce
bacalhau docker run -v bafybeiblcnj6z4pkqmfxi7jxjvkaxue2kw5xxsfhdzwyjfe23vnhvukr7y:/project/inputs jsolly/segmentation_testbed
It’s been going for over 20 mins now. I will let it go overnight and see what happens 🤷‍♂️
Thoughts
I haven’t tried other images. I am on Arm64 architecture, but I built the image using:
docker buildx build --platform linux/amd64 -t segmentation_testbed .
and I confirmed the architecture in DockerHub.
Environment
Client Version: v0.3.22
Server Version: v0.3.22
Link to repository -> Segmentation Testbed
About this issue
- State: open
- Created a year ago
- Comments: 17 (4 by maintainers)
The graph size is correlated with but not exactly the size of the data (you could have a lot of tiny blocks). In any case, this isn’t an issue with the new API at all, and we no longer report “pinning status” in the new API (since it has minimal benefit to report as a hosted service provider, and is really unscalable to report).
But I was mistaken that that was the issue. For the upload corresponding to bafybeihu2bvhibtb4nv6mhxdkeu5ubwu5wz74n5iyfzgao4gbrzkivxr7y, it looks like the upload just failed (we only have ~2MB associated with that CID). If you just reupload, it should work fine.
I think this is a Kubo/Bacalhau thing - unfortunately I don’t think we or any other IPFS provider / pinning service can help. (maybe @wesfloyd has an idea)
In any case, if you do try out other hosted IPFS providers, would be curious about your experience there - we’ve designed web3.storage specifically to avoid the scale problems you see with other pinning providers (have a blog post on this here https://blog.web3.storage/posts/web3-storage-architecture). Of course, there still can be annoying UX kinks as we try to move the community forward to a more scalable/performant place (talk less about pinning, more about CAR files), but think we’re on the right track.
There seems to be an issue with our latest CLI where it defaults to our development endpoint instead of production. Setting
export BACALHAU_ENVIRONMENT=production
before running the job would solve the issue for now. I am working on a better fix.
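Putting the workaround together with the command from Steps to Reproduce, a minimal sketch might look like the following. It reuses the CID and image name from the original report; the `command -v` guard is only an illustrative addition so the snippet does nothing harmful on a machine without the CLI installed.

```shell
# Workaround from the comment above: point the CLI at the production endpoint.
export BACALHAU_ENVIRONMENT=production

# Re-run the original job (same CID and image as in "Steps to Reproduce").
# The guard is an assumption added for illustration, not part of the fix.
if command -v bacalhau >/dev/null 2>&1; then
  bacalhau docker run \
    -v bafybeiblcnj6z4pkqmfxi7jxjvkaxue2kw5xxsfhdzwyjfe23vnhvukr7y:/project/inputs \
    jsolly/segmentation_testbed
else
  echo "bacalhau CLI not found on PATH; skipping the run" >&2
fi
```

Because the variable is exported, it applies to every `bacalhau` command started from the same shell session, so it only needs to be set once per terminal.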