playground: Can't complete `docker_compose` quickstart guide

Greetings,

I am trying to “play” with Tinkerbell to provision bare metal servers with an OS (e.g. Ubuntu Focal). I am following https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md and haven’t been able to finish the provisioning steps. I retried the steps 5 times in a row and saw 2 different outcomes. On my first 3 tries, Boots didn’t recognize the DHCP request and logged the output shown below. On my 4th try, Boots picked up the request, but the workflow’s action state stayed stuck in STATE_PENDING. I left it like that for 2 hours, which I think is long enough for it to at least start working. On my 5th try, the request again didn’t get picked up by Boots, just like on the first 3 tries.

Any recommendations or tips are welcome. If there are other known ways of making Tinkerbell work (one bare metal server provisioning another one with an OS), I am also willing to give them a try.

Expected Behaviour

I expect to see an outcome similar to the one described for the steps in https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md

Current Behaviour

Either Boots doesn’t pick up the machine, or it gets picked up but the workflow stays at 0% in the STATE_PENDING state.

  • Boots doesn’t recognize the DHCP request. For the client MAC e4:43:4b:3d:75:b8 (the output of `echo $TINKERBELL_CLIENT_MAC`), I see the following output in the Boots logs, and the machine doesn’t get picked up by the Tinkerbell stack:
boots_1                     | {"level":"info","ts":1649949865.6316814,"caller":"dhcp4-go@v0.0.0-20190402165401-39c137f31ad3/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"e4:43:4b:3d:75:b8","via":"0.0.0.0","iface":"eno1","xid":"\"4b:3d:75:b8\"","type":"DHCPDISCOVER","secs":28}
boots_1                     | {"level":"info","ts":1649949865.6318014,"caller":"boots/dhcp.go:78","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"e4:43:4b:3d:75:b8","circuitID":""}
boots_1                     | {"level":"info","ts":1649949865.6336043,"caller":"boots/dhcp.go:91","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"e4:43:4b:3d:75:b8","err":"discover from dhcp message: get hardware by mac from tink: rpc error: code = Unknown desc = SELECT: sql: no rows in result set","errVerbose":"rpc error: code = Unknown desc = SELECT: sql: no rows in result set\nget hardware by mac from tink\ngithub.com/tinkerbell/boots/packet.(*client).DiscoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/packet/endpoints.go:108\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:17\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/home/github/go/pkg/mod/github.com/golang/groupcache@v0.0.0-20190702054246-869f871628b6/singleflight/singleflight.go:56\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:19\ngithub.com/tinkerbell/boots/job.CreateFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/job.go:106\nmain.dhcpHandler.serveDHCP\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:89\nmain.dhcpHandler.ServeDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:50\ngithub.com/gammazero/workerpool.startWorker\n\t/home/github/go/pkg/mod/github.com/gammazero/workerpool@v0.0.0-20200311205957-7b00833861c6/workerpool.go:218\nruntime.goexit\n\t/opt/actions-runner/_work/_tool/go/1.16.3/x64/src/runtime/asm_amd64.s:1371\ndiscover from dhcp message"}
  • Workflow doesn’t progress. On my 4th try, the machine got picked up by Boots, but then the workflow got “stuck”, as was visible during Step 6 of the guide linked above:
Every 1.0s: tink workflow events c263defc-c0b1-11ec-9ab9-0242ac130006; tink workflow state c263defc-c0b1-11ec-9ab9-0242ac130006 

+-----------+-----------+-------------+----------------+---------+---------------+
| WORKER ID | TASK NAME | ACTION NAME | EXECUTION TIME | MESSAGE | ACTION STATUS |
+-----------+-----------+-------------+----------------+---------+---------------+
+-----------+-----------+-------------+----------------+---------+---------------+
+----------------------+--------------------------------------+
| FIELD NAME           | VALUES                               |
+----------------------+--------------------------------------+
| Workflow ID          | c263defc-c0b1-11ec-9ab9-0242ac130006 |
| Workflow Progress    | 0%                                   |
| Current Task         |                                      |
| Current Action       |                                      |
| Current Worker       |                                      |
| Current Action State | STATE_PENDING                        |
+----------------------+--------------------------------------+
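
The stuck state in the table above can also be detected from a script rather than by watching the output. The following is a minimal Python sketch, assuming the `tink workflow state` output keeps the two-column layout shown above; `parse_state_table` and `looks_stuck` are hypothetical helper names, not part of the tink CLI.

```python
def parse_state_table(text):
    """Parse the FIELD NAME / VALUES table into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip border rows such as +------+------+ and blank lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Keep only two-column data rows; this also skips the header row
        # and the six-column events table.
        if len(cells) != 2 or cells[0] == "FIELD NAME":
            continue
        fields[cells[0]] = cells[1]
    return fields


def looks_stuck(fields):
    """True when the workflow is at 0%, pending, and has no worker assigned."""
    return (
        fields.get("Current Action State") == "STATE_PENDING"
        and fields.get("Workflow Progress") == "0%"
        and not fields.get("Current Worker")
    )


# Sample copied from the output in this report.
SAMPLE = """\
+----------------------+--------------------------------------+
| FIELD NAME           | VALUES                               |
+----------------------+--------------------------------------+
| Workflow ID          | c263defc-c0b1-11ec-9ab9-0242ac130006 |
| Workflow Progress    | 0%                                   |
| Current Task         |                                      |
| Current Action       |                                      |
| Current Worker       |                                      |
| Current Action State | STATE_PENDING                        |
+----------------------+--------------------------------------+
"""

if __name__ == "__main__":
    state = parse_state_table(SAMPLE)
    print(state["Workflow ID"], "stuck:", looks_stuck(state))
```

An empty "Current Worker" together with an empty events table suggests that no worker has reported in for this workflow yet, which is why the check looks at all three fields instead of the action state alone.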

On the 4th run, when Boots was able to pick up the server’s request, this screenshot was visible on the KVM screen, and it didn’t change during those 2 hours:

image

Possible Solution

N/A

Steps to Reproduce (for bugs)

  1. Follow the https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md guide.
  2. The process gets stuck on Step 6: either Boots doesn’t recognize the DHCP request sent from the MAC address defined in Step 3, or the request gets picked up but the workflow doesn’t progress and provision the machine.
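
To tell the two failure modes in step 2 apart from the logs, the Boots entries can be parsed as JSON. The sketch below is a minimal Python example, assuming docker-compose prefixes each line with the service name and a `|` as in the Boots log output above. The helper names are hypothetical, and the sample line is the third Boots entry from this report with the `errVerbose` field left out for brevity. The "no rows in result set" error comes from the "get hardware by mac from tink" lookup, which suggests (but does not prove) that Tink had no hardware record for that MAC when the request arrived.

```python
import json


def parse_compose_log_line(line):
    """Split a docker-compose log line into (service name, JSON payload as dict)."""
    service, _, payload = line.partition("|")
    return service.strip(), json.loads(payload)


def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators so comparisons are stable."""
    return mac.strip().lower().replace("-", ":")


def hardware_missing(entry, expected_mac):
    """True when the entry is for expected_mac and the hardware lookup found no rows."""
    same_mac = normalize_mac(entry.get("mac", "")) == normalize_mac(expected_mac)
    return same_mac and "no rows in result set" in entry.get("err", "")


# Third Boots log entry from this report, with errVerbose omitted.
SAMPLE = (
    'boots_1                     | {"level":"info","ts":1649949865.6336043,'
    '"caller":"boots/dhcp.go:91","msg":"retrieved job is empty",'
    '"service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER",'
    '"mac":"e4:43:4b:3d:75:b8","err":"discover from dhcp message: get hardware by '
    'mac from tink: rpc error: code = Unknown desc = SELECT: sql: no rows in result set"}'
)

if __name__ == "__main__":
    service, entry = parse_compose_log_line(SAMPLE)
    print(service, entry["msg"], hardware_missing(entry, "e4:43:4b:3d:75:b8"))
```

Normalizing the MAC before comparing avoids a false mismatch when the value in the environment and the value in the log differ only in case or separator.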

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

Ubuntu 20.04.4 LTS

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

Bare metal provisioner, trying to provision another bare metal server with the docker-compose method outlined in https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md

  • Link to your project or a code example to reproduce issue:

N/A

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 17

Most upvoted comments

Hmm, can’t tell why or where the 2nd attempt hangs. Check the logs of tink-docker again.

ca.pem is in deploy/compose/state/webroot/workflow after the initial provisioning. Only delete it for a fresh start, not for a subsequent deployment.

OK, the “full” change that got my sandbox-vagrant-vbox setup going uses these hashes plus a change to the Boots cmdline, since that changed at some point around then. It might be more than needed, but “it works for me”. https://gist.github.com/double-p/ea7597da76956fac6f90251ad2e2f175