gitness: Server<->Agent reliability causing new builds to stay Pending

I’ve been trying to get to the bottom of this for a while (with no success). This is probably also related to #2246 and there have been a few discussions on the Discourse forums. I am running the following setup via Docker:

version: "3.3"

services:
  drone-server:
    image: drone/drone:latest
    environment:
      - DRONE_DEBUG=true
      - DRONE_OPEN=true
      - DRONE_HOST=https://ci.mydomain.com
      - DRONE_GOGS=true
      - DRONE_GOGS_PRIVATE_MODE=true
      - DRONE_GOGS_URL=https://git.mydomain.com  # running Gogits (0.11.29.0727)
      - DRONE_SECRET=###
      - DRONE_ADMIN=###
    networks:
      - drone
      - traefik
    volumes:
      - dronedata:/var/lib/drone
    deploy:
      placement:
        constraints:
          - "node.hostname == node1.mydomain.com"
      labels:
        - "traefik.enable=true"
        - "traefik.port=8000"
        - "traefik.backend=ci"
        - "traefik.docker.network=traefik"
        - "traefik.frontend.rule=Host:ci.mydomain.com"
      restart_policy:
        condition: on-failure
      replicas: 1

  drone-agent:
    image: drone/agent:latest
    command: agent
    networks:
      - drone
      - bridge
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DRONE_DEBUG=true
      - DRONE_SERVER=drone-server:9000
      - DRONE_SECRET=###
    deploy:
      placement:
        constraints:
          - "node.role != manager"
      restart_policy:
        condition: on-failure
      replicas: 3

networks:
  drone:
    driver: overlay
  bridge:
    external: true
  traefik:
    external: true

volumes:
  dronedata:
    external: true

This is deployed with docker stack in a Swarm Mode cluster:

$ docker stack deploy -c ci.yml ci

This works fine for a little while (~5-10 mins?), but later, when I push new commits to Gogits and want a CI build, I have to:

$ docker stack rm ci
$ docker stack deploy -c ci.yml ci

This effectively tears down the entire CI and rebuilds it (except its local data).

To date I’ve not been able to glean anything useful from the debug logs of either the server or the agent(s).

This is the output of drone build list:

$ drone build list prologic/eris
Build #24
Status: pending
Event: push
Commit: 3a9d1fefc834d20991b784993589ed2cc558289c
Branch: master
Ref: refs/heads/master
Author: prologic <james@mydomain.com>
Message: Update README.md

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 16

Most upvoted comments

@JaredReisinger same for me. Not adding it to both still caused dropouts for me.

I had the same issue for my swarm deployment, but since I have not been able to gather any useful debugging information I deployed drone via plain docker-compose instead of a stack 😦

The main difference between your sample configuration and the default configuration used by the majority of production installations is the overlay network. I am finding many open issues in the moby issue tracker for overlay networks, including some related to gRPC. So this seems like the best place to start.
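
For anyone who wants to rule out the overlay network, here is a rough sketch of the same setup as a plain docker-compose file on a single host, which is roughly what the comments above describe falling back to. It is not a confirmed fix. It reuses the images and environment variables from the stack file in the original post. The published port, the `restart: always` policy, and the `depends_on` entry are assumptions added for illustration, and the Traefik labels and Swarm `deploy` sections are dropped.

```yaml
# Sketch only: single-host docker-compose variant without the overlay network.
# Server and agent talk over the default bridge network that compose creates.
version: "2"

services:
  drone-server:
    image: drone/drone:latest
    ports:
      - "8000:8000"   # assumption: a reverse proxy on the host forwards ci.mydomain.com here
    volumes:
      - dronedata:/var/lib/drone
    restart: always   # assumption: replaces the Swarm restart_policy
    environment:
      - DRONE_DEBUG=true
      - DRONE_OPEN=true
      - DRONE_HOST=https://ci.mydomain.com
      - DRONE_GOGS=true
      - DRONE_GOGS_PRIVATE_MODE=true
      - DRONE_GOGS_URL=https://git.mydomain.com
      - DRONE_SECRET=###
      - DRONE_ADMIN=###

  drone-agent:
    image: drone/agent:latest
    command: agent
    restart: always
    depends_on:
      - drone-server   # assumption: start the server container first
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DRONE_DEBUG=true
      - DRONE_SERVER=drone-server:9000   # resolved via the compose default network
      - DRONE_SECRET=###

volumes:
  dronedata:
    external: true
```

If builds stop getting stuck in pending with this layout, that would point at the overlay network rather than at Drone itself. The trade-off is that the agents are no longer spread across Swarm worker nodes.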