graph-node: [Bug] Cross shard graft never starts

Bug report

I have a subgraph deployed in my primary shard. I deployed a new subgraph, which grafts onto the first one, into my secondary shard. The copy operation for the graft never starts, and as far as I can tell no relevant logs are printed. When I run graphman copy list, I see the following:

deployment           | QmR4EG5LiCR1zjHFw1x8o6ExkUbYKa1GMNRWyBLzapj9ED
action               | sgd18355 -> sgd19620 (second)
started              | 2023-06-26T12:58:00+00:00
progress             | 0.00% done, 0/1532

I’ve checked my replication slot lag and I have very little lag.

I’m running v0.31.0.

Relevant log output

EDIT: I reproduced this in a staging environment after trying to reproduce locally.

Jun 27 10:51:46.548 INFO Received subgraph_create request, params: SubgraphCreateParams { name: SubgraphName("clje6444w0igt49ri4v9w8skj") }, component: JsonRpcServer
Jun 27 10:51:46.552 DEBG Created subgraph, subgraph_name: clje6444w0igt49ri4v9w8skj, component: SubgraphRegistrar
Jun 27 10:51:46.556 INFO Received subgraph_deploy request, params: SubgraphDeployParams { name: SubgraphName("clje6444w0igt49ri4v9w8skj"), ipfs_hash: DeploymentHash("QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s"), node_id: None, debug_fork: None, history_blocks: None }, component: JsonRpcServer
Jun 27 10:51:46.568 DEBG Connecting to firehose to retrieve block for number 17570214, sgd: 0, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:46.630 DEBG Retrieving block(s) from firehose, sgd: 0, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:46.710 DEBG Connecting to firehose to retrieve block for number 17570229, sgd: 0, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:46.779 DEBG Retrieving block(s) from firehose, sgd: 0, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:46.847 INFO Set subgraph start block, block: Some(#17570214 (60b66b5fbaaad3d91ab1a67685812c4ed01cfeb154af11f0b252d72af32ed8b9)), sgd: 0, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:46.847 INFO Graft base, block: Some(17570229), base: Some("QmeQ4598pGHzTUb7vfayE47zLsLM6BAs4jXJLUF9PtXkgv"), sgd: 0, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:47.015 DEBG Wrote new subgraph version to store, subgraph_hash: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, subgraph_name: clje6444w0igt49ri4v9w8skj, sgd: 0, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:47.517 DEBG Deployment assignee is this node, broadcasting add event, node_id: indexer_0, assigned_to: indexer_0, component: SubgraphRegistrar
Jun 27 10:51:47.517 DEBG Received assignment event: Add { deployment: DeploymentLocator { id: DeploymentId(1753), hash: DeploymentHash("QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s") }, node_id: NodeId("indexer_0") }, component: SubgraphRegistrar
Jun 27 10:51:47.517 DEBG Subgraph started, start_ms: 0, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphRegistrar
Jun 27 10:51:47.521 INFO Resolve subgraph files using IPFS, n_templates: 0, n_data_sources: 1, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphInstanceManager
Jun 27 10:51:47.522 INFO Successfully resolved subgraph files using IPFS, features: grafting, n_templates: 0, n_data_sources: 1, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphInstanceManager
Jun 27 10:51:47.526 INFO Starting subgraph writer, queue_size: 5, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphInstanceManager
Jun 27 10:51:47.537 INFO Initializing graft by copying data from sgd1752 to sgd1753, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphInstanceManager
Jun 27 10:51:47.541 INFO Obtaining copy lock (this might take a long time if another process is still copying), dst: sgd1753, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphInstanceManager
Jun 27 10:51:47.606 INFO Initialize data copy from QmeQ4598pGHzTUb7vfayE47zLsLM6BAs4jXJLUF9PtXkgv[sgd1752] to QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s[sgd1753], dst: sgd1753, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphInstanceManager
Jun 27 10:51:47.607 ERRO Failed to start subgraph, code: SubgraphStartFailure, error: store error: relation "primary_public.active_copies" does not exist, sgd: 1753, subgraph_id: QmTHV6jX365VjoZf6g19bNrXR2AfPEQgqtw474hDXRZk5s, component: SubgraphInstanceManager

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

  • Tick this box if this bug is caused by a regression found in the latest release.
  • Tick this box if this bug is specific to the hosted service.
  • I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

None

About this issue

  • State: open
  • Created a year ago
  • Reactions: 6
  • Comments: 16

Most upvoted comments

The workaround, until the issue is resolved in graph-node itself, is to run the following SQL statements in the database backing the second shard:

-- Schema that holds the foreign tables pointing at the primary shard.
create schema primary_public;

-- Each foreign table maps to the table of the same name in the
-- primary shard's public schema, via the shard_primary foreign server.
create foreign table primary_public.active_copies
    (
        src integer,
        dst integer,
        queued_at timestamp with time zone,
        cancelled_at timestamp with time zone
    )
    server shard_primary
    options (schema_name 'public');

create foreign table primary_public.chains
    (
        id integer,
        name text,
        net_version text,
        genesis_block_hash text,
        shard text,
        namespace text
    )
    server shard_primary
    options (schema_name 'public');

create foreign table primary_public.deployment_schemas
    (
        id integer,
        subgraph varchar,
        name varchar,
        version integer,
        shard text,
        network text,
        active boolean,
        created_at timestamp with time zone
    )
    server shard_primary
    options (schema_name 'public');
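
To confirm that the workaround took effect, something like the following could be run in the same database (the one backing the second shard). This is only a sketch: it assumes the foreign server is named shard_primary, as in the statements above, and that the role running the queries is allowed to read through that server.

```sql
-- List the foreign tables now defined in primary_public; the three tables
-- created above should appear with foreign_server_name = 'shard_primary'.
select foreign_table_schema, foreign_table_name, foreign_server_name
  from information_schema.foreign_tables
 where foreign_table_schema = 'primary_public'
 order by foreign_table_name;

-- Read through the foreign table to check that the connection to the
-- primary shard works. Before the workaround, this is the relation the
-- "does not exist" error in the log refers to.
select count(*) from primary_public.active_copies;
```

If the second query fails with a connection or permission error rather than a missing relation, the problem probably lies with the foreign server or user mapping, not with the table definitions.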

@paymog You are right - that initial setup was missing from the code. I just opened a PR that fixes that.

I am glad it works now. You only need superuser privileges to create the extensions, not to use them. It should therefore be fine either to create the extensions as a more privileged user (e.g., postgres) or to drop superuser privileges again after creating them. This privilege requirement is the reason why graph-node does not create them itself.
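
As a rough illustration of the two options described above, and assuming that the extension in question is postgres_fdw (the workaround relies on foreign tables) and that graph-node connects as a role named graph (a hypothetical name; substitute your own), the statements might look like this:

```sql
-- Option 1: run as a superuser (for example, the postgres role) in the
-- shard's database, so the graph role never needs superuser privileges.
create extension if not exists postgres_fdw;

-- Defining a foreign server requires USAGE on the foreign-data wrapper.
-- "graph" is an assumed role name.
grant usage on foreign data wrapper postgres_fdw to graph;

-- Option 2: if the graph role was temporarily made a superuser so that it
-- could create the extension itself, drop the privilege afterwards.
alter role graph nosuperuser;

-- Check the resulting privilege level of the role.
select rolname, rolsuper from pg_roles where rolname = 'graph';
```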