google-cloud-go: BigQuery: 404 table not found even when the table exists
Client
BigQuery Go Client
Describe Your Environment
Linux 4.15.11-1-ARCH SMP PREEMPT x86_64 GNU/Linux go version go1.10.1 linux/amd64
Expected Behavior
I am trying to stream data into BigQuery.
It should just insert the data without a 404 table not found error, since the table already exists.
Actual Behavior
404 table not found
Before running go run, the package was updated with go get -u cloud.google.com/go/bigquery.
go run checkTable.go
2018/04/19 14:38:20 googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable14, notFound
2018/04/19 14:38:20 Creating table too
panic: googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable14, notFound
goroutine 1 [running]:
main.main()
/home/core/go/src/gcf_hello_world/checkTable.go:53 +0x991
exit status 2
I am able to consistently reproduce it with the code below:
- First make sure the dataset exists.
- In the code below, change tableID to a table which does not exist yet.
- The code will first insert without creating the table, which is expected to fail.
- Then it will create the new table.
- Finally, it will insert again, but this time too we end up with a 404 table not found error.
Here is my project ID: pureapp-199410 (API request logs might help here)
In checkTable.go I have:
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/option"
)

const (
	projectID = "pureapp-199410"
	separator = "." // unused; kept from the original report
)

type Item struct {
	Name  string
	Count int
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID, option.WithCredentialsFile("creds.json"))
	if err != nil {
		panic(err)
	}

	datasetID := "dream4"
	tableID := "sometable13"
	items := []*Item{
		// Put infers the schema from the Item struct.
		{Name: "n1", Count: 7},
		{Name: "n2", Count: 2},
		{Name: "n3", Count: 1},
	}

	// First insert, before the table exists: expected to fail with 404.
	u := client.Dataset(datasetID).Table(tableID).Uploader()
	if err := u.Put(ctx, items); err != nil {
		log.Println(err)
	} else {
		log.Println("Insert successful")
	}

	// Now create the table.
	log.Println("Creating table too")
	table := client.Dataset(datasetID).Table(tableID)
	if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
		panic(err)
	}

	// Second insert, after the table exists: still fails with 404.
	u = client.Dataset(datasetID).Table(tableID).Uploader()
	if err := u.Put(ctx, items); err != nil {
		panic(err)
	}
	log.Println("Insert successful")
}

var schema bigquery.Schema = []*bigquery.FieldSchema{
	{Name: "Name", Type: bigquery.StringFieldType},
	{Name: "Count", Type: bigquery.IntegerFieldType},
}
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 20 (4 by maintainers)
@jba My main concern is that nowhere in the documentation does it mention anything like “the table will eventually be there”, and no recommended solutions are offered either. Everyone has to figure this out on their own, which is wasteful.
I spent a few hours yesterday thinking I was going mad. It would be nice if the user experience were better in this regard.
When I call .exists() after .createTable(), it returns true, but the .insert() throws a 404. It seems unintuitive that exists != available. Similarly, awaiting .createTable() would seem to imply availability.
If an existence check passes, then a 404 with the message “Table X not found” is not the right message to return from insert, since the table clearly does exist. An error like this, for something I want to exist, implies that the fix is to create it; but in this case I shouldn’t, because it already exists. Instead, I should be told to wait. I get “eventually consistent”, but I’m going to need the API to be consistent on its definition of existence.
I’m disappointed that the best guidance appears to be gambling on a sufficiently large sleep. I’m probably going to put a retry on the insert with an exponential wait. It would be nice if this kind of logic were built into the SDK, or if there were an availability check, since existence appears not to provide that assurance.
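For reference, a retry along those lines might look like the Go sketch below (staying in the language of the original report). The helper name, attempt limit, initial delay, and the choice to retry only top-level 404s are all illustrative assumptions, not guidance from the maintainers:

package main

import (
	"context"
	"time"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/googleapi"
)

// putWithBackoff retries a streaming insert for as long as the streaming
// backends may still hold a cached "table does not exist" entry. It retries
// only on 404 and doubles the delay after each failed attempt.
func putWithBackoff(ctx context.Context, u *bigquery.Uploader, src interface{}) error {
	delay := 500 * time.Millisecond
	var err error
	for attempt := 0; attempt < 8; attempt++ {
		if err = u.Put(ctx, src); err == nil {
			return nil
		}
		if e, ok := err.(*googleapi.Error); !ok || e.Code != 404 {
			return err // not "table not found"; do not retry
		}
		time.Sleep(delay)
		delay *= 2
	}
	return err
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		panic(err)
	}
	type Item struct {
		Name  string
		Count int
	}
	u := client.Dataset("my_dataset").Table("my_table").Uploader()
	if err := putWithBackoff(ctx, u, []*Item{{Name: "n1", Count: 7}}); err != nil {
		panic(err)
	}
}

Note the trade-off: the total wait is bounded but still a gamble, since the cache expiry is not documented.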
There’s effectively nothing we can do here, as this is a classic eventual consistency issue. We document the behavior, but I’ll try to explain the “why” in a bit more detail.
When you intermingle operations that change table metadata and stream data into a table, you’re likely to observe the effects of this eventually consistent behavior. The streaming system, by nature of its vastly different scale, caches table metadata aggressively, in a combination of shared and individual caches.
Generally, the pattern that causes users the most problems is a stream -> create table -> stream sequence, exactly as in the report above. The manifestation typically looks like this:
It’s the first streaming call that triggers the problem here. The call requires the streaming system to load the metadata state, and at that moment the table doesn’t exist. This negative existence is cached by the streaming system for a time even if the table is created immediately afterwards.
Subsequent streaming calls leverage their cached metadata, and thus reject inserts for a time until such time as the cached table metadata expires and gets refreshed. Callers receive inconsistent responses because each streaming backend instance may have a slightly different cache state.
Generally, the best thing to do is to design your interactions so that you don’t change table metadata while interacting with the streaming system. In the previous example, ensuring the table is created before the first streaming call is generally sufficient.
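A minimal sketch of that ordering in Go: check for the table first and create it before any streaming call can observe (and cache) its absence. The ensureTable helper and its 404 handling are my own illustration, not part of the library:

package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/googleapi"
)

// ensureTable creates the table if it does not exist yet, so that no
// streaming call ever sees (and caches) a missing table.
func ensureTable(ctx context.Context, t *bigquery.Table, schema bigquery.Schema) error {
	_, err := t.Metadata(ctx)
	if err == nil {
		return nil // table already exists
	}
	if e, ok := err.(*googleapi.Error); !ok || e.Code != 404 {
		return err
	}
	return t.Create(ctx, &bigquery.TableMetadata{Schema: schema})
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		log.Fatal(err)
	}
	schema := bigquery.Schema{
		{Name: "Name", Type: bigquery.StringFieldType},
		{Name: "Count", Type: bigquery.IntegerFieldType},
	}
	table := client.Dataset("my_dataset").Table("my_table")
	// Create first; only then start streaming.
	if err := ensureTable(ctx, table, schema); err != nil {
		log.Fatal(err)
	}
	type Item struct {
		Name  string
		Count int
	}
	if err := table.Uploader().Put(ctx, []*Item{{Name: "n1", Count: 7}}); err != nil {
		log.Fatal(err)
	}
}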
There are other interaction patterns, such as deleting and recreating a table with the same name, that trigger similarly observed behaviors. In that case, rather than a cached negative existence, what users observe is that not all writes appear to arrive in the table: some writes may have been sent to the old (now deleted) table, and some to the new one. Similarly, schema evolution, where the schema of a table is extended, may take some time before all the streaming backends see the updated schema for a given table.
Hopefully this provides some additional background into the nature of the issue.
Same issue for me, but with a different client (Node.js). I believe it’s not specific to the Go platform; it’s a server-side issue. In my case I explicitly create a table and then insert into it, and sometimes the insert fails. The client/API should definitely have a method to make sure a table has been created and is safe to use.
Guys, how are you?
I run into this problem too.
My solution is to add a delay before inserting into the new table.
My code is in .NET 5, but the logic applies to your problem as well.
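In Go terms, that fixed-delay workaround is just a sleep between creating the table and the first insert. Below is a fragment reusing table, schema, u, items, and ctx from the example above (it additionally needs the time import); the two-minute figure is a guess, since the cache expiry is not documented:

if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
	panic(err)
}
time.Sleep(2 * time.Minute) // illustrative delay; there is no documented upper bound
if err := u.Put(ctx, items); err != nil {
	panic(err)
}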
Hello, I’ve got the same issue. For various reasons I can’t use TableTemplateSuffix, and I need to create tables dynamically and stream data into them. I have tried requesting TableMetadata, and it responds successfully, but Inserter still returns a 404 error right after table.Metadata() succeeds.
Some kind of Available() method on Table would be really useful. @jba