google-cloud-go: BigQuery: 404 table not found even when the table exists
Client
BigQuery Go Client
Describe Your Environment
Linux 4.15.11-1-ARCH SMP PREEMPT x86_64 GNU/Linux go version go1.10.1 linux/amd64
Expected Behavior
I am trying to stream data into BigQuery.
It should just insert the data without a 404 table not found error, since the table already exists.
Actual Behavior
404 table not found
Before running go run, the package was updated with go get -u cloud.google.com/go/bigquery.
go run checkTable.go
2018/04/19 14:38:20 googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable14, notFound
2018/04/19 14:38:20 Creating table too
panic: googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable14, notFound
goroutine 1 [running]:
main.main()
/home/core/go/src/gcf_hello_world/checkTable.go:53 +0x991
exit status 2
I am able to consistently reproduce it with the code below:
- First make sure the dataset exists.
- In the code below, change tableID to a table which does not exist yet.
- The code will first insert without creating the table, which is expected to fail.
- Then it will create the new table.
- Finally, it will insert again, but this time too we end up with a 404 table not found error.
Here is my project ID: pureapp-199410 (API request logs might help here)
In checkTable.go I have:
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/option"
)

const (
	projectID = "pureapp-199410"
	separator = "." // unused; kept from the original report
)

type Item struct {
	Name  string
	Count int
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID, option.WithCredentialsFile("creds.json"))
	if err != nil {
		panic(err)
	}

	datasetID := "dream4"
	tableID := "sometable13"
	items := []*Item{
		// Put infers the schema from the Item struct.
		{Name: "n1", Count: 7},
		{Name: "n2", Count: 2},
		{Name: "n3", Count: 1},
	}

	// First insert, before the table exists: expected to fail with 404.
	u := client.Dataset(datasetID).Table(tableID).Uploader()
	if err := u.Put(ctx, items); err != nil {
		log.Println(err)
	} else {
		log.Println("Insert successful")
	}

	// Now create the table.
	log.Println("Creating table too")
	table := client.Dataset(datasetID).Table(tableID)
	if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
		panic(err)
	}

	// Second insert, after the table exists: still fails with 404.
	u = client.Dataset(datasetID).Table(tableID).Uploader()
	if err := u.Put(ctx, items); err != nil {
		panic(err)
	}
	log.Println("Insert successful")
}

var schema bigquery.Schema = []*bigquery.FieldSchema{
	{Name: "Name", Type: bigquery.StringFieldType},
	{Name: "Count", Type: bigquery.IntegerFieldType},
}
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 20 (4 by maintainers)
@jba My main concern is that nowhere in the documentation does it mention anything like “the table will eventually be there”, and no recommended solutions are offered either. Everyone has to figure this out on their own, which is wasteful.
I spent a few hours yesterday thinking I was going mad. It would be nice if the user experience were better in this regard.
When I call .exists() after .createTable(), it returns true, but the .insert() throws a 404. It seems unintuitive that exists != available. Similarly, awaiting .createTable() would seem to imply availability.
If an existence check passes, then a 404 with the message “Table X not found” is not the right message to return from insert, since the table clearly does exist. An error like this, for something I want to exist, implies that the fix is to create it; but in this case I shouldn’t, because it already exists. Instead, I should be told to wait. I get “eventually consistent”, but I’m going to need the API to be consistent on its definition of existence.
I’m disappointed that the best guidance appears to be gambling on a sufficiently large sleep. I’m probably going to put a retry on the insert with an exponential wait. It would be nice if this kind of logic were built into the SDK, or if there were an availability check, since existence appears not to provide that assurance.
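For reference, a retry along those lines might look like the Go sketch below (staying in the language of the original report). The helper name, attempt limit, initial delay, and the choice to retry only top-level 404s are all illustrative assumptions, not guidance from the maintainers:

package main

import (
	"context"
	"time"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/googleapi"
)

// putWithBackoff retries a streaming insert for as long as the streaming
// backends may still hold a cached "table does not exist" entry. It retries
// only on 404 and doubles the delay after each failed attempt.
func putWithBackoff(ctx context.Context, u *bigquery.Uploader, src interface{}) error {
	delay := 500 * time.Millisecond
	var err error
	for attempt := 0; attempt < 8; attempt++ {
		if err = u.Put(ctx, src); err == nil {
			return nil
		}
		if e, ok := err.(*googleapi.Error); !ok || e.Code != 404 {
			return err // not "table not found"; do not retry
		}
		time.Sleep(delay)
		delay *= 2
	}
	return err
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		panic(err)
	}
	type Item struct {
		Name  string
		Count int
	}
	u := client.Dataset("my_dataset").Table("my_table").Uploader()
	if err := putWithBackoff(ctx, u, []*Item{{Name: "n1", Count: 7}}); err != nil {
		panic(err)
	}
}

Note the trade-off: the total wait is bounded but still a gamble, since the cache expiry is not documented.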
There’s effectively nothing we can do here, as this is a classic eventual consistency issue. We document the behavior, but I’ll try to explain the “why” in a bit more detail.
When you intermingle operations that change table metadata and stream data into a table, you’re likely to observe the effects of this eventually consistent behavior. The streaming system, by nature of its vastly different scale, caches table metadata aggressively, in a combination of shared and individual caches.
Generally, the pattern that causes users the most problems is a stream -> create table -> stream sequence, exactly as in the report above. The manifestation typically looks like this:
It’s the first streaming call that triggers the problem here. The call requires the streaming system to load the metadata state, and at that moment the table doesn’t exist. This negative existence is cached by the streaming system for a time even if the table is created immediately afterwards.
Subsequent streaming calls leverage their cached metadata, and thus reject inserts for a time until such time as the cached table metadata expires and gets refreshed. Callers receive inconsistent responses because each streaming backend instance may have a slightly different cache state.
Generally, the best thing to do is to design your interactions so that you don’t change table metadata while interacting with the streaming system. In the previous example, ensuring the table is created before the first streaming call is generally sufficient.
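A minimal sketch of that ordering in Go: check for the table first and create it before any streaming call can observe (and cache) its absence. The ensureTable helper and its 404 handling are my own illustration, not part of the library:

package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/googleapi"
)

// ensureTable creates the table if it does not exist yet, so that no
// streaming call ever sees (and caches) a missing table.
func ensureTable(ctx context.Context, t *bigquery.Table, schema bigquery.Schema) error {
	_, err := t.Metadata(ctx)
	if err == nil {
		return nil // table already exists
	}
	if e, ok := err.(*googleapi.Error); !ok || e.Code != 404 {
		return err
	}
	return t.Create(ctx, &bigquery.TableMetadata{Schema: schema})
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		log.Fatal(err)
	}
	schema := bigquery.Schema{
		{Name: "Name", Type: bigquery.StringFieldType},
		{Name: "Count", Type: bigquery.IntegerFieldType},
	}
	table := client.Dataset("my_dataset").Table("my_table")
	// Create first; only then start streaming.
	if err := ensureTable(ctx, table, schema); err != nil {
		log.Fatal(err)
	}
	type Item struct {
		Name  string
		Count int
	}
	if err := table.Uploader().Put(ctx, []*Item{{Name: "n1", Count: 7}}); err != nil {
		log.Fatal(err)
	}
}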
There are other interaction patterns, such as deleting and recreating a table with the same name, that trigger similarly observed behaviors. In that case, rather than a cached negative existence, what users observe is that not all writes appear to arrive in the table: some writes may have been sent to the old (now deleted) table, and some to the new one. Similarly, schema evolution, where the schema of a table is extended, may take some time before all the streaming backends see the updated schema for a given table.
Hopefully this provides some additional background into the nature of the issue.
Same issue for me, but with a different client (Node.js). I believe it’s not specific to the Go platform; it’s a server-side issue. In my case I explicitly create a table and then insert into it, and sometimes the insert fails. The client/API should definitely have a method to make sure a table has been created and is safe to use.
Guys, how are you?
I run into this problem too.
My solution is to add a delay before inserting into the new table.
My code is in .NET 5, but the logic applies to your problem as well.
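In Go terms, that fixed-delay workaround is just a sleep between creating the table and the first insert. Below is a fragment reusing table, schema, u, items, and ctx from the example above (it additionally needs the time import); the two-minute figure is a guess, since the cache expiry is not documented:

if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
	panic(err)
}
time.Sleep(2 * time.Minute) // illustrative delay; there is no documented upper bound
if err := u.Put(ctx, items); err != nil {
	panic(err)
}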
Hello, I’ve got the same issue. For various reasons I can’t use TableTemplateSuffix, and I need to create tables dynamically and stream data into them. I have tried requesting TableMetadata, and it responds successfully, but Inserter still returns a 404 error right after table.Metadata() succeeds.
Some kind of Available() method on Table would be really useful. @jba