google-cloud-ruby: Datastore: does not work inside Puma webserver

It would be good to get independent confirmation, but from what I can tell, Datastore queries hang when run inside the Puma web server. The same query runs fine from a rails console process. Worker processes running other queries and inserts also appear to run fine. The environment is the same in all cases.

The web process just hangs when calling dataset.run(query).

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 41 (29 by maintainers)

Most upvoted comments

In summary, this is what we discovered when testing with Ruby web servers configured to fork processes (Puma, Passenger, Unicorn):

  • the grpc library is initialized with require 'gcloud/datastore'
  • the way that the gRPC library within gcloud initializes does not persist properly across forks
  • if the grpc library is initialized before the web server forks, the sub-processes are not initialized correctly. This happens, for example, when require 'gcloud/datastore' is executed in a Rails initializer or in any code that is eager loaded (such as code within the Rails app/ directory)
  • if you fork the process first and then initialize the grpc library in each worker, everything works correctly
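
The ordering described in the list above can be illustrated without gRPC at all. The following is a minimal sketch, assuming a hypothetical FakeClient class that stands in for any client that must not be carried across a fork; FakeClient, LazyClient and client_pid_in_child are illustrative names, not part of gcloud or grpc. Because the client is only built on first use, a child that forks before touching it ends up with an instance created in its own process.

```ruby
# Hypothetical stand-in for a client that must not be shared across forks.
# It only records the process in which it was constructed.
class FakeClient
  attr_reader :created_in_pid

  def initialize
    @created_in_pid = Process.pid
  end
end

module LazyClient
  # Built on first use. A worker that forks before calling this method
  # therefore constructs its own instance after the fork.
  def self.client
    @client ||= FakeClient.new
  end
end

# Fork first, then touch the client inside the child and report back
# the child's pid together with the pid the client was created in.
def client_pid_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts "#{Process.pid} #{LazyClient.client.created_in_pid}"
    writer.close
    exit! 0 # skip at_exit handlers inherited from the parent
  end
  writer.close
  child_pid, created_in = reader.read.split.map(&:to_i)
  reader.close
  Process.wait(pid)
  [child_pid, created_in]
end

child_pid, created_in = client_pid_in_child
puts "child #{child_pid} built its client in process #{created_in}"
```

If LazyClient.client were called in the parent before forking, the child would inherit the parent's instance instead, which is the situation the list above describes as broken for grpc.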

Here is an example of a Rails CloudDatastore initializer that is currently working with Puma:

module CloudDatastore
  if Rails.env.development?
    ENV['DATASTORE_EMULATOR_HOST'] = 'localhost:8180'
    ENV['GCLOUD_PROJECT'] = 'local-datastore'
  elsif Rails.env.test?
    ENV['DATASTORE_EMULATOR_HOST'] = 'localhost:8181'
    ENV['GCLOUD_PROJECT'] = 'test-datastore'
  else
    ENV['GCLOUD_KEYFILE_JSON'] = '{"private_key": "' + ENV['SERVICE_ACCOUNT_PRIVATE_KEY'] + '",
      "client_email": "' + ENV['SERVICE_ACCOUNT_CLIENT_EMAIL'] + '"}'
  end

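  # Require the library lazily so that grpc is initialized after the web
  # server has forked, and keep one dataset object per thread.
  def self.dataset
    require 'gcloud/datastore'
    Thread.current[:dataset] ||= Gcloud.datastore(ENV['GCLOUD_PROJECT'])
  end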

  def self.reset_dataset
    Thread.list.each do |thread|
      thread[:dataset] = nil if thread.key?(:dataset)
    end
  end
end

@blowmage Unrelated to the grpc initialization problem, we are also trying out your concept of one Gcloud grpc client instance per web server thread.

@bmclean Thanks for such a great writeup! This is incredibly useful!

If you (or anyone reading this) are bothered by the fact that you require the library each time you access the dataset object, you can change the implementation a bit to run the require only when you need to create the object:

  def self.dataset
    Thread.current[:dataset] ||= begin
      require "gcloud/datastore"
      Gcloud.datastore ENV["GCLOUD_PROJECT"]
    end
  end
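
The per-thread memoization that dataset relies on can be tried in isolation. This is a minimal sketch, assuming a hypothetical StubDataset class in place of the object returned by Gcloud.datastore; StubDataset and PerThreadDataset are illustrative names only. It shows that repeated calls on one thread reuse the same object, that another thread gets its own, and that a reset in the style of reset_dataset forces a fresh object on the next call.

```ruby
# Hypothetical stand-in for the object returned by Gcloud.datastore.
class StubDataset
end

module PerThreadDataset
  # One object per thread, created the first time that thread asks for it.
  def self.dataset
    Thread.current[:dataset] ||= StubDataset.new
  end

  # Clear the cached object on every live thread so the next call rebuilds it.
  def self.reset_dataset
    Thread.list.each do |thread|
      thread[:dataset] = nil if thread.key?(:dataset)
    end
  end
end

first  = PerThreadDataset.dataset
second = PerThreadDataset.dataset
other  = Thread.new { PerThreadDataset.dataset }.value

puts "same thread reuses object:    #{first.equal?(second)}"
puts "other thread has its own:     #{!first.equal?(other)}"

PerThreadDataset.reset_dataset
puts "reset yields a fresh object:  #{!first.equal?(PerThreadDataset.dataset)}"
```

Note that Thread.current[:key] is fiber-local in Ruby, so code that runs inside a separate Fiber would see its own slot rather than the thread's.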