makara: makara connection pool management is not thread-safe

Makara's connection pool management is not thread-safe, so when makara runs in thread-intensive code such as sidekiq, subtle errors creep in.

The fix is not entirely straightforward: AR 4 and 5 have reorganized the connection pool management to be thread-safe, with lots of mutex usage. Makara has none of that, and its connection pool management is not thread-safe.

In fact, makara uses an array of connections for each pool, which is traversed each time a connection decision is made. Connections are added up to the maximum connection count, but there is no protection against simultaneous access and changes across threads.

In addition, in the master and replica (slave) connection pools, there are connections with different states in each array: blacklisted or not.

We could protect the connection pool array with a semaphore, but traversing an array behind a semaphore is a Bad Idea because it blocks all other threads from traversing that same array. If the semaphore supported read-locking in addition to write-locking then multi-thread traversals would work in parallel, except for adding or removing connections.
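To make that trade-off concrete, here is a minimal sketch using Ruby's standard `Mutex`. The `Conn` struct and `LockedPool` class are hypothetical stand-ins, not Makara's actual code. Both adding and traversing take the same lock, so the array stays consistent, but only one thread can scan it at a time.

```ruby
# Hypothetical stand-in for a Makara-style pool; not Makara's actual code.
Conn = Struct.new(:name, :blacklisted)

class LockedPool
  def initialize(max)
    @max = max
    @connections = []
    @lock = Mutex.new
  end

  # Adding mutates the array, so the size check and the append
  # must happen together under the lock.
  def add(conn)
    @lock.synchronize do
      return false if @connections.size >= @max
      @connections << conn
      true
    end
  end

  # Traversal takes the same lock, so concurrent readers are serialized.
  def first_available
    @lock.synchronize { @connections.find { |c| !c.blacklisted } }
  end

  def size
    @lock.synchronize { @connections.size }
  end
end

# Sixteen threads race to add connections to a pool capped at four.
pool = LockedPool.new(4)
threads = 16.times.map do |i|
  Thread.new { pool.add(Conn.new("conn-#{i}", false)) }
end
threads.each(&:join)
```

The cap holds because the check and the append are one critical section. The cost is the one described above: every `first_available` call waits for every other one.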

However, IMHO, it would be better to have multiple arrays of connection pools, with each kind of connection being queued into a separate pool (array) of connections. So, blacklisted connections would be maintained in a blacklisted connection pool, and the remaining set of available connections would be in the available connection pool. Then, each pool could be managed as a thread-safe queue object.
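A rough sketch of that idea, assuming Ruby's built-in thread-safe `Queue` for each pool. The `Replica` struct, the `SplitPool` class, and all of its method names are hypothetical illustrations, not Makara's API.

```ruby
# Hypothetical sketch; class and method names are illustrative only.
Replica = Struct.new(:name)

class SplitPool
  def initialize(conns)
    @available = Queue.new    # connections ready for use
    @blacklisted = Queue.new  # connections currently blacklisted
    conns.each { |c| @available << c }
  end

  # Non-blocking checkout; returns nil when no connection is available.
  def checkout
    @available.pop(true)
  rescue ThreadError
    nil
  end

  def checkin(conn)
    @available << conn
  end

  def blacklist(conn)
    @blacklisted << conn
  end

  # Move every blacklisted connection back into the available queue.
  # Queue#pop(true) raises ThreadError once the queue is empty.
  def restore_all
    loop { @available << @blacklisted.pop(true) }
  rescue ThreadError
    nil
  end

  def counts
    { available: @available.size, blacklisted: @blacklisted.size }
  end
end

# Two workers each take a connection and blacklist it.
pool = SplitPool.new([Replica.new("r1"), Replica.new("r2")])
workers = 2.times.map do
  Thread.new do
    conn = pool.checkout
    pool.blacklist(conn) if conn
  end
end
workers.each(&:join)
```

With this layout no thread ever traverses a shared array: a checkout is a single queue pop, and a state change is a move from one queue to the other.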

Makara should be updated to make best use of the latest AR connection pool code, becoming thread-safe in the process.

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 32 (7 by maintainers)

Most upvoted comments

@jeffdoering – Have you filed a detailed issue yet?

We’re seeing issues with our ConnectionPool becoming corrupted during database failovers in sidekiq only with makara and we are unable to recover without restarting the servers.

ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use


I saw you had a patch into rails core… Does it resolve the issue? https://github.com/rails/rails/pull/36473

Has anyone had any luck in reproducing the issues seen? We are also running with a similar configuration (puma / sidekiq). It would be nice to use this, but this issue concerns us. I looked at the alternatives and they do not seem to have a feature I like (failing over to master on high replica lag). If anyone has reproduced it we’d be willing to help resolve the issue.

@rajagopals recently went through the same thing, needed to migrate off of pgpool/pgbouncer and the thread-safety issue ultimately prevented us from adopting Makara.

We wound up using active_record_slave on top of an Aurora Postgres cluster. No issues yet in a multi-threaded production environment, handling ~40k queries per minute. YMMV

Check out the fresh_connection gem, or the Pgpool-II proxy.


Rajagopal wrote on Wednesday, August 1, 2018, in taskrabbit/makara #151:

@aks @jwg2s @bleonard We are evaluating using makara for read-write splitting and came across this thread. Do you have updates/suggestions/mitigation options for this issue?

I don't see other well-maintained gems for read-write splitting in production. Does anyone have other suggestions that worked well in a production environment?

FYI We use AWS Aurora MySQL as our backing store and our application is hosted on Heroku.
