sqlx: 0.6.0: Pool::close does not completely close when awaited
I have a somewhat-hackish setup to create a test database for every integration tests, on struct creation it creates the database with a connection pool and on drop it closes the pool before deleting the database. (Side Note: if someone has a better way of doing that I am definitely interested)
Since upgrading to v0.6.0 the database drop fails because there is still connections open:
Could not drop test db because of error returned from database: database "slicendice-integration-test-b61901b5-b86a-4e9b-820b-a8c81bf3cc8" is being accessed by other users
It only seems to happen on MacOS (I have not tested locally on Linux but here is a CI run demonstrating the problem).
I have not been able to create a small example that demonstrate the problem, following is my attempt that does not exhibit it.
Simple example
use std::env;
use std::time::Duration;
use anyhow::{Context, Result};
use async_std::task;
use sqlx::Connection;
use sqlx::{PgConnection, PgPool};
use tide::http::Url;
use uuid::Uuid;
async fn use_db(pool: PgPool) -> Result<()> {
sqlx::query(
"CREATE TABLE things (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL
)",
)
.execute(&pool)
.await?;
sqlx::query(
"INSERT INTO things ( name )
VALUES ( 'A name' )",
)
.execute(&pool)
.await?;
Ok(())
}
#[async_std::main]
async fn main() -> Result<()> {
let db_url_string =
env::var("DATABASE_URL").context("DATABASE_URL env variable needs to be set")?;
// Create DB
let mut db_url = Url::parse(&db_url_string)?;
let id: Uuid = Uuid::new_v4();
let mut db_name = String::from("slicendice-integration-test-");
db_name.push_str(&id.to_string());
db_url.set_path("/");
let mut conn = PgConnection::connect(&db_url.to_string()).await?;
sqlx::query(&format!(r#"CREATE DATABASE "{}""#, &db_name))
.execute(&mut conn)
.await?;
// Use DB
let mut db_path = String::from("/");
db_path.push_str(&db_name);
db_url.set_path(&db_path);
let pool = PgPool::connect(&db_url.to_string()).await?;
use_db(pool.clone()).await?;
// Drop DB
pool.close().await;
dbg!(pool.size());
dbg!(pool.num_idle());
let mut db_url = db_url.clone();
db_url.set_path("/");
let mut conn = PgConnection::connect(&db_url.to_string()).await?;
// Using format here because bind doesn't work for DROP DATABASE
sqlx::query(&format!(r#"DROP DATABASE "{}""#, &db_name))
.execute(&mut conn)
.await?;
Ok(())
}
My guess would be that because I am passing the pool to tide’s context it’s being held longer that in my simple example as when I create a test that only creates the scaffold and makes a call to the database without creating a tide application the problem does not happen.
I have added a call to Pool::size after an awaited Pool::close, in v0.5 it is consistently equal to 0 while in v0.6 it is always equal to 1.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 16
Commits related to this issue
- Fixes integration test scaffold on sqlx v0.6 There is a regression in sqlx v0.6 where `Pool::close` does not actually close all connections, after some investigation it was found that the pool that w... — committed to TimDeve/slice-n-dice by TimDeve 2 years ago
- fix(pool): close when last handle is dropped, extra check in `try_acquire` closes #1928 closes #2375 — committed to launchbadge/sqlx by abonander a year ago
- fix(pool): close when last handle is dropped, extra check in `try_acquire` closes #1928 closes #2375 — committed to launchbadge/sqlx by abonander a year ago
- fix(pool): close when last handle is dropped, extra check in `try_acquire` closes #1928 closes #2375 — committed to Aandreba/sqlx by abonander a year ago
No, it does not. It references the Tokio runtime set for the current thread the code is executing on (
sqlx_rt::Handleis a reexport oftokimo::runtime::Handle), only if theruntime-tokio-*feature is enabled. That staticRUNTIMEis for use insqlx-macrosonly, which is implied by the comment immediately above it. It doesn’t really need to exist insqlx-rtfor that reason which is why I’ve moved it in my current refactoring effort (#2039).They are not, and Tokio actually does not like it when you move
TcpStreams from one runtime to another. It does continue to work but only until theTcpStream’s original runtime exits, at which point it will panic if you try to use it.async-stdonly ever has one runtime per process so that’s different.That is a deliberate feature, yes. If you have a test failure, it’s useful to go back and look at the contents of the database to help figure out what went wrong. They won’t pile up forever, though. Old test databases are cleaned up every time on startup of any binary using
#[sqlx::test](as stated in the next paragraph after the one that describes this feature), so at most you’ll have as many as the number of failed tests in the latest run.I had intended to add commands to
sqlx-clito manage test databases but I wanted to focus on getting the core#[sqlx::test]functionality working and merged.All that aside, I’ve realized that there’s likely a TOCTOU bug in
Pool::close()where it thinks all connections have been released and then returns, but there may still be some in-flight.Specifically, here: https://github.com/launchbadge/sqlx/blob/main/sqlx-core/src/pool/inner.rs#L257
If
.close()is called on another thread between.acquire_permit()and.try_increment_size()when the size is zero,.close()will exit because it sees the size is zero and (incorrectly) assumes that there will never be any more connections on the pool.It used to acquire
self.options.max_connectionspermits right away, which would preclude any concurrent connection attempts, but that had to be changed with the#[sqlx::test]functionality as that allows pools to have a parent/child relationship where the child steals semaphore permits from the parent as needed, in order to enforce a global connection limit for all running tests. Simply put, there may not bemax_connectionspermits on the semaphore anymore, and stealing them from the parent pool just so the child pool can close would be wasteful.I think the fix here would be to change
.try_increment_size()to immediately returnErr(permit)if the pool is closed, and then the logic inacquire()will loop back to the top and then returnErr(Error::PoolClosed)on the next call to.acquire_permit().That should eliminate the possibility of
sizebeing zero and thus.close()returning while connection attempts can still be pending.