sqlx: Extremely slow repetitive regexing from the new QueryLogger

All coming from here: https://github.com/launchbadge/sqlx/commit/7bc3133f70b63cb177927b03c3c9b745d6aa5a29#diff-eb0cd75f4616762e6091c0588db07a75

I’m seeing an execution that runs many times (graphql resolver that resolves on thousands of its children) throttled at 100% . It spends all that time regexing through the SQL I’ve passed it in QueryLogger. This used to be quite fast as it would simply do a cache check on a prepared statement against the connection, and if it did, execute (or prepare a new one) and never needed to read into the query itself.

Here’s a simplified callstack from lldb, where I’ve tried to reduce the noise and remove any internal-eyes-only sql:

 thread #3, name = 'actix-rt:worker:0'
    frame #0: 0x000000010ad819fa clausehound`regex::backtrack::usize_to_u32::hdf907051fdf3a146(n=4) at backtrack.rs:287:5
    frame #1: 0x000000010ad818da clausehound`regex::backtrack::Bounded$LT$I$GT$::has_visited::hf8af1293bf08fe70(self=0x000070000a51ecc0, ip=1225, at=InputAt @ 0x000070000a51e838) at backtrack.rs:273:18
    frame #2: 0x000000010ad806a5 clausehound`regex::backtrack::Bounded$LT$I$GT$::step::hbc37ff202a0f4eb5(self=0x000070000a51ecc0, ip=1225, at=InputAt @ 0x000070000a51eaa0) at backtrack.rs:205:16
    frame #3: 0x000000010ad80550 clausehound`regex::backtrack::Bounded$LT$I$GT$::backtrack::hee64c3ce72ad992c(self=0x000070000a51ecc0, start=<unavailable>) at backtrack.rs:178:24
    frame #4: 0x000000010ad7fb3a clausehound`regex::backtrack::Bounded$LT$I$GT$::exec_::h230596d2910ee547(self=0x000070000a51ecc0, at=InputAt @ 0x000070000a51ed00, end=241) at backtrack.rs:145:55
    frame #5: 0x000000010ad7f05a clausehound`regex::backtrack::Bounded$LT$I$GT$::exec::h473fc42d6ce15005(prog=0x00007fb009800420, cache=0x00007fb007f13f70, matches=&mut [bool] @ 0x000070000a51ed30, slots=(data_ptr = 0x00007fb007c10880, length = 6), input=CharInput @ 0x000070000a51ec88, start=0, end=241) at backtrack.rs:103:9
    frame #6: 0x000000010adaf3d8 clausehound`regex::exec::ExecNoSync::exec_backtrack::h97e8f52e94e02b47(self=0x000070000a529430, matches=&mut [bool] @ 0x000070000a51ee68, slots=(data_ptr = 0x00007fb007c10880, length = 6), text=(data_ptr = "XXXSQLXXX", length = 241), start=0, end=241) at exec.rs:1119:13
    frame #7: 0x000000010adaeea0 clausehound`regex::exec::ExecNoSync::exec_nfa::h0cfe7711f7d7040a(self=0x000070000a529430, ty=Backtrack, matches=&mut [bool] @ 0x000070000a51ef68, slots=(data_ptr = 0x00007fb007c10880, length = 6), quit_after_match=false, quit_after_match_with_pos=false, text=(data_ptr = "XXXSQLXXX", length = 241), start=0, end=241) at exec.rs:1060:13
    frame #8: 0x000000010adaeb3a clausehound`regex::exec::ExecNoSync::captures_nfa_type::ha869baee80db827c(self=0x000070000a529430, ty=Auto, slots=(data_ptr = 0x00007fb007c10880, length = 6), text=(data_ptr = "XXXSQLXXX, length = 241), start=0, end=241) at exec.rs:1009:12

## ..... a bunch more Regex::captures

    frame #15: 0x000000010a5eaf83 clausehound`sqlformat::tokenizer::get_plain_reserved_token::he762e95704837ba6(input=(data_ptr = "XXXSQLXXX, length = 241)) at tokenizer.rs:212:5
    frame #16: 0x000000010a5eae02 clausehound`sqlformat::tokenizer::get_reserved_word_token::_$u7b$$u7b$closure$u7d$$u7d$::h37054d6b1e5ddfaf at tokenizer.rs:188:21
    frame #17: 0x000000010a607714 clausehound`core::option::Option$LT$T$GT$::or_else::h134d84f62f07b282(self=Option<sqlformat::tokenizer::Token> @ 0x000070000a529838, f=closure-2 @ 0x000070000a5297d8) at option.rs:766:21
    frame #18: 0x000000010a5ead77 clausehound`sqlformat::tokenizer::get_reserved_word_token::hf9734b5ed74bca49(input=(data_ptr = "XXXXXSQL", length = 241), previous_token=Option<&sqlformat::tokenizer::Token> @ 0x000070000a529830) at tokenizer.rs:185:5
    frame #19: 0x000000010a5e9c2c clausehound`sqlformat::tokenizer::get_next_token::_$u7b$$u7b$closure$u7d$$u7d$::hd7a15da3554e8455 at tokenizer.rs:79:21
    frame #20: 0x000000010a607fe0 clausehound`core::option::Option$LT$T$GT$::or_else::hd328e8464ce36f66(self=Option<sqlformat::tokenizer::Token> @ 0x000070000a529a88, f=closure-6 @ 0x000070000a529980) at option.rs:766:21
    frame #21: 0x000000010a5e9a3b clausehound`sqlformat::tokenizer::get_next_token::heea5162f560d3da9(input=(data_ptr = "XXXXXSQL", length = 241), previous_token=Option<&sqlformat::tokenizer::Token> @ 0x000070000a5299d8) at tokenizer.rs:72:5
    frame #22: 0x000000010a5e9558 clausehound`sqlformat::tokenizer::tokenize::haf06a858d053e60b(input=(data_ptr = "XXXXXSQL", length = 241)) at tokenizer.rs:12:22
    frame #23: 0x000000010a609d3e clausehound`sqlformat::format::h4a39c243e774c1e6(query=(data_ptr = "XXXXXSQL, length = 311), params=0x000000010b017050, options=(indent = sqlformat::Indent @ 0x000070000a529e80, uppercase = false, lines_between_queries = '\x01')) at lib.rs:15:18
    frame #24: 0x000000010a5672eb clausehound`sqlx_core::logger::QueryLogger::finish::hd9f9434d6a799964(self=0x00007fb009093fe0) at logger.rs:41:21
    frame #25: 0x000000010a567761 clausehound`_$LT$sqlx_core..logger..QueryLogger$u20$as$u20$core..ops..drop..Drop$GT$::drop::he09628e261f4cd86(self=0x00007fb009093fe0) at logger.rs:70:9
    frame #26: 0x000000010a5cb2c1 clausehound`core::ptr::drop_in_place::h0ec659c352390567((null)=0x00007fb009093fe0) at mod.rs:177:1
    frame #27: 0x000000010a5cd699 clausehound`core::ptr::drop_in_place::h46c9c821341507f9 at async_stream.rs:71:9
    frame #28: 0x000000010a5caff1 clausehound`core::ptr::drop_in_place::h053074e64eb2d435((null)=0x00007fb009093f68) at mod.rs:177:1
    frame #29: 0x000000010a5cf5a9 clausehound`core::ptr::drop_in_place::h8b42d23ca82add9a at async_stream.rs:28:33
    frame #30: 0x000000010a5d24a1 clausehound`core::ptr::drop_in_place::hdab8fe07eb2a2c7e((null)=0x00007fb009093c00) at mod.rs:177:1
    frame #31: 0x000000010a5cec01 clausehound`core::ptr::drop_in_place::h6e6ced6195372bbc((null)=0x00007fb009093c00) at mod.rs:177:1
    frame #32: 0x000000010a5ce9b1 clausehound`core::ptr::drop_in_place::h6cb48ac7c1601a22((null)=0x00007fb009093c00) at mod.rs:177:1
    frame #33: 0x000000010a5ce39f clausehound`core::ptr::drop_in_place::h5f6a3dfe4f33b527((null)=0x00007fb007d26858) at mod.rs:177:1
    frame #34: 0x000000010a5d2b91 clausehound`core::ptr::drop_in_place::hecf2308d5edf9470((null)=0x00007fb007d26858) at mod.rs:177:1
    frame #35: 0x000000010a5d334e clausehound`core::ptr::drop_in_place::hf642a7d0cc209dc4((null)=0x00007fb007d26850) at mod.rs:177:1
    frame #36: 0x000000010a5cc30b clausehound`core::ptr::drop_in_place::h29b1a0c6f7831645((null)=0x00007fb00908b9e8) at mod.rs:177:1
    frame #37: 0x000000010a5ce9d1 clausehound`core::ptr::drop_in_place::h6cbe51a0d5d8a1aa((null)=0x00007fb00908b9e8) at mod.rs:177:1
    frame #38: 0x000000010a11487b clausehound`core::ptr::drop_in_place::hc85beac201e8602b at async_stream.rs:71:9
    frame #39: 0x000000010a1041e1 clausehound`core::ptr::drop_in_place::h88e0e36d05fd5962((null)=0x00007fb00908b970) at mod.rs:177:1
    frame #40: 0x000000010a121399 clausehound`core::ptr::drop_in_place::hfbcede085ec486cc at async_stream.rs:28:44
    frame #41: 0x000000010a106f01 clausehound`core::ptr::drop_in_place::h94899048937e44d3((null)=0x00007fb00908b600) at mod.rs:177:1
    frame #42: 0x000000010a121dd1 clausehound`core::ptr::drop_in_place::hfd19355a22e27521((null)=0x00007fb00908b600) at mod.rs:177:1
    frame #43: 0x000000010a0fa081 clausehound`core::ptr::drop_in_place::h662773a8b5a00cd7((null)=0x00007fb00908b600) at mod.rs:177:1
    frame #44: 0x000000010a5ce39f clausehound`core::ptr::drop_in_place::h5f6a3dfe4f33b527((null)=0x00007fb007d148f8) at mod.rs:177:1
    frame #45: 0x000000010a5d2b91 clausehound`core::ptr::drop_in_place::hecf2308d5edf9470((null)=0x00007fb007d148f8) at mod.rs:177:1
    frame #46: 0x000000010a5d334e clausehound`core::ptr::drop_in_place::hf642a7d0cc209dc4((null)=0x00007fb007d148f0) at mod.rs:177:1
    frame #47: 0x000000010a5d14bf clausehound`core::ptr::drop_in_place::hc85774b31faf7854((null)=0x00007fb007d1d1e0) at mod.rs:177:1
    frame #48: 0x000000010a5cf0c1 clausehound`core::ptr::drop_in_place::h7a4a3bfda5533806((null)=0x00007fb007d1d1e0) at mod.rs:177:1
    frame #49: 0x000000010a1d65ed clausehound`sqlx_core::mysql::connection::executor::_$LT$impl$u20$sqlx_core..executor..Executor$u20$for$u20$$RF$mut$u20$sqlx_core..mysql..connection..MySqlConnection$GT$::fetch_optional::_$u7b$$u7b$closure$u7d$$u7d$::h803ff86c56fe07fe((null)=ResumeTy @ 0x000070000a52e7b8) at executor.rs:255:9

## a bunch of futures-polling frames...

### finally, my function
    frame #54: 0x0000000109e7a150 clausehound`_$LT$clausehound..models..framework..Framework$u20$as$u20$clausehound..models..base..Fetch$GT$::fetch::__fetch::_$u7b$$u7b$closure$u7d$$u7d$::hd4129b2b7f99f3a9((null)=ResumeTy @ 0x000070000a52fe60) at framework.rs:15:24

Being able to turn off that QueryLogger and not pay the hard cost of its ::new would be great, but at very least caching that SQL string into a hashmap would speed things up a ton so it wouldn’t have to regex it literally millions of times.

Hitting this using the mysql driver, on 0.4.0-beta.1

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 20 (16 by maintainers)

Commits related to this issue

Most upvoted comments

My bad, I’ve used debug build and did not mention it.

Rechecked, on my machine, release with query logging produces 16-20ms per insert for 1000 items, but starts to grow when I insert more:

1 column,

1000 items: 16-20ms
2000 items: 48-50ms
3000 items: 96-106ms

3000 items with 1 column are inserted in 5-8ms for me when I disabled the sql logging.

If I increase the number of columns to 3 and rerun the release built tests, it gets way worse:

3 columns,

1000 items: ~70ms
2000 items: ~250ms
3000 items: ~520ms

3000 items with 3 columns are inserted in ~10ms (with the first query being slow, ~30ms) for me when I disabled the sql logging.

===

IMO there’s something fishy here still, you just have to be curious a bit and fiddle with the query parameter numbers to receive peculiar performance results.

With the release of sqlformat 0.1.2, the runtime for this pathological query has been reduced by 90%, from 100ms to 10ms. Runtime for all queries has improved across the board by about 25% as well.

I think in order to get any more significant gains, I’ll have to rewrite sqlformat’s tokenizer to use nom instead of regex. So I’ll publish a patch version for now with the gains that are in place.