sqlx: Extremely slow repetitive regexing from the new QueryLogger
All coming from here: https://github.com/launchbadge/sqlx/commit/7bc3133f70b63cb177927b03c3c9b745d6aa5a29#diff-eb0cd75f4616762e6091c0588db07a75
I’m seeing an execution that runs many times (graphql resolver that resolves on thousands of its children) throttled at 100% . It spends all that time regexing through the SQL I’ve passed it in QueryLogger. This used to be quite fast as it would simply do a cache check on a prepared statement against the connection, and if it did, execute (or prepare a new one) and never needed to read into the query itself.
Here’s a simplified callstack from lldb, where I’ve tried to reduce the noise and remove any internal-eyes-only sql:
thread #3, name = 'actix-rt:worker:0'
frame #0: 0x000000010ad819fa clausehound`regex::backtrack::usize_to_u32::hdf907051fdf3a146(n=4) at backtrack.rs:287:5
frame #1: 0x000000010ad818da clausehound`regex::backtrack::Bounded$LT$I$GT$::has_visited::hf8af1293bf08fe70(self=0x000070000a51ecc0, ip=1225, at=InputAt @ 0x000070000a51e838) at backtrack.rs:273:18
frame #2: 0x000000010ad806a5 clausehound`regex::backtrack::Bounded$LT$I$GT$::step::hbc37ff202a0f4eb5(self=0x000070000a51ecc0, ip=1225, at=InputAt @ 0x000070000a51eaa0) at backtrack.rs:205:16
frame #3: 0x000000010ad80550 clausehound`regex::backtrack::Bounded$LT$I$GT$::backtrack::hee64c3ce72ad992c(self=0x000070000a51ecc0, start=<unavailable>) at backtrack.rs:178:24
frame #4: 0x000000010ad7fb3a clausehound`regex::backtrack::Bounded$LT$I$GT$::exec_::h230596d2910ee547(self=0x000070000a51ecc0, at=InputAt @ 0x000070000a51ed00, end=241) at backtrack.rs:145:55
frame #5: 0x000000010ad7f05a clausehound`regex::backtrack::Bounded$LT$I$GT$::exec::h473fc42d6ce15005(prog=0x00007fb009800420, cache=0x00007fb007f13f70, matches=&mut [bool] @ 0x000070000a51ed30, slots=(data_ptr = 0x00007fb007c10880, length = 6), input=CharInput @ 0x000070000a51ec88, start=0, end=241) at backtrack.rs:103:9
frame #6: 0x000000010adaf3d8 clausehound`regex::exec::ExecNoSync::exec_backtrack::h97e8f52e94e02b47(self=0x000070000a529430, matches=&mut [bool] @ 0x000070000a51ee68, slots=(data_ptr = 0x00007fb007c10880, length = 6), text=(data_ptr = "XXXSQLXXX", length = 241), start=0, end=241) at exec.rs:1119:13
frame #7: 0x000000010adaeea0 clausehound`regex::exec::ExecNoSync::exec_nfa::h0cfe7711f7d7040a(self=0x000070000a529430, ty=Backtrack, matches=&mut [bool] @ 0x000070000a51ef68, slots=(data_ptr = 0x00007fb007c10880, length = 6), quit_after_match=false, quit_after_match_with_pos=false, text=(data_ptr = "XXXSQLXXX", length = 241), start=0, end=241) at exec.rs:1060:13
frame #8: 0x000000010adaeb3a clausehound`regex::exec::ExecNoSync::captures_nfa_type::ha869baee80db827c(self=0x000070000a529430, ty=Auto, slots=(data_ptr = 0x00007fb007c10880, length = 6), text=(data_ptr = "XXXSQLXXX, length = 241), start=0, end=241) at exec.rs:1009:12
## ..... a bunch more Regex::captures
frame #15: 0x000000010a5eaf83 clausehound`sqlformat::tokenizer::get_plain_reserved_token::he762e95704837ba6(input=(data_ptr = "XXXSQLXXX, length = 241)) at tokenizer.rs:212:5
frame #16: 0x000000010a5eae02 clausehound`sqlformat::tokenizer::get_reserved_word_token::_$u7b$$u7b$closure$u7d$$u7d$::h37054d6b1e5ddfaf at tokenizer.rs:188:21
frame #17: 0x000000010a607714 clausehound`core::option::Option$LT$T$GT$::or_else::h134d84f62f07b282(self=Option<sqlformat::tokenizer::Token> @ 0x000070000a529838, f=closure-2 @ 0x000070000a5297d8) at option.rs:766:21
frame #18: 0x000000010a5ead77 clausehound`sqlformat::tokenizer::get_reserved_word_token::hf9734b5ed74bca49(input=(data_ptr = "XXXXXSQL", length = 241), previous_token=Option<&sqlformat::tokenizer::Token> @ 0x000070000a529830) at tokenizer.rs:185:5
frame #19: 0x000000010a5e9c2c clausehound`sqlformat::tokenizer::get_next_token::_$u7b$$u7b$closure$u7d$$u7d$::hd7a15da3554e8455 at tokenizer.rs:79:21
frame #20: 0x000000010a607fe0 clausehound`core::option::Option$LT$T$GT$::or_else::hd328e8464ce36f66(self=Option<sqlformat::tokenizer::Token> @ 0x000070000a529a88, f=closure-6 @ 0x000070000a529980) at option.rs:766:21
frame #21: 0x000000010a5e9a3b clausehound`sqlformat::tokenizer::get_next_token::heea5162f560d3da9(input=(data_ptr = "XXXXXSQL", length = 241), previous_token=Option<&sqlformat::tokenizer::Token> @ 0x000070000a5299d8) at tokenizer.rs:72:5
frame #22: 0x000000010a5e9558 clausehound`sqlformat::tokenizer::tokenize::haf06a858d053e60b(input=(data_ptr = "XXXXXSQL", length = 241)) at tokenizer.rs:12:22
frame #23: 0x000000010a609d3e clausehound`sqlformat::format::h4a39c243e774c1e6(query=(data_ptr = "XXXXXSQL, length = 311), params=0x000000010b017050, options=(indent = sqlformat::Indent @ 0x000070000a529e80, uppercase = false, lines_between_queries = '\x01')) at lib.rs:15:18
frame #24: 0x000000010a5672eb clausehound`sqlx_core::logger::QueryLogger::finish::hd9f9434d6a799964(self=0x00007fb009093fe0) at logger.rs:41:21
frame #25: 0x000000010a567761 clausehound`_$LT$sqlx_core..logger..QueryLogger$u20$as$u20$core..ops..drop..Drop$GT$::drop::he09628e261f4cd86(self=0x00007fb009093fe0) at logger.rs:70:9
frame #26: 0x000000010a5cb2c1 clausehound`core::ptr::drop_in_place::h0ec659c352390567((null)=0x00007fb009093fe0) at mod.rs:177:1
frame #27: 0x000000010a5cd699 clausehound`core::ptr::drop_in_place::h46c9c821341507f9 at async_stream.rs:71:9
frame #28: 0x000000010a5caff1 clausehound`core::ptr::drop_in_place::h053074e64eb2d435((null)=0x00007fb009093f68) at mod.rs:177:1
frame #29: 0x000000010a5cf5a9 clausehound`core::ptr::drop_in_place::h8b42d23ca82add9a at async_stream.rs:28:33
frame #30: 0x000000010a5d24a1 clausehound`core::ptr::drop_in_place::hdab8fe07eb2a2c7e((null)=0x00007fb009093c00) at mod.rs:177:1
frame #31: 0x000000010a5cec01 clausehound`core::ptr::drop_in_place::h6e6ced6195372bbc((null)=0x00007fb009093c00) at mod.rs:177:1
frame #32: 0x000000010a5ce9b1 clausehound`core::ptr::drop_in_place::h6cb48ac7c1601a22((null)=0x00007fb009093c00) at mod.rs:177:1
frame #33: 0x000000010a5ce39f clausehound`core::ptr::drop_in_place::h5f6a3dfe4f33b527((null)=0x00007fb007d26858) at mod.rs:177:1
frame #34: 0x000000010a5d2b91 clausehound`core::ptr::drop_in_place::hecf2308d5edf9470((null)=0x00007fb007d26858) at mod.rs:177:1
frame #35: 0x000000010a5d334e clausehound`core::ptr::drop_in_place::hf642a7d0cc209dc4((null)=0x00007fb007d26850) at mod.rs:177:1
frame #36: 0x000000010a5cc30b clausehound`core::ptr::drop_in_place::h29b1a0c6f7831645((null)=0x00007fb00908b9e8) at mod.rs:177:1
frame #37: 0x000000010a5ce9d1 clausehound`core::ptr::drop_in_place::h6cbe51a0d5d8a1aa((null)=0x00007fb00908b9e8) at mod.rs:177:1
frame #38: 0x000000010a11487b clausehound`core::ptr::drop_in_place::hc85beac201e8602b at async_stream.rs:71:9
frame #39: 0x000000010a1041e1 clausehound`core::ptr::drop_in_place::h88e0e36d05fd5962((null)=0x00007fb00908b970) at mod.rs:177:1
frame #40: 0x000000010a121399 clausehound`core::ptr::drop_in_place::hfbcede085ec486cc at async_stream.rs:28:44
frame #41: 0x000000010a106f01 clausehound`core::ptr::drop_in_place::h94899048937e44d3((null)=0x00007fb00908b600) at mod.rs:177:1
frame #42: 0x000000010a121dd1 clausehound`core::ptr::drop_in_place::hfd19355a22e27521((null)=0x00007fb00908b600) at mod.rs:177:1
frame #43: 0x000000010a0fa081 clausehound`core::ptr::drop_in_place::h662773a8b5a00cd7((null)=0x00007fb00908b600) at mod.rs:177:1
frame #44: 0x000000010a5ce39f clausehound`core::ptr::drop_in_place::h5f6a3dfe4f33b527((null)=0x00007fb007d148f8) at mod.rs:177:1
frame #45: 0x000000010a5d2b91 clausehound`core::ptr::drop_in_place::hecf2308d5edf9470((null)=0x00007fb007d148f8) at mod.rs:177:1
frame #46: 0x000000010a5d334e clausehound`core::ptr::drop_in_place::hf642a7d0cc209dc4((null)=0x00007fb007d148f0) at mod.rs:177:1
frame #47: 0x000000010a5d14bf clausehound`core::ptr::drop_in_place::hc85774b31faf7854((null)=0x00007fb007d1d1e0) at mod.rs:177:1
frame #48: 0x000000010a5cf0c1 clausehound`core::ptr::drop_in_place::h7a4a3bfda5533806((null)=0x00007fb007d1d1e0) at mod.rs:177:1
frame #49: 0x000000010a1d65ed clausehound`sqlx_core::mysql::connection::executor::_$LT$impl$u20$sqlx_core..executor..Executor$u20$for$u20$$RF$mut$u20$sqlx_core..mysql..connection..MySqlConnection$GT$::fetch_optional::_$u7b$$u7b$closure$u7d$$u7d$::h803ff86c56fe07fe((null)=ResumeTy @ 0x000070000a52e7b8) at executor.rs:255:9
## a bunch of futures-polling frames...
### finally, my function
frame #54: 0x0000000109e7a150 clausehound`_$LT$clausehound..models..framework..Framework$u20$as$u20$clausehound..models..base..Fetch$GT$::fetch::__fetch::_$u7b$$u7b$closure$u7d$$u7d$::hd4129b2b7f99f3a9((null)=ResumeTy @ 0x000070000a52fe60) at framework.rs:15:24
Being able to turn off that QueryLogger and not pay the hard cost of its ::new would be great, but at very least caching that SQL string into a hashmap would speed things up a ton so it wouldn’t have to regex it literally millions of times.
Hitting this using the mysql driver, on 0.4.0-beta.1
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 2
- Comments: 20 (16 by maintainers)
Commits related to this issue
- Avoid using a regex for the reserved word lookups Improves average case performance by 50%. Related to https://github.com/launchbadge/sqlx/issues/633 — committed to shssoichiro/sqlformat-rs by shssoichiro 4 years ago
- Avoid using a regex for the reserved word lookups Improves average case performance by 50%. Related to launchbadge/sqlx#633 — committed to shssoichiro/sqlformat-rs by shssoichiro 4 years ago
- Avoid using a regex for the reserved word lookups Improves average case performance by 50%. Related to launchbadge/sqlx#633 — committed to shssoichiro/sqlformat-rs by shssoichiro 4 years ago
- Convert the tokenizer to use nom (#4) Provides significant performance improvements. See https://github.com/launchbadge/sqlx/issues/633. ``` simple query time: [15.707 us 15.721 us 1... — committed to shssoichiro/sqlformat-rs by shssoichiro 4 years ago
- Avoid regex for trim_spaces_end Improves performance on certain pathological queries by up to 60%. See launchbadge/sqlx#633. — committed to shssoichiro/sqlformat-rs by shssoichiro 4 years ago
My bad, I’ve used
debugbuild and did not mention it.Rechecked, on my machine,
releasewith query logging produces 16-20ms per insert for 1000 items, but starts to grow when I insert more:3000 items with 1 column are inserted in
5-8msfor me when I disabled the sql logging.If I increase the number of columns to 3 and rerun the
releasebuilt tests, it gets way worse:3000 items with 3 columns are inserted in
~10ms(with the first query being slow, ~30ms) for me when I disabled the sql logging.===
IMO there’s something fishy here still, you just have to be curious a bit and fiddle with the query parameter numbers to receive peculiar performance results.
With the release of sqlformat 0.1.2, the runtime for this pathological query has been reduced by 90%, from 100ms to 10ms. Runtime for all queries has improved across the board by about 25% as well.
I think in order to get any more significant gains, I’ll have to rewrite sqlformat’s tokenizer to use nom instead of regex. So I’ll publish a patch version for now with the gains that are in place.