kakoune: Add a kakquote function to the prelude of shell blocks
Since Kakoune’s quoting system was reworked, it’s pretty easy to reliably quote Kakoune strings by just doubling apostrophes. In shell, it looks something like this:
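kakquote() {
    printf "'"
    # Pass the text as an argument, not as the format string, so that
    # % and backslash sequences in the input are not interpreted by printf.
    printf '%s' "$*" | sed "s/'/''/g"
    printf "'"
}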
However, while working on #3336, I’ve copied and pasted this fragment into three or four shell blocks already; I expect it (or something like it) is already present in a bunch of other scripts too.
We already inject environment variables into shell blocks; how about we prepend this helpful and near-universally-useful function as well?
(It should be possible to implement this in pure POSIX shell without sed, but it would be too complex. If avoiding a fork turns out to be a performance improvement, it’d be nice to make the change once in Kakoune’s shell prelude, and not have to copy/paste the change into thousands of existing Kakoune scripts.)
About this issue
- State: open
- Created 4 years ago
- Comments: 18 (15 by maintainers)
I like the modularity of having kakquote do exactly one thing, so I can use a printf format string to build up the command, and be sure I’ve quoted each argument properly:
…but now that I think about it, I could still do that with a quoter that quoted each argument individually. In fact, if I quoted each argument individually, I could do this:
…which is a lot nicer, although it quotes more than strictly necessary.
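As a rough sketch of the two styles being compared, assuming the single-string kakquote from the top of this issue and a hypothetical per-argument variant (called kakquote_each here purely for illustration; it forks sed once per argument, so it is simple rather than fast):

```sh
# Single-string quoter, as in the original post: joins its arguments
# with spaces and emits one Kakoune-quoted string.
kakquote() {
    printf "'"
    printf '%s' "$*" | sed "s/'/''/g"
    printf "'"
}

# Hypothetical per-argument quoter (name and body are assumptions):
# quotes each argument separately and joins the results with spaces.
kakquote_each() {
    sep=""
    for arg in "$@"; do
        printf "%s'" "$sep"
        printf '%s' "$arg" | sed "s/'/''/g"
        printf "'"
        sep=" "
    done
}

msg="can't open file"

# Style 1: build the command with a printf format string,
# quoting only the argument that needs it.
style1=$(printf 'echo -markup %s' "$(kakquote "{Error}$msg")")

# Style 2: hand the whole command to the per-argument quoter,
# which also quotes words that did not strictly need it.
style2=$(kakquote_each echo -markup "{Error}$msg")

printf '%s\n%s\n' "$style1" "$style2"
```

The second style needs no format string at all, at the cost of wrapping every word in apostrophes.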
My one fear with processing arguments individually is that it might be a bit more expensive than quoting the whole string at once. But rather than guess wildly, I decided to write a little benchmark:
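A minimal sketch of such a harness, assuming test_quoter() checks a handful of single-argument inputs and benchmark_quoter() calls the quoter in a loop with three arguments. The helper check_case, the test inputs, and the default iteration count are all made up for illustration:

```sh
# Quoter under test: the sed-based one from the original post.
single_sed_quoter() {
    printf "'"
    printf '%s' "$*" | sed "s/'/''/g"
    printf "'"
}

# check_case QUOTER INPUT EXPECTED (hypothetical helper):
# run the quoter on one input and report any mismatch on stderr.
check_case() {
    actual=$("$1" "$2")
    if [ "$actual" != "$3" ]; then
        printf '%s: expected %s, got %s\n' "$1" "$3" "$actual" >&2
        return 1
    fi
}

# test_quoter QUOTER: single-argument cases only, so the same checks
# apply to quoters built on "$*" and quoters built on "$@".
test_quoter() {
    check_case "$1" "plain" "'plain'" &&
    check_case "$1" "it's" "'it''s'" &&
    check_case "$1" "'" "''''" &&
    check_case "$1" "" "''"
}

# benchmark_quoter QUOTER [COUNT]: call the quoter COUNT times with
# several arguments, discarding the output. Time this from outside.
benchmark_quoter() {
    i=0
    while [ "$i" -lt "${2:-1000}" ]; do
        "$1" "first arg" "it's the second" "third 'quoted' arg" >/dev/null
        i=$((i + 1))
    done
}

test_quoter single_sed_quoter && echo "single_sed_quoter: ok"
```

Running benchmark_quoter under the shell's time facility (or an external timer) once per shell gives numbers that can be compared across dash, busybox sh, and bash.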
- single_sed_quoter is basically my original kakquote implementation above.
- single_flat_sed_quoter is the same thing, but squished into a single line, as suggested by lenormf.
- multi_sed_quoter is basically alexherbo2’s quote-arguments-individually implementation.
- single_builtin_quoter implements quoting using only shell builtins instead of launching sed as an external process. I know that launching external processes was frowned on back in the day, but I don’t know how much of an impact it has on a modern system, so I figured I’d try it.
- multi_builtin_quoter is the builtin quoting algorithm running once per argument (like alexherbo2’s suggestion).
Note that test_quoter() only tests the single-argument case, so it should work properly for both $@ and $* quoters. benchmark_quoter() provides multiple arguments, so quoters that do per-argument work should be slower.
Here are the results for whatever version of dash Debian Testing is currently using (this is an edited screenshot; each result is the best of 3 runs):
busybox sh is similar (again, each result is the best of 3):
bash is quite a bit slower (again, each result is the best of 3):
- My original single_sed_quoter function is pretty similar everywhere, interestingly enough.
- single_flat_sed_quoter is a bit slower, which surprises me, but I guess it’s doing an additional string concatenation in RAM instead of just writing bytes to stdout?
- multi_sed_quoter invokes sed three times instead of one, and runs at about a third the speed of the single quoter. That’s reasonable, but it’s probably not the implementation we want in the prelude.
- single_builtin_quoter is an ugly implementation, but wow is it fast. Turns out avoiding process forks is still pretty important! It ranges from 11.0 times as fast as single_sed_quoter, up to 36.4 times as fast!
- multi_builtin_quoter is about two thirds of the speed of single_builtin_quoter, although I bet that depends heavily on the number and size of the arguments. Still, it doesn’t slow down as much as multi_sed_quoter slows down, and it’s still 7.4 to 24.9 times as fast as the fastest sed-based quoter!
Overall, I think multi_builtin_quoter is the right choice for the prelude - it’s pretty fast, and it allows the very pleasant kakquote echo -markup {Error}"$msg" style. It’s too ugly for me to copy/paste into every %sh{} block, so it would really benefit from being in the prelude.
Apparently once you have a benchmarking harness, you can’t stop coming up with new variants to benchmark. I have some new variations based on single_builtin_quoter, with the following changes:
- instead of while [ -n "$text" ] I have while true and a break, avoiding a test
- instead of printf "%s" blah; printf "''" I have printf "%s''" blah, saving a printf
- instead of matching "'" I match \', which seems to be very slightly faster
The new versions look like this:
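A sketch of what quoters with those three changes might look like, reconstructed from the description only. The function names and bodies below are assumptions, not the code that was actually benchmarked:

```sh
# Quote "$*" as a single Kakoune string using only shell builtins.
single_builtin_quoter_ntmp() {
    text="$*"
    printf "'"
    while true; do                      # no test in the loop condition
        case "$text" in
            *\'*)
                # Text before the first apostrophe, plus a doubled
                # apostrophe, in one printf call.
                printf "%s''" "${text%%\'*}"
                # Drop everything up to and including that apostrophe.
                text="${text#*\'}"
                ;;
            *)
                # No apostrophes left: emit the tail and the closing quote.
                printf "%s'" "$text"
                break
                ;;
        esac
    done
}

# Same algorithm, run once per argument; results are space-separated.
multi_builtin_quoter_ntmp() {
    sep=""
    for text in "$@"; do
        printf "%s'" "$sep"
        while true; do
            case "$text" in
                *\'*)
                    printf "%s''" "${text%%\'*}"
                    text="${text#*\'}"
                    ;;
                *)
                    printf "%s'" "$text"
                    break
                    ;;
            esac
        done
        sep=" "
    done
}

msg="it's broken"
multi_builtin_quoter_ntmp echo -markup "{Error}$msg"
echo
```

Neither function starts an external process, which is where the speedup over the sed-based quoters comes from.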
(the “ntmp” in the name is short for “no test, merged printfs”)
Benchmarking them (as before):
Converting to ms for comparison with lenormf’s table, that’s:
Basically, the new multi-quoter is as fast as the old single-quoter, and the new single-quoter is even faster.
My current position:
- multi_builtin_quoter_all_backslashes_ntmp
- single_all_sed_quoter is easy to copy/paste and the performance is decent.
I’d argue that ~5ms is much too expensive for something as trivial as escaping a string. Consider that it’s not uncommon to generate option lists and to have to escape each element.
OpenBSD’s sh is the only platform shell that I’ve seen which doesn’t have printf as a builtin, and it would be trivial (although not obvious) for an OpenBSD user to set KAKOUNE_POSIX_SHELL to dash or any other shell. In addition, forking on Windows (and on Mac to a lesser degree) is much more expensive than on Linux.
The devil with O(n) is in the hidden constant, as they say. It might be very different for longer strings (and there’s also a 2x memory cost). In that case, however, it’s probably a lot faster to call sed than to iterate in shell.
But actually it gets even more interesting. By pre-collapsing the strings à la occivink, and avoiding variables completely (sh has $1, $2 etc. which can be assigned via set --):
The results are some 50% faster compared to _ntmp:
I have some new measurements. I was pointed to @occivink’s quoting function in his kakoune-snippets package, which double-escapes apostrophes for some reason, but I edited it to match the other implementations:
I also figured out how to escape a value entirely within sed … which is not as nice as doing it entirely outside of sed, but it can fit on a single line:
Results for these new implementations, on the same laptop as before, best of 3 runs:
- single_all_sed_quoter performs identically to single_sed_quoter, but fits into 80 chars if you use a shorter function name, which makes it nice for non-prelude use as lenormf points out.
- multi_all_sed_quoter performs identically to multi_sed_quoter.
- occivink_builtin_quoter performs just a little bit slower than single_builtin_quoter, which I found interesting. When you use builtins to slice a string up to the next apostrophe and there is no apostrophe, you get back the original string rather than an empty string, so you have to treat the last chunk of text specially. occivink tested to see if the slice matched the unsliced value, while I used a case to search for the string. I honestly wasn’t sure which would be faster.
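To make the slicing behaviour concrete, here is a sketch of a quoter that uses the compare-the-slice check instead of a case. It is written from the description above and is not occivink’s actual code; the name compare_builtin_quoter is an assumption:

```sh
# With no apostrophe in the text, the "up to the next apostrophe"
# slice is the whole string, not an empty one.
text="no apostrophes here"
slice="${text%%\'*}"
[ "$slice" = "$text" ] && echo "slice equals the original text"

# Hypothetical quoter that detects the last chunk by comparing the
# slice with the unsliced value.
compare_builtin_quoter() {
    text="$*"
    printf "'"
    while true; do
        head="${text%%\'*}"
        if [ "$head" = "$text" ]; then
            # Nothing was cut off, so no apostrophe remains:
            # emit the last chunk and the closing quote.
            printf "%s'" "$text"
            break
        fi
        # Emit the chunk plus a doubled apostrophe, then skip past
        # the apostrophe that ended the chunk.
        printf "%s''" "$head"
        text="${text#*\'}"
    done
}

compare_builtin_quoter "it's"
echo
```

The case-based variant searches the string with a pattern match before slicing, while this one always slices and then pays for a string comparison, so which is faster plausibly depends on the shell.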