better-sqlite3: Result of query with comment is inconsistent and random

There seem to be an inconsistent bug in the comments parsing method.

Having the following raw SQL query (making it a one liner doesn’t change anything either):

const query = `SELECT *
						   FROM user_satellites_satellite
									FULL JOIN satellite ON user_satellites_satellite.satelliteId = satellite.id
						   WHERE user_satellites_satellite.userId = '${user.id}'
							 AND (satellite.name LIKE '%${searchQuery}%' OR satellite.description LIKE '%${searchQuery}%');`;

with searchQuery being (random can be anything):

random') UNION SELECT CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), tbl_name, CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR), CAST(1 AS VARCHAR) FROM sqlite_master; --

And executing it with query()

await this.dataSource.query(query);

Will either return the correct and expected data or the error

The supplied SQL string contains more than one statement

Logging with NestJS shows the following query is getting executed (query logged being always the same):

Query getting executed

The exact same query being executed multiple times over and over, e.g. by spamming, will make the query return correct data after some time, which makes the bug inconsistent:

https://user-images.githubusercontent.com/43011723/228253781-c317b005-9dec-4919-808c-f99c1f667b1e.mp4


Switching the SQLite driver to the sqlite - being installed with npm install sqlite3 - driver makes the query always work and return the expected result, hence is not related to TypeORM itself.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

Hello, fellow developers! I happened upon this most fascinating of issue, which intrigued me so much I elected to abandon an hour of perfectly good sleep in favour of debugging the crap out of this.

The first thing I did was to insert a bunch of ugly printf statements directly into better-sqlite3.cpp, which mostly led me nowhere. However, after many twists and turns, I made the following modification, allowing me to see every single character processed as part of the query tail:

                 for (char c; (c = *tail); ++tail) {
+                        printf("%c", c);
                         if (IS_SKIPPED(c)) continue;
                         if (c == '/' && tail[1] == '*') {
                                 tail += 2;
+                                printf("*");
                                 for (char c; (c = *tail); ++tail) {
                                         printf("%c", c);
                                         if (c == '*' && tail[1] == '/') {
                                                 tail += 1;
                                                 break;
                                         }
                                 }
                         } else if (c == '-' && tail[1] == '-') {
                                 tail += 2;
+                                printf("-");
                                 for (char c; (c = *tail); ++tail) {
                                         printf("%c", c);
                                         if (c == '\n') break;
                                 }
                         } else {
                                 sqlite3_finalize(handle);
+                                printf("\n");
                                 return ThrowRangeError("The supplied SQL string contains more than one statement");
                         }
                 }
+                printf("\n");

And in my trusty terminal, I began to see something… intriguing. Something unexpected. A pattern of noise just past the last character emerges:

image

Is this an out-of-bounds read? Surely, v8::String::Utf8Value cannot possibly produce a string without zero terminator. But what if… Ah. It is as clear as day now. The nested loops are at fault! Let us consider a simplified double-loop:

for (char a; (a = *tail); ++tail) {
    for (char b; (b = *tail); ++tail) {
      whatever(c);
    }
}

When the condition (b = *tail) is false, then we know tail points to a zero byte. So far, so good. This is how we detect the end of the string, of course. The inner loop thus exits, and we proceed to the next iteration of the outer loop. The outer loop performs its increment: ++tail… and the world ceases to be beautiful. Now tail points almost certainly to garbage. The once innocent *tail invokes behaviour of an undefined nature. Probably. Certainly the trusty processor reads something, which it usually interprets as a query character.

The solution to this loopy confusion requires more thought than my 1am brain can produce. After some sleep, perhaps a pull request will present itself…

It’s nice to see the chosen one show up in times of need 🙏

Let us consider a simplified double-loop

Your explanation: 🧑‍🍳 🤌 💯

Thanks for digging into this!

FWIW ChatGPT is entirely useless telling me for the third time that I need to escape searchQuery because I’m apparently using two statements which better-sqlite3 does no support. Or it insists there might be race conditions between multiple processes or database connections after repeatedly being told it’s a single-use in-memory database.