mssql-jdbc: Statement property maxRows incorrectly affects count(*) on table valued functions.
Setting statement property maxRows results in faulty row count on table valued functions.
Below is the output from the supplied test program. The two functions numbers_tf and numbers_itf both return 10 rows. The former is table-valued function while the latter is an inlined version instead.
maxRows(0): select count(*) from dbo.numbers_itf() => 10 OK!
maxRows(5): select count(*) from dbo.numbers_itf() => 10 OK!
maxRows(0): select count(*) from dbo.numbers_tf() => 10 OK!
maxRows(5): select count(*) from dbo.numbers_tf() => 5 FAILED!
As can be seen setting maxRows to 5 changes the count to 5 instead of the expected 10. Clearly this is a bug. MaxRows should only affect the number of rows that the result set returns.
import java.math.BigInteger;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.ThreadLocalRandom;
public class Main {
public static void main(String[] args) throws SQLException {
String url = "jdbc:sqlserver://localhost:1433";
String user = "sa";
String password = "Test1234!";
String database = randomName("test_");
try (
Connection c = DriverManager.getConnection(url, user, password);
) {
try {
executeUpdate(c, "create database " + database);
executeUpdate(c, "use " + database);
executeUpdate(c, "create function dbo.numbers_itf()\n" +
"returns table\n" +
"as return\n" +
" select * from (values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t(num)\n" +
"\n");
executeUpdate(c, "create function dbo.numbers_tf()\n" +
" returns @result table (\n" +
" num int\n" +
" )\n" +
" as\n" +
" begin\n" +
" insert into @result\n" +
" select * from (values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t(num)\n" +
" return\n" +
" end\n" +
"\n");
executeTest(c, 0, "select count(*) from dbo.numbers_itf()", 10);
executeTest(c, 5, "select count(*) from dbo.numbers_itf()", 10);
executeTest(c, 0, "select count(*) from dbo.numbers_tf()", 10);
executeTest(c, 5, "select count(*) from dbo.numbers_tf()", 10);
} finally {
executeUpdate(c,"use master");
executeUpdate(c, "drop database " + database);
}
}
}
private static int executeUpdate(Connection c, String sql) throws SQLException {
try (Statement s = c.createStatement()) {
return s.executeUpdate(sql);
}
}
private static void executeTest(Connection c, int maxRows, String sql, int expected) throws SQLException {
try (Statement s = c.createStatement()) {
s.setMaxRows(maxRows);
s.execute(sql);
try (ResultSet r = s.getResultSet()) {
r.next();
int actual = r.getInt(1);
System.out.format("maxRows(%d): %s => %d %s%n", s.getMaxRows(), sql, actual, (actual != expected ? "FAILED!" : "OK!"));
}
}
}
private static String randomName(String prefix) {
return prefix + (new BigInteger(40, ThreadLocalRandom.current())).toString(32);
}
}
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 33 (24 by maintainers)
Instead of not reading the remaining packets after reaching
maxRows, we could try to read and discard them. This might result in performance degradation whensetMaxRows()is used, but also make the API JDBC compliant. I am currently looking into this, will try to create a PR as well as a few benchmarks.@ulvii Good point. It’s the way how it should work (and how other jdbc drivers work). The point about the double parsing seems like an implementation detail. Why not parse it once and use the information to both count rows and create result set?
I’m fine with breaking backwards compatibility for this since, as mentioned up thread, it really makes no sense to start. I referenced maintaining it out of a general principle of doing so where possible so wouldn’t argue against it if that’s the consensus.
The primary use case for both setMaxRows(…) and
SET ROWCOUNTis when executing arbitrary user SQL, generally as either a query editor UI interface or some type of report generation. Both are a crutch to deal with users who don’t specify their own TOP/FETCH clauses in the underlying SQL.In their current state they have no other real world use case and arguably can’t be used for this one either as they’d horrible break anyone who could also be doing arbitrary DML. That’s why we don’t use it.
Honestly it was very surprising/shocking when we first tried it out and realized that’s how
setMaxRows(...)/SET ROWCOUNTworksbreaks.-1 from me on this. The driver should not modify the user’s SQL.
At best you’ll have an unsupportable nightmare that works in a small subset of cases. Dealing with CTEs, RETURNING clauses in DML, and stored procedures that return result sets makes this a non-starter. Plus a SQL command can return multiple result sets and you can’t modify the SQL of the innards of a stored procedure.
Having something like this only work for the most basic
SELECT ... FROM ...would make it pointless.Having a server side proc to determine the return type for arbitrary SQL isn’t bad as the parsing logic is already there. Presumably the server do all the logic for determining what it would do to actually execute the query and then return only the metadata. Having the driver do the same, if done correctly, would require all that same parsing logic and may not even be possible (ex: stored proc calls a different stored proc).
The server also has the blunt approach of actually executing the command in a new transaction, seeing what the result would be, and then rolling it back. I’m sure that has its own issues though.
Sure splitting up a large DML into smaller chunks can make sense, but you need a way to track what’s being changed (ex: the isChanged flag in your example) and you’d have to run it in a loop too. Having it apply to a WHERE-less
UDPATE foo SET bar = '...'makes no sense.The setMaxRows() in the JDBC spec only refers to ResultSets and I’m guessing is there in support of the arbitrary SQL use case I described above. It’s meant to be a way to provide a hint to the server that you’re not going to process the entire result regardless of how it’s written so it should try (if possible) to optimize for that.
I think what’s really needed is a server side change to have
SET ROWCOUNT(or it’s replacement) be a hint to the server that you are only going to fetch the first N rows of whatever result sets are brought back. That way the server doesn’t waste CPU/bandwidth on the wire sending the rest of the results (which the driver would need to skip over and discard). Without server side support I don’t see a way of this being workable.