mssql-jdbc: Statement property maxRows incorrectly affects count(*) on table valued functions.

Setting statement property maxRows results in faulty row count on table valued functions. Below is the output from the supplied test program. The two functions numbers_tf and numbers_itf both return 10 rows. The former is table-valued function while the latter is an inlined version instead.

maxRows(0): select count(*) from dbo.numbers_itf() => 10  OK!
maxRows(5): select count(*) from dbo.numbers_itf() => 10  OK!
maxRows(0): select count(*) from dbo.numbers_tf() => 10  OK!
maxRows(5): select count(*) from dbo.numbers_tf() => 5  FAILED!

As can be seen setting maxRows to 5 changes the count to 5 instead of the expected 10. Clearly this is a bug. MaxRows should only affect the number of rows that the result set returns.

import java.math.BigInteger;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.ThreadLocalRandom;

public class Main {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost:1433";
        String user = "sa";
        String password = "Test1234!";
        String database = randomName("test_");
        try (
                Connection c = DriverManager.getConnection(url, user, password);
        ) {
            try {
                executeUpdate(c, "create database " + database);
                executeUpdate(c, "use " + database);


                executeUpdate(c, "create function dbo.numbers_itf()\n" +
                        "returns table\n" +
                        "as return\n" +
                        "    select * from (values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t(num)\n" +
                        "\n");

                executeUpdate(c, "create function dbo.numbers_tf()\n" +
                        "  returns @result table (\n" +
                        "    num int\n" +
                        "  )\n" +
                        "  as\n" +
                        "  begin\n" +
                        "    insert into @result\n" +
                        "    select * from (values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t(num)\n" +
                        "    return\n" +
                        "  end\n" +
                        "\n");


                executeTest(c, 0, "select count(*) from dbo.numbers_itf()", 10);
                executeTest(c, 5, "select count(*) from dbo.numbers_itf()", 10);
                executeTest(c, 0, "select count(*) from dbo.numbers_tf()", 10);
                executeTest(c, 5, "select count(*) from dbo.numbers_tf()", 10);

            } finally {
                executeUpdate(c,"use master");
                executeUpdate(c, "drop database " + database);
            }
        }
    }

    private static int executeUpdate(Connection c, String sql) throws SQLException {
        try (Statement s = c.createStatement()) {
            return s.executeUpdate(sql);
        }
    }

    private static void executeTest(Connection c, int maxRows, String sql, int expected) throws SQLException {
        try (Statement s = c.createStatement()) {
            s.setMaxRows(maxRows);
            s.execute(sql);
            try (ResultSet r = s.getResultSet()) {
                r.next();
                int actual = r.getInt(1);
                System.out.format("maxRows(%d): %s => %d  %s%n", s.getMaxRows(), sql, actual, (actual != expected ? "FAILED!" : "OK!"));
            }
        }
    }

    private static String randomName(String prefix) {
        return prefix + (new BigInteger(40, ThreadLocalRandom.current())).toString(32);
    }

}

About this issue

Original URL
State: closed
Created 7 years ago
Comments: 33 (24 by maintainers)

Most upvoted comments

Instead of not reading the remaining packets after reaching maxRows, we could try to read and discard them. This might result in performance degradation when setMaxRows() is used, but also make the API JDBC compliant. I am currently looking into this, will try to create a PR as well as a few benchmarks.

ulvii on Apr 12, 2018

@ulvii Good point. It’s the way how it should work (and how other jdbc drivers work). The point about the double parsing seems like an implementation detail. Why not parse it once and use the information to both count rows and create result set?

tseylerd on Apr 9, 2018

Now, I may have been a bit quick (sorry!) to say this is not worth implementing. My worry is that if we cannot break backwards compat the change would have to be off by default which greatly reduces its usefulness. In addition, given this may also have pretty big performance impacts (since the server would not know to create a plan optimized for sending few rows) that reduces the number of times you would actually advice users to turn on the setting.

I’m fine with breaking backwards compatibility for this since, as mentioned up thread, it really makes no sense to start. I referenced maintaining it out of a general principle of doing so where possible so wouldn’t argue against it if that’s the consensus.

The primary use case for both setMaxRows(…) and SET ROWCOUNT is when executing arbitrary user SQL, generally as either a query editor UI interface or some type of report generation. Both are a crutch to deal with users who don’t specify their own TOP/FETCH clauses in the underlying SQL.

In their current state they have no other real world use case and arguably can’t be used for this one either as they’d horrible break anyone who could also be doing arbitrary DML. That’s why we don’t use it.

Honestly it was very surprising/shocking when we first tried it out and realized that’s how setMaxRows(...)/SET ROWCOUNT ~~works~~ breaks.

The proper driver solution would probably be to support automatically adding TOP/FETCH NEXT to the select statement. This requires very complex logic for a general reliable solution since there may be any number of statements in the string/batch and figuring which ones may be the first result set coming back and injecting the TOP into the syntax is far from trivial.

-1 from me on this. The driver should not modify the user’s SQL.

At best you’ll have an unsupportable nightmare that works in a small subset of cases. Dealing with CTEs, RETURNING clauses in DML, and stored procedures that return result sets makes this a non-starter. Plus a SQL command can return multiple result sets and you can’t modify the SQL of the innards of a stored procedure.

Having something like this only work for the most basic SELECT ... FROM ... would make it pointless.

We actually have similar functionality in SQL Server which is the sp_describe_first_result_set spec proc. This proc will return the shape of the first possible result set and it was a pretty huge project to implement (think of code branches etc. that have to be handled, it needs to be very efficient etc.).

Having a server side proc to determine the return type for arbitrary SQL isn’t bad as the parsing logic is already there. Presumably the server do all the logic for determining what it would do to actually execute the query and then return only the metadata. Having the driver do the same, if done correctly, would require all that same parsing logic and may not even be possible (ex: stored proc calls a different stored proc).

The server also has the blunt approach of actually executing the command in a new transaction, seeing what the result would be, and then rolling it back. I’m sure that has its own issues though.

Finally, like @sehrope mentions maxRows/SET ROWCOUNT/TOP really don’t make much sense without ORDER BY in the query. It can be very useful for UPDATE and DELETE in order to split up a change into multiple batches to make transactions smaller etc. (UPDATE TOP(1000) mytable SET IsChanged = 1, … WHERE IsChanged = 0) .

Sure splitting up a large DML into smaller chunks can make sense, but you need a way to track what’s being changed (ex: the isChanged flag in your example) and you’d have to run it in a loop too. Having it apply to a WHERE-less UDPATE foo SET bar = '...' makes no sense.

This begs the question why this is in the JDBC spec (probably for the simple case of preventing too many rows being returned as well as supporting databases without much query language).

The setMaxRows() in the JDBC spec only refers to ResultSets and I’m guessing is there in support of the arbitrary SQL use case I described above. It’s meant to be a way to provide a hint to the server that you’re not going to process the entire result regardless of how it’s written so it should try (if possible) to optimize for that.

I think what’s really needed is a server side change to have SET ROWCOUNT (or it’s replacement) be a hint to the server that you are only going to fetch the first N rows of whatever result sets are brought back. That way the server doesn’t waste CPU/bandwidth on the wire sending the rest of the results (which the driver would need to skip over and discard). Without server side support I don’t see a way of this being workable.

sehrope on Mar 24, 2017