mssql-jdbc: [BUG] Performance Issues reading LocalDate,LocalDateTime, and LocalTime
Driver version
7.2.1.jre8
SQL Server version
Microsoft SQL Server 2014 (SP3) (KB4022619) - 12.0.6024.0 (X64) Sep 7 2018 01:37:51 Copyright © Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )
Client Operating System
Ubuntu 16.04
JAVA/JVM version
openjdk version “12” 2019-03-19 OpenJDK Runtime Environment (build 12+33) OpenJDK 64-Bit Server VM (build 12+33, mixed mode, sharing)
Table schema
CREATE TABLE test.DatePerformance (
someDate DATE
)
-- Populate Table with Random Dates
INSERT INTO test.DatePerformance
SELECT DATEADD(DAY, -RAND(12313456) * 1000, GETDATE())
WHILE((SELECT COUNT(*) FROM test.DatePerformance) < 2500)
INSERT INTO test.DatePerformance
SELECT DATEADD(DAY, -RAND() * 1000, GETDATE())
Problem description
-
Expected behaviour:
- Reading Date objects from the database should have low overhead.
-
Actual behaviour:
- There is substantial overhead creating GregorianCalendars and converting them to the LocalDate, LocalDateTime, LocalTime classes
- Calling Calendar.getInstance(TimeZone.getTimeZone(“UTC”)) is 30% of the getObject call for LocalDate, LocalDateTime, LocalTime classes
-
Error message/stack trace:
-
Any other details that can be helpful:
- Result Set - Calendar.getInstance https://github.com/microsoft/mssql-jdbc/blob/v7.2.1/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerResultSet.java#L2384-L2398
- DDC.convertTemporalToObject - https://github.com/microsoft/mssql-jdbc/blob/v7.2.1/src/main/java/com/microsoft/sqlserver/jdbc/DDC.java#L755-L1061
Sample Profiling Snapshots
com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject () 15.094s
> com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject () 9.206s
>> java.util.GregorianCalendar.<init> () 6.01s
>> java.util.Calendar.getTimeInMillis () 3.005s
> java.util.Calendar.getInstance () 5.088s
com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject () 7.601s
> com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject () 3.906s
>> java.util.GregorianCalendar.<init> () 2.498s
>> java.util.Calendar.getTimeInMillis () 1.407s
> java.util.Calendar.getInstance () 3.302s
Reproduction code
/**
Using JMH (LocalDate)
Result: 338.352 ±(99.9%) 27.177 ops/s [Average]
Statistics: (min, avg, max) = (5.716, 338.352, 435.926), stdev = 115.069
Confidence interval (99.9%): [311.175, 365.529]
# Run complete. Total time: 00:08:10
Benchmark Mode Samples Score Score error Units
c.c.m.MyBenchmark.testMethod thrpt 200 338.352 27.177 ops/s
*/
@Benchmark
public void testMethod()
{
List<LocalDate> dates = new ArrayList<>();
try (
Connection conn = dataSource.getConnection();
PreparedStatement ps = conn.prepareStatement("SELECT someDate from test.DatePerformance");
ResultSet rs = ps.executeQuery()
)
{
while(rs.next())
dates.add(rs.getObject("someDate", LocalDate.class));
} catch (SQLException e)
{
e.printStackTrace();
}
}
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 31 (31 by maintainers)
Hi @cogman, thanks for the input. I agree that we should use as little of GregorianCalendar as possible. I’d like to replace all instances of Calendar in the driver, but the JDBC specs state that the driver needs to provide public APIs that accept Calendar objects (mostly from getters/setters for temporal objects in prepared/callable statements).
Looking at how the driver actually uses the Calendar objects passed from the client, most of the time they’re used for the timezone information. I’ll look into extracting the necessary information from the Calendar object as soon as it’s passed from the public API, while replacing as many instances of Calendar parameters with TimeZone parameters in internal methods. I’ll also make sure that the driver never initializes any GregorianCalendar objects, either.
It’ll take a while, but I’ll update everyone once the initial PR/design has been made. Please let me know if you have any other input in the meantime.
@gordthompson Care should be taken in the fact that the Calendar object is not thread safe.
Ideally, the driver would switch away from using the calendar object and instead look at using ZonedDateTime or straight local dates. If my reading of the TDS standard is correct, than a SQL “Date” ends up on the wire as the integer “number of days since 1-1-1” That would be a trivial conversion to localDates because it would simply be “LocalDate.of(1,1,1).addDays(tdsIntValue);” (or, whatever that API is.
The new system is much faster and thread safe to boot. Converting to java.sql.Date is also pretty simple because it is just a “Date.from(ZonedDateTime.toInstant())” away.
Hi everyone, Thanks for all the effort. We will investigate the impact of the change for different JDBC/SQL types. In the meantime, please feel free to provide any other details that you think would speed up the process. 👍
Hi all,
I wrote a quick and dirty patch for benchmarking. https://github.com/microsoft/mssql-jdbc/compare/dev...harawata:gh-1070?expand=1
I also created a JMH project, but am not sure if I did it right (I haven’t gotten any consistent result probably because there are many other processes running on the same machine).
@ulvii
Here are some answers to your question
Yes. The Gregorian calendar has a lot of design flaws. One of the largest being that it is not thread safe (completely mutable) and is fairly slow because it does a ton of excess work on initialization. That could be a large change, so I’m ok with it waiting.
Yes they provide all the same functionality while also being much closer aligned with how you want to think about dates. The old API tried to hide away when it was doing things like time zone conversions for you. The new API is much better at keeping that in plain site. Further, the old API was highly mutable in often surprising ways. The new API is completely immutable. Finally, converting between old and new is relatively simple.
Here are some JMH results based on @gordthompson 's benchmark using Java 11.
Validates his claim of being around 5->6x faster.
I don’t think you should be looking to simply replace
CalendarwithZonedDateTimein the existing code. I think you should be looking to eliminate the whole notion of time zones wherever possible. That would mean unpackingdatetime2columns directly toLocalDateTime, and unpackingdatetimeoffsetcolumns directly toOffsetDateTime.Calendarmay still have to be used for things likegetTimestamp("colname", Calendar), but in those cases the user would be supplying their ownCalendaranyway.The following POC code illustrates what I mean. It shows that with
java.timewe can directly convert TDS bytes toLocalDateTimefor the differentdatetime2(precision)values, at least for a reasonably current date:Console output:
I would argue that it is not necessary to ensure that values unpacked to a
Calendarobject and values unpacked to aZonedDateTimeobject should yield the same results because they are different things. As long as the behaviour for legacy date/time objects (Calendar,java.sql.Timestamp) objects does not change — for backward compatibility — theZonedDateTimevalue can be whateverjava.timethinks it should be.In other words, if
datetime2values are always initially unpacked toLocalDateTimeand then, if desired, converted tojava.sql.Timestamp(via a user-suppliedCalendar) or tojava.time.Instant(via a user-specified time zone) then mssql-jdbc would be “playing by the rules” and any discrepancies betweenCalendarandZonedDateTimevalues would be something for the Java team to explain.@ulvii I think us customers at least are all in agreement. Switching over would generally be a good improvement for the driver both from a performance standpoint and from a ergonomic standpoint in the codebase.
If I get some time today I might dive in a bit and see if I can bang something out (or at least how big of an impact it would be). Do you have any more questions for us?
@cogman - Thanks again. The build was producing two separate jar files and I was simply trying to run the wrong one. (D’oh!)
Running your latest tests on my very-old-and-slow test box yielded
If I’m reading that right, UTC_Timestamp via LocalDateTime is actually more like twice as fast as UTC_Timestamp via GregorianCalendar on my test box. Very different from my previous naïve attempt at benchmarking.
Still, it’s not really a fair fight. The GregorianCalendar approach has to deal with the baggage of a Calendar object while the LocalDateTime approach just uses a fixed offset (UTC+0) and avoids the whole “messy rest-of-the-world time zone” thing. Since the code needs to be able to produce a Timestamp in any time zone (not just UTC) a “better” comparison would be to create the Timestamp in the JVM default time zone (which for me happens to be America/Denver) …
… and the results I see are …
Again, if I’m reading things correctly, the LocalDateTime approach took a hit (from 211k ops/s to 144k ops/s) and so did the GregorianCalendar approach (from 95k ops/s to 74k ops/s) but the LocalDateTime approach still seems to be roughly twice as fast.
@gordthompson I believe your benchmark might be falling prey to some common java microbenchmarking problems. (No warmup, not consuming the result, etc). If I were to guess, your method isn’t getting fully optimized since it isn’t “hot” enough.
I’ve ran your scenario through JMH tests and I’m getting different results (but not really surprising).
It isn’t surprising because the 2 extra steps to create a Timestamp boil down to a couple of additions and multiplications. The calendar code has a ton of extra checks. It is pretty intense when you start reading everything that goes into “getTimeInMillis” on the cal object.
Here is the code I benchmarked with
Also, here is an excellent video by Aleksey Shipilev (Author of JMH and a major player in Java performance) about microbenchmark pitfalls. https://www.youtube.com/watch?v=VaWgOCDBxYw
re: performance
Here’s a simple benchmark to compare the results of unpacking a datetime2 value.
I ran the test several times and the comments showing the console output are representative values from my (very old) test machine.
Hi @lavabear, @gordthompson , @cogman , @harawata , Thank you for the suggestions and sorry for the delay, the team is currently busy with the release preparations.
I went through the discussion briefly and wanted to clarify a few things.
GregorianCalendar, both when reading and writing. Are you proposing to completely migrate away fromGregorianCalendartoZonedDateTime/LocalDateTimeor only forLocalDate,LocalDateTime, andLocalTimeclasses when callinggetObject()?ZonedDateTime/LocalDateTimeAPIs provide all the functionality thatGregorianCalendarhas? Are you aware of any limitations?To speed up the process, please feel free to create an initial PR or post any other recommendations that we should keep in mind when investigating the proposal.