mssql-jdbc: [BUG] Performance Issues reading LocalDate,LocalDateTime, and LocalTime

Driver version

7.2.1.jre8

SQL Server version

Microsoft SQL Server 2014 (SP3) (KB4022619) - 12.0.6024.0 (X64) Sep 7 2018 01:37:51 Copyright © Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

Client Operating System

Ubuntu 16.04

JAVA/JVM version

openjdk version "12" 2019-03-19
OpenJDK Runtime Environment (build 12+33)
OpenJDK 64-Bit Server VM (build 12+33, mixed mode, sharing)

Table schema

CREATE TABLE test.DatePerformance (
	someDate DATE
)

-- Populate Table with Random Dates
INSERT INTO test.DatePerformance
SELECT DATEADD(DAY, -RAND(12313456) * 1000, GETDATE())

WHILE((SELECT COUNT(*) FROM test.DatePerformance) < 2500)
INSERT INTO test.DatePerformance
SELECT DATEADD(DAY, -RAND() * 1000, GETDATE())

Problem description

  1. Expected behaviour:

    • Reading Date objects from the database should have low overhead.
  2. Actual behaviour:

    • There is substantial overhead in creating GregorianCalendars and converting them to the LocalDate, LocalDateTime, and LocalTime classes.
    • Calling Calendar.getInstance(TimeZone.getTimeZone("UTC")) accounts for roughly 30% of the getObject call for the LocalDate, LocalDateTime, and LocalTime classes.
  3. Error message/stack trace:

  4. Any other details that can be helpful:

Sample Profiling Snapshots

	com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject () 15.094s
>	com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject () 9.206s
>>	java.util.GregorianCalendar.<init> () 6.01s
>>	java.util.Calendar.getTimeInMillis () 3.005s
>	java.util.Calendar.getInstance () 5.088s

	com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject () 7.601s
>	com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject () 3.906s
>>	java.util.GregorianCalendar.<init> () 2.498s
>>	java.util.Calendar.getTimeInMillis () 1.407s
>	java.util.Calendar.getInstance () 3.302s

Reproduction code

    /**
    Using JMH (LocalDate)

    Result: 338.352 ±(99.9%) 27.177 ops/s [Average]
    Statistics: (min, avg, max) = (5.716, 338.352, 435.926), stdev = 115.069
    Confidence interval (99.9%): [311.175, 365.529]


    # Run complete. Total time: 00:08:10

    Benchmark                        Mode  Samples    Score  Score error  Units
    c.c.m.MyBenchmark.testMethod    thrpt      200  338.352       27.177  ops/s
    */

    @Benchmark
    public void testMethod()
    {
        List<LocalDate> dates = new ArrayList<>();

        try (
                Connection conn = dataSource.getConnection();
                PreparedStatement ps = conn.prepareStatement("SELECT someDate from test.DatePerformance");
                ResultSet rs = ps.executeQuery()
        )
        {
            while(rs.next())
                dates.add(rs.getObject("someDate", LocalDate.class));

        } catch (SQLException e)
        {
            e.printStackTrace();
        }
    }

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 31 (31 by maintainers)

Most upvoted comments

Hi @cogman, thanks for the input. I agree that we should use as little of GregorianCalendar as possible. I’d like to replace all instances of Calendar in the driver, but the JDBC specs state that the driver needs to provide public APIs that accept Calendar objects (mostly from getters/setters for temporal objects in prepared/callable statements).

Looking at how the driver actually uses the Calendar objects passed from the client, most of the time they’re used only for the time zone information. I’ll look into extracting the necessary information from the Calendar object as soon as it’s passed through the public API, replacing as many Calendar parameters as possible with TimeZone parameters in internal methods. I’ll also make sure that the driver itself never initializes any GregorianCalendar objects.
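As a rough sketch of what I mean by pulling the information out at the boundary (a hypothetical helper, not the driver’s actual code):

import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical boundary helper: extract only the time zone from a user-supplied
// Calendar at the public API, so internal code never needs to touch the Calendar itself.
final class CalendarBoundary {
    static TimeZone timeZoneOf(Calendar userCalendar) {
        return (userCalendar != null) ? userCalendar.getTimeZone() : TimeZone.getDefault();
    }
}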

It’ll take a while, but I’ll update everyone once the initial PR/design has been made. Please let me know if you have any other input in the meantime.

@gordthompson Care should be taken here, given that the Calendar object is not thread-safe.

Ideally, the driver would switch away from using the Calendar object and instead look at using ZonedDateTime or plain local dates. If my reading of the TDS standard is correct, then a SQL “Date” ends up on the wire as an integer: the number of days since 0001-01-01. That would be a trivial conversion to LocalDate because it would simply be LocalDate.of(1, 1, 1).plusDays(tdsIntValue) (or whatever that API is).

The new system is much faster and thread-safe to boot. Converting to the legacy java.sql.Date is also pretty simple because it is just a java.sql.Date.valueOf(localDate) away.
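For illustration, a minimal sketch of that conversion, assuming the wire value really is a day count from 0001-01-01 (the helper name and sample value are mine, not the driver’s):

import java.time.LocalDate;

public class DateFromTds {
    // Hypothetical helper: converts a TDS DATE day count (days since 0001-01-01) to a LocalDate.
    static LocalDate localDateFromTdsDays(long daysSinceYearOne) {
        return LocalDate.of(1, 1, 1).plusDays(daysSinceYearOne);
    }

    public static void main(String[] args) {
        LocalDate d = localDateFromTdsDays(736329);       // 736329 days -> 2017-01-01 (same value as the POC later in the thread)
        java.sql.Date legacy = java.sql.Date.valueOf(d);  // one-call conversion to the legacy type
        System.out.println(d + " / " + legacy);
    }
}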

Hi everyone, Thanks for all the effort. We will investigate the impact of the change for different JDBC/SQL types. In the meantime, please feel free to provide any other details that you think would speed up the process. 👍

Hi all,

I wrote a quick and dirty patch for benchmarking. https://github.com/microsoft/mssql-jdbc/compare/dev...harawata:gh-1070?expand=1

I also created a JMH project, but am not sure if I did it right (I haven’t gotten any consistent result probably because there are many other processes running on the same machine).

@ulvii

Here are some answers to your questions:

  1. Yes. The Gregorian calendar has a lot of design flaws. One of the largest is that it is not thread-safe (it is completely mutable), and it is fairly slow because it does a ton of excess work on initialization. That could be a large change, so I’m ok with it waiting.

  2. Yes, they provide all the same functionality while also being much more closely aligned with how you want to think about dates. The old API tried to hide away when it was doing things like time zone conversions for you. The new API is much better at keeping that in plain sight. Further, the old API was highly mutable in often surprising ways. The new API is completely immutable. Finally, converting between old and new is relatively simple (see the short round-trip sketch after the benchmark code below).

  3. Here are some JMH results based on @gordthompson 's benchmark using Java 11.

MyBenchmark.test_read_local_date_new                                    thrpt   25  19109523.054 ± 136645.875  ops/s
MyBenchmark.test_read_local_date_new_calendar_current_implementation   thrpt   25   3235636.892 ±  30089.733  ops/s
MyBenchmark.test_read_local_date_reuse_calendar                         thrpt   25   3254206.917 ±  51282.245  ops/s

This validates his claim of a roughly 5-6x speedup.

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.*;

@State(Scope.Thread)
public class MyBenchmark {
	private static long dateValue = 737240L;
	private static long timeValue = 432000000000L;
	private static LocalDateTime EPOCH = LocalDateTime.of(1, 1, 1, 12, 0);

	private GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
	private java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
	private ZoneId utcZone = ZoneId.of("UTC");

	@Benchmark
	public void test_read_local_date_new_calendar_current_implementation(Blackhole bh) {
		bh.consume(makeNewCalendar());
	}

	private LocalDateTime makeNewCalendar() {
		GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
		java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
		ZoneId utcZone = ZoneId.of("UTC");

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return LocalDateTime.ofInstant(cal.toInstant(), utcZone);
	}


	@Benchmark
	public void test_read_local_date_reuse_calendar(Blackhole bh) {
		bh.consume(makeReusingCalendar());
	}

	private LocalDateTime makeReusingCalendar() {

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return LocalDateTime.ofInstant(cal.toInstant(), utcZone);
	}

	@Benchmark
	public void test_read_local_date_new(Blackhole bh) {
		bh.consume(EPOCH.plusDays(dateValue).plusNanos(timeValue * 100));
	}
}
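Since I mentioned that converting between the old and new APIs is relatively simple, here is a minimal round-trip sketch (standard JDK calls, shown purely for illustration):

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZonedDateTime;
import java.util.GregorianCalendar;

public class OldNewConversions {
    public static void main(String[] args) {
        // java.time -> legacy
        ZonedDateTime zdt = ZonedDateTime.now();
        GregorianCalendar cal = GregorianCalendar.from(zdt);    // Calendar from ZonedDateTime
        Timestamp ts = Timestamp.valueOf(LocalDateTime.now());  // Timestamp from LocalDateTime

        // legacy -> java.time
        ZonedDateTime back = cal.toZonedDateTime();             // Calendar back to ZonedDateTime
        LocalDateTime ldt = ts.toLocalDateTime();               // Timestamp back to LocalDateTime

        System.out.println(back + " / " + ldt);
    }
}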

I don’t think you should be looking to simply replace Calendar with ZonedDateTime in the existing code. I think you should be looking to eliminate the whole notion of time zones wherever possible. That would mean unpacking datetime2 columns directly to LocalDateTime, and unpacking datetimeoffset columns directly to OffsetDateTime. Calendar may still have to be used for things like getTimestamp("colname", Calendar), but in those cases the user would be supplying their own Calendar anyway.

The following POC code illustrates what I mean. It shows that with java.time we can directly convert TDS bytes to LocalDateTime for the different datetime2(precision) values, at least for a reasonably current date:

package com.example.sandbox;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDateTime;

public class SandboxMain {

    public static void main(String[] args) throws Throwable {
        for (int precision = 0; precision <= 7; ++precision) {
            decodeDatetime2ForPrecision(precision);
        }
    }
    
    private static void decodeDatetime2ForPrecision(int precision) {
        byte[] tdsByteArray = getDatetime2BytesFromTdsPayload(precision);
        
        // decode datetime2 bytes as per TDS specification
        //
        // https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/786f5b8a-f87d-4980-9070-b9b7274c681d

        Integer multiplier = null;  // for conversion to nanoseconds 
        switch (precision) {
            case 0: multiplier = 1_000_000_000; break;
            case 1: multiplier = 100_000_000; break;
            case 2: multiplier = 10_000_000; break;
            case 3: multiplier = 1_000_000; break;
            case 4: multiplier = 100_000; break;
            case 5: multiplier = 10_000; break;
            case 6: multiplier = 1_000; break;
            case 7: multiplier = 100; break;
        }
        
        System.out.printf("Precision = %d%n", precision);
                
        int timeLength = tdsByteArray.length - 3;  // DATE part is always 3 bytes
        ByteBuffer bb = ByteBuffer.allocate(12);
        bb.put(tdsByteArray, 0, timeLength);  // TIME part
        bb.position(8);
        bb.put(tdsByteArray, timeLength, 3);  // DATE part
        bb.position(0);
        bb.order(ByteOrder.nativeOrder());
        
        long fractionalSeconds = bb.getLong();
        System.out.printf("  TIME part: %d%n", fractionalSeconds);
        
        int days = bb.getInt();
        System.out.printf("  DATE part: %d%n", days);
        
        LocalDateTime ldt = LocalDateTime.of(1, 1, 1, 0, 0)
                .plusDays(days)
                .plusNanos(fractionalSeconds * multiplier);
        System.out.printf("  LocalDateTime: %s%n", ldt.toString());
    }
    
    private static byte[] getDatetime2BytesFromTdsPayload(int precision) {
        // --- Test Data ---
        //
        // test case: SELECT CAST('2017-01-01 01:23:45.1' AS DATETIME2(precision))
        //
        byte[] tdsBytes = null;  // actual TDS bytes from server (via Wireshark)
        switch (precision) {
            case 0:
                tdsBytes = createByteArray("a1 13 00 49 3c 0b");
                break;
            case 1:
                tdsBytes = createByteArray("4b c4 00 49 3c 0b");
                break;
            case 2:
                tdsBytes = createByteArray("ee aa 07 49 3c 0b");
                break;
            case 3:
                tdsBytes = createByteArray("4c ad 4c 00 49 3c 0b");
                break;
            case 4:
                tdsBytes = createByteArray("f8 c4 fe 02 49 3c 0b");
                break;
            case 5:
                tdsBytes = createByteArray("b0 b1 f3 1d 00 49 3c 0b");
                break;
            case 6:
                tdsBytes = createByteArray("e0 f0 84 2b 01 49 3c 0b");
                break;
            case 7:
                tdsBytes = createByteArray("c0 68 31 b3 0b 49 3c 0b");
                break;
        }
        return tdsBytes;
    }

    private static byte[] createByteArray(String s) {
        String[] stringArray = s.split(" ");
        byte[] byteArray = new byte[stringArray.length];
        for (int i = 0; i < stringArray.length; ++i) {
            byteArray[i] = (byte) Integer.parseInt(stringArray[i], 16);
        }
        return byteArray;
    }   

}

Console output:

Precision = 0
  TIME part: 5025
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45
Precision = 1
  TIME part: 50251
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 2
  TIME part: 502510
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 3
  TIME part: 5025100
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 4
  TIME part: 50251000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 5
  TIME part: 502510000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 6
  TIME part: 5025100000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 7
  TIME part: 50251000000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
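The POC above covers datetime2. For datetimeoffset, what I have in mind would be something like the sketch below, assuming (per my reading of MS-TDS) that the wire value carries the date/time portion in UTC plus a signed offset in minutes; the helper is hypothetical, not existing driver code:

import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class DatetimeOffsetSketch {
    // utcDateTime: the decoded TDS date/time fields, interpreted as UTC wall-clock time.
    // offsetMinutes: the signed offset (in minutes) carried alongside the UTC date/time.
    static OffsetDateTime fromTds(LocalDateTime utcDateTime, int offsetMinutes) {
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60);
        return utcDateTime.atOffset(ZoneOffset.UTC).withOffsetSameInstant(offset);
    }
}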

From my testing, it seems like the Calendar object’s timezone offset always follows the International Standard Time (IST) rules, whereas ZonedDateTime (and Instant) doesn’t necessarily follow IST.

I think it’s possible to solve the offset value problem by finding the expected offset value from Calendar, comparing that to the offset value given by ZonedDateTime, and then applying the difference to the ZonedDateTime object.

I would argue that it is not necessary to ensure that values unpacked to a Calendar object and values unpacked to a ZonedDateTime object yield the same results, because they are different things. As long as the behaviour for legacy date/time objects (Calendar, java.sql.Timestamp) does not change (for backward compatibility), the ZonedDateTime value can be whatever java.time thinks it should be.

In other words, if datetime2 values are always initially unpacked to LocalDateTime and then, if desired, converted to java.sql.Timestamp (via a user-supplied Calendar) or to java.time.Instant (via a user-specified time zone) then mssql-jdbc would be “playing by the rules” and any discrepancies between Calendar and ZonedDateTime values would be something for the Java team to explain.
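To make that concrete, a minimal sketch of what “playing by the rules” could look like once the value is already a LocalDateTime (hypothetical helpers, not existing driver methods):

import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Calendar;

final class TemporalConversions {
    // The LocalDateTime is assumed to have been decoded straight from the TDS
    // day/time fields, with no time zone applied yet.
    static Timestamp toTimestamp(LocalDateTime ldt, Calendar userCalendar) {
        ZoneId zone = (userCalendar != null)
                ? userCalendar.getTimeZone().toZoneId()
                : ZoneId.systemDefault();
        return Timestamp.from(ldt.atZone(zone).toInstant());
    }

    static Instant toInstant(LocalDateTime ldt, ZoneId userZone) {
        return ldt.atZone(userZone).toInstant();
    }
}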

@ulvii I think we customers, at least, are all in agreement. Switching over would generally be a good improvement for the driver, both from a performance standpoint and from an ergonomic standpoint in the codebase.

If I get some time today I might dive in a bit and see if I can bang something out (or at least how big of an impact it would be). Do you have any more questions for us?

@cogman - Thanks again. The build was producing two separate jar files and I was simply trying to run the wrong one. (D’oh!)

Running your latest tests on my very-old-and-slow test box yielded

Benchmark                                         Mode  Cnt       Score       Error  Units
MyBenchmark.test_read_local_date_new             thrpt   25  210629.896 ±   726.327  ops/s
MyBenchmark.test_read_local_date_reuse_calendar  thrpt   25   95127.109 ± 12725.871  ops/s

If I’m reading that right, UTC_Timestamp via LocalDateTime is actually more like twice as fast as UTC_Timestamp via GregorianCalendar on my test box. Very different from my previous naïve attempt at benchmarking.

Still, it’s not really a fair fight. The GregorianCalendar approach has to deal with the baggage of a Calendar object while the LocalDateTime approach just uses a fixed offset (UTC+0) and avoids the whole “messy rest-of-the-world time zone” thing. Since the code needs to be able to produce a Timestamp in any time zone (not just UTC) a “better” comparison would be to create the Timestamp in the JVM default time zone (which for me happens to be America/Denver) …

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.*;

@State(Scope.Thread)
public class MyBenchmark {
	private static long dateValue = 737240L;
	private static long timeValue = 432000000000L;
	private static LocalDateTime EPOCH = LocalDateTime.of(1, 1, 1, 12, 0);

	private GregorianCalendar cal = new GregorianCalendar();  // default JVM time zone
	private java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);

	@Benchmark
	public void test_read_local_date_reuse_calendar(Blackhole bh) {
		bh.consume(makeReusingCalendar());
	}

	private Timestamp makeReusingCalendar() {

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return new Timestamp(cal.getTimeInMillis());
	}

	@Benchmark
	public void test_read_local_date_new(Blackhole bh) {
		bh.consume(Timestamp.valueOf(EPOCH.plusDays(dateValue)
				.plusNanos(timeValue * 100)));
	}
}

… and the results I see are …

Benchmark                                         Mode  Cnt       Score      Error  Units
MyBenchmark.test_read_local_date_new             thrpt   25  143919.857 ± 1135.624  ops/s
MyBenchmark.test_read_local_date_reuse_calendar  thrpt   25   74379.197 ±  375.586  ops/s

Again, if I’m reading things correctly, the LocalDateTime approach took a hit (from 211k ops/s to 144k ops/s) and so did the GregorianCalendar approach (from 95k ops/s to 74k ops/s) but the LocalDateTime approach still seems to be roughly twice as fast.

@gordthompson I believe your benchmark might be falling prey to some common java microbenchmarking problems. (No warmup, not consuming the result, etc). If I were to guess, your method isn’t getting fully optimized since it isn’t “hot” enough.

I’ve run your scenario through JMH tests and I’m getting different results (which is not really surprising).

MyBenchmark.test_read_local_date_new             thrpt   25  14836448.410 ± 181472.322  ops/s
MyBenchmark.test_read_local_date_reuse_calendar  thrpt   25   3322991.970 ±  97922.281  ops/s

It isn’t surprising because the 2 extra steps to create a Timestamp boil down to a couple of additions and multiplications. The calendar code has a ton of extra checks. It is pretty intense when you start reading everything that goes into “getTimeInMillis” on the cal object.

Here is the code I benchmarked with

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.*;

@State(Scope.Thread)
public class MyBenchmark {
	private static long dateValue = 737240L;
	private static long timeValue = 432000000000L;
	private static LocalDateTime EPOCH = LocalDateTime.of(1, 1, 1, 12, 0);

	private GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
	private java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
	private ZoneId utcZone = ZoneId.of("UTC");

	@Benchmark
	public void test_read_local_date_reuse_calendar(Blackhole bh) {
		bh.consume(makeReusingCalendar());
	}

	private Timestamp makeReusingCalendar() {

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return new Timestamp(cal.getTimeInMillis());
	}

	@Benchmark
	public void test_read_local_date_new(Blackhole bh) {
		bh.consume(Timestamp.from(EPOCH.plusDays(dateValue)
				.plusNanos(timeValue * 100)
				.toInstant(ZoneOffset.UTC)));
	}
}

Also, here is an excellent video by Aleksey Shipilev (Author of JMH and a major player in Java performance) about microbenchmark pitfalls. https://www.youtube.com/watch?v=VaWgOCDBxYw

re: performance

Here’s a simple benchmark to compare the results of unpacking a datetime2 value.

// Test Data - TDS protocol delivers datetime2 values as two integers:
//   1) the number of days since 0001-01-01, and
//   2) the number of 100-nanosecond "ticks" since the beginning of the current day
Long dateValue = 737240L;        // 2019-07-01
Long timeValue = 432000000000L;  // 12:00:00
//
int loopLimit = 1000;
Long t0 = null;  // loop timer start time
LocalDateTime dt2Base = LocalDateTime.of(1, 1, 1, 0, 0);  // 0001-01-01 00:00, the base used by the "new way" loop below

// the old way
t0 = System.nanoTime();
GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
ZoneId utcZone = ZoneId.of("UTC");
for (int i = 0; i < loopLimit; i++) {
    cal.clear();
    cal.setGregorianChange(foreverAgo);  // proleptic
    cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
    cal.add(GregorianCalendar.DAY_OF_YEAR, dateValue.intValue());
    cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
    LocalDateTime theOldWay = LocalDateTime.ofInstant(cal.toInstant(), utcZone);
}
System.out.printf("(old) java.util: %d ms for %d iterations%n", 
        (System.nanoTime() - t0) / 1000000, 
        loopLimit);
// (old) java.util: 279 ms for 1000 iterations

// the new way
t0 = System.nanoTime();
for (int i = 0; i < loopLimit; i++) {
    LocalDateTime theNewWay = java.time.temporal.ChronoUnit.DAYS.addTo(
            java.time.temporal.ChronoUnit.NANOS.addTo(
                    dt2Base, 
                    timeValue * 100L), 
            dateValue);
}
System.out.printf("(new) java.time: %d ms for %d iterations%n", 
        (System.nanoTime() - t0) / 1000000, 
        loopLimit);
// (new) java.time: 23 ms for 1000 iterations

I ran the test several times and the comments showing the console output are representative values from my (very old) test machine.

Hi @lavabear, @gordthompson, @cogman, @harawata, thank you for the suggestions and sorry for the delay; the team is currently busy with release preparations.

I went through the discussion briefly and wanted to clarify a few things.

  1. Please note that most of the conversions between SQL Server temporal types and Java object types are currently handled using GregorianCalendar, both when reading and writing. Are you proposing to completely migrate away from GregorianCalendar to ZonedDateTime/LocalDateTime, or only for the LocalDate, LocalDateTime, and LocalTime classes when calling getObject()?
  2. Do the new ZonedDateTime/LocalDateTime APIs provide all the functionality that GregorianCalendar has? Are you aware of any limitations?
  3. Are there benchmarks? Do the new APIs always perform better, or are there scenarios where we might have to introduce performance regressions?

To speed up the process, please feel free to create an initial PR or post any other recommendations that we should keep in mind when investigating the proposal.