pendulum: Period length calculation problem

I discovered something odd while experimenting with period length calculations around a DST change. Please consider this example:

import pendulum

tz = pendulum.timezone('America/Los_Angeles')
start_time = pendulum.parse('2018-03-11 09:00:00+00:00')
end_time = pendulum.parse('2018-03-11 10:00:00+00:00')
print((end_time - start_time).hours)     # 1
print((end_time.in_tz(tz) - start_time).hours)  # 1
print((end_time - start_time.in_tz(tz)).hours)  # 1
print((end_time.in_tz(tz) - start_time.in_tz(tz)).hours)  # 2 ?!

p1 = end_time - start_time
print(start_time + p1 == end_time)   # True
print(p1.total_hours())  # 1.0

p2 = end_time.in_tz(tz) - start_time.in_tz(tz)
print(start_time + p2 == end_time)  # False
print(p2.total_hours())  # 1.0
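For comparison, here is a minimal sketch using only the standard library. It assumes Python 3.9+ for `zoneinfo` and an installed IANA time zone database. `elapsed_hours` and `wall_hours` are hypothetical helpers, not pendulum API. The sketch separates the two notions of "hours between" that the report runs into. The two UTC instants are one real hour apart, but the Los Angeles wall clock jumps from 01:00 PST to 03:00 PDT.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")

# The two UTC instants from the report, one real hour apart.
start = datetime(2018, 3, 11, 9, 0, tzinfo=timezone.utc)
end = datetime(2018, 3, 11, 10, 0, tzinfo=timezone.utc)

start_la = start.astimezone(LA)  # 2018-03-11 01:00:00-08:00 (PST)
end_la = end.astimezone(LA)      # 2018-03-11 03:00:00-07:00 (PDT)


def elapsed_hours(a, b):
    """Real elapsed time: compare the instants in UTC."""
    return (b.astimezone(timezone.utc) - a.astimezone(timezone.utc)).total_seconds() / 3600


def wall_hours(a, b):
    """Wall-clock difference: compare local clock readings, ignoring offsets."""
    return (b.replace(tzinfo=None) - a.replace(tzinfo=None)).total_seconds() / 3600


print(elapsed_hours(start_la, end_la))  # 1.0
print(wall_hours(start_la, end_la))     # 2.0
# Both sides share one tzinfo object, so plain "-" uses wall-clock semantics.
print(end_la - start_la)                # 2:00:00
```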

About this issue

State: closed
Created 6 years ago
Reactions: 1
Comments: 17 (12 by maintainers)

Most upvoted comments

@pganssle

The idea is that we should always have:

dt_end == dt + (dt_end - dt)

which Pendulum actually gets right for units of variable length (like days):

>>> import pendulum
>>> dt = pendulum.create(2018, 3, 11, 0, 30, tz='America/Los_Angeles')
>>> dt_end = pendulum.create(2018, 3, 12, 0, 30, tz='America/Los_Angeles')
>>> dt_end == dt + (dt_end - dt)
True

but not for other units:

>>> dt = pendulum.create(2018, 3, 11, 0, 30, tz='America/Los_Angeles')
>>> dt_end = pendulum.create(2018, 3, 12, 3, 30, tz='America/Los_Angeles')
>>> dt_end == dt + (dt_end - dt)
False

This is more or less what this issue is about.

And the standard library does not get this right either:

>>> from datetime import datetime, timedelta
>>> from dateutil.tz import gettz
>>> dt = datetime(2018, 3, 11, 0, 30, tzinfo=gettz('America/Los_Angeles'))
>>> dt_end = datetime(2018, 3, 12, 0, 30, tzinfo=gettz('America/Los_Angeles'))
>>> dt_end == dt + (dt_end - dt)
False
>>> dt_end - dt
datetime.timedelta(0, 82800) 
# This is actually correct (23 hours) but it's also a full day
# however there is no way to handle that in the standard library
>>> dt_end = datetime(2018, 3, 11, 3, 30, tzinfo=gettz('America/Los_Angeles'))
>>> dt_end == dt + (dt_end - dt)
False
>>> dt_end - dt
datetime.timedelta(0, 7200)  # But only one hour has passed
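The snippet above calls gettz() twice. As noted later in the thread, those calls returned distinct tzinfo objects at the time. As a result, "-" went through UTC while "+" stayed on the wall clock. The following is a minimal stdlib sketch that assumes Python 3.9+ `zoneinfo`. `add_absolute` is a hypothetical helper. It shows that the round-trip identity holds as long as the subtraction and the addition agree on semantics, either both wall-clock or both absolute.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")
dt = datetime(2018, 3, 11, 0, 30, tzinfo=LA)
dt_end = datetime(2018, 3, 12, 3, 30, tzinfo=LA)

# Wall-clock semantics: one shared tzinfo, so "-" and "+" both ignore offsets.
wall_delta = dt_end - dt
print(wall_delta)                 # 1 day, 3:00:00
print(dt + wall_delta == dt_end)  # True


# Absolute semantics: do both the subtraction and the addition on UTC instants.
def add_absolute(d, delta):
    return (d.astimezone(timezone.utc) + delta).astimezone(d.tzinfo)


abs_delta = dt_end.astimezone(timezone.utc) - dt.astimezone(timezone.utc)
print(abs_delta)                              # 1 day, 2:00:00
print(add_absolute(dt, abs_delta) == dt_end)  # True
```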

sdispater on Feb 20, 2018

@pganssle

By “drop-in replacement”, I take it to mean that you can pass pendulum objects to functions expecting datetime objects.

And you can: I left the standard API methods and properties alone when subclassing the datetime class, to avoid unexpected surprises. If this is not the case, it’s a bug and you can file an issue.

And don’t get me wrong here, I understand your point of view. But you have to admit that it’s far from trivial to handle DST transitions with the standard library.

When I call gettz('America/Los_Angeles'), it’s because I want to work in that particular timezone, and as such I also want arithmetic to respect the DST transitions that occur in it, which the standard library does not do.

This is especially true since Python 3.6 introduced the fold attribute. This attribute was introduced to disambiguate repeated times, and yet we have this inconsistency:

>>> from datetime import datetime
>>> from dateutil.tz import gettz
>>> la = gettz('America/Los_Angeles')
>>> pre_dt = datetime(2018, 11, 4, 1, 30, tzinfo=la, fold=0)
>>> post_dt = datetime(2018, 11, 4, 1, 30, tzinfo=la, fold=1)
>>> print(pre_dt)
'2018-11-04 01:30:00-07:00'
>>> print(post_dt)
'2018-11-04 01:30:00-08:00'
>>> print(post_dt - pre_dt)
'0:00:00'

As you can see, it should be 1:00:00.
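For reference, here is the same ambiguity reproduced with the standard library's `zoneinfo` (Python 3.9+). It is used here in place of dateutil and is assumed to honor `fold` in the same way. Subtraction between datetimes that share a tzinfo ignores `fold`. Converting both to UTC first yields the one-hour gap expected above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")

# 01:30 occurs twice on 2018-11-04; fold selects which occurrence.
pre_dt = datetime(2018, 11, 4, 1, 30, tzinfo=LA, fold=0)   # first pass, PDT
post_dt = datetime(2018, 11, 4, 1, 30, tzinfo=LA, fold=1)  # second pass, PST

print(pre_dt.isoformat())   # 2018-11-04T01:30:00-07:00
print(post_dt.isoformat())  # 2018-11-04T01:30:00-08:00

# Same tzinfo: subtraction ignores fold and the offsets entirely.
print(post_dt - pre_dt)  # 0:00:00

# Converting to UTC first recovers the real elapsed time.
print(post_dt.astimezone(timezone.utc) - pre_dt.astimezone(timezone.utc))  # 1:00:00
```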

And, to me, the main problem is that Python is something of an exception, compared to other languages, when it comes to arithmetic on timezone-aware datetimes, which can be really confusing for newcomers (and even for seasoned developers):

In PHP:

>>> $tz = new DateTimeZone('America/Los_Angeles');
>>> $dt = new DateTime('2018-03-11 01:30', $tz);
>>> echo($dt->format('r') . "\n");
'Sun, 11 Mar 2018 01:30:00 -0800'
>>> $it = DateInterval::createFromDateString('1 hours');
>>> $dt->add($it);
>>> echo($dt->format('r') . "\n");
'Sun, 11 Mar 2018 03:30:00 -0700'

In Ruby:

>>> tz = ActiveSupport::TimeZone['America/Los_Angeles']
>>> dt = tz.parse('2018-03-11T01:30:00')
>>> puts(dt)
'2018-03-11 01:30:00 -0800'
>>> puts(dt + 1.hour)
'2018-03-11 03:30:00 -0700'

In Go:

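package main

import (
	"fmt"
	"time"
)

func main() {
	// Load the Los Angeles zone from the IANA time zone database.
	la, err := time.LoadLocation("America/Los_Angeles")
	if err != nil {
		panic(err)
	}

	dt := time.Date(2018, 3, 11, 1, 30, 0, 0, la)
	fmt.Println(dt)
	// Add advances by one hour of elapsed time, crossing the DST gap.
	dtEnd := dt.Add(1 * time.Hour)
	fmt.Println(dtEnd)
}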

it will output:

2018-03-11 01:30:00 -0800 PST
2018-03-11 03:30:00 -0700 PDT

And I think this is the behavior that most people expect when working in a timezone, because it’s the one that makes the most sense.

This is why I started Pendulum: neither the standard library nor the other datetime libraries out there handle it this way.

sdispater on Feb 21, 2018

@dekoza Yes, you’re right, precise_diff() was the culprit 😃

If both datetimes were in the same timezone, it would not use the offsets to get the proper time difference. But ignoring the offsets is only correct for variable-length units (years, months, days).

Commit 5b06241 should fix the issue.

It will land in the next bugfix release.
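Here is a minimal sketch of the rule described in this comment. It assumes stdlib `datetime`/`zoneinfo` (Python 3.9+). `in_days` and `in_hours` are hypothetical free functions named after pendulum's methods; they are not pendulum's actual code. Variable-length units ignore the offsets, and fixed-length units use them. The expected values mirror the in_days()/in_hours() results quoted elsewhere in this thread.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")


def in_days(start, end):
    """Variable-length unit: whole days on the local wall clock (offsets ignored).

    Assumes start <= end; floor division would round negative spans away from zero.
    """
    return (end.replace(tzinfo=None) - start.replace(tzinfo=None)) // timedelta(days=1)


def in_hours(start, end):
    """Fixed-length unit: whole hours of real elapsed time (offsets applied).

    Assumes start <= end.
    """
    return (end.astimezone(timezone.utc) - start.astimezone(timezone.utc)) // timedelta(hours=1)


dt = datetime(2018, 3, 11, 0, 30, tzinfo=LA)
next_day = datetime(2018, 3, 12, 0, 30, tzinfo=LA)
same_day = datetime(2018, 3, 11, 3, 30, tzinfo=LA)

print(in_days(dt, next_day))   # 1
print(in_hours(dt, next_day))  # 23
print(in_hours(dt, same_day))  # 2
```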

sdispater on Feb 20, 2018

@sdispater It’s not about right or wrong here; it’s a choice between two equally valid conceptions of the semantics. If you say “add 3 hours to midnight”, you get 3AM because you’ve advanced the clock 3 hours. If you say “start at midnight and jump to the time after which 3 real-time hours have elapsed”, you will end up at 4AM, not 3AM, in this case.
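The two readings can be sketched with the standard library. This assumes Python 3.9+ `zoneinfo`, and `add_wall`/`add_elapsed` are hypothetical helper names. The example starts from midnight on the 2018-03-11 spring-forward day in Los Angeles.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")
midnight = datetime(2018, 3, 11, 0, 0, tzinfo=LA)  # spring-forward day


def add_wall(dt, delta):
    """Advance the local clock reading; stdlib "+" on an aware datetime does this."""
    return dt + delta


def add_elapsed(dt, delta):
    """Advance by real elapsed time: do the arithmetic on the UTC instant."""
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)


print(add_wall(midnight, timedelta(hours=3)))     # 2018-03-11 03:00:00-07:00
print(add_elapsed(midnight, timedelta(hours=3)))  # 2018-03-11 04:00:00-07:00
```

Neither helper is "the" correct one. Each encodes one of the two conceptions explicitly in its name instead of in the overloaded `+` operator.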

The reason I bring up the “1 day” case is that there’s no “right” intuition that covers all the cases: on the scale of 1 day you tend to be doing operations where you want wall-clock offsets, whereas with offsets of around 1-2 hours you tend to be thinking about elapsed time, so your intuition prefers absolute elapsed time.

One major problem in designing an API around this is that I can’t think of an “obvious” approach other than disallowing arithmetic operators entirely, since they are overloaded, and encoding the semantics you want into the operator itself. But that would be cumbersome, and that ship sailed long ago.

If you want pendulum to continue to be a drop-in datetime replacement, I think the right thing to do is to keep the standard library’s semantic conventions and, as it seems you’ve done with the in_days and in_hours functions, define extensions that provide a more obvious and intuitive way to be explicit about the semantics you care about.

pganssle on Feb 21, 2018

@pganssle

My mistake, I thought gettz() returned the same tzinfo object on every call.

But still, your examples seem counter-intuitive to me:

>>> from datetime import datetime, timedelta
>>> from dateutil.tz import gettz
>>> LA = gettz('America/Los_Angeles')
>>> dt = datetime(2018, 3, 11, 0, 30, tzinfo=LA)
>>> dt_end = datetime(2018, 3, 12, 0, 30, tzinfo=LA)
>>> print(dt_end - dt)
'1 day, 0:00:00'

This one is actually right, but you lose the information that it’s also 23 hours. I know you can get the 23 hours by going through UTC, but that’s far from intuitive.

>>> dt_end = datetime(2018, 3, 11, 3, 30, tzinfo=LA)
>>> print(dt_end - dt)
'3:00:00'

This is actually wrong: only 2 hours have passed in the America/Los_Angeles timezone, not 3.

And this is where I wanted Pendulum to be right:

>>> import pendulum
>>> dt = pendulum.create(2018, 3, 11, 0, 30, tz='America/Los_Angeles')
>>> dt_end = pendulum.create(2018, 3, 12, 0, 30, tz='America/Los_Angeles')
>>> (dt_end - dt).in_days()
1
>>> (dt_end - dt).in_hours()
23
>>> dt_end = pendulum.create(2018, 3, 11, 3, 30, tz='America/Los_Angeles')
>>> (dt_end - dt).in_hours()
2

sdispater on Feb 21, 2018

I don’t think there’s anything wrong here. If you use total_hours() you’ll get the behavior you expect.

The problem is that in the America/Los_Angeles time zone, there are 2 wall hours between those two UTC times. In the first example, you’re subtracting two times that are both in UTC, so there’s both 1 wall hour and 1 actual hour.

In the next two examples, you have “mixed zone” subtractions, where “wall time” doesn’t make a lot of sense, so it’s very intuitive to me that, if you allow these operations to exist, they return the total elapsed time between them.

In the final example, both are in the same “wall time” and you can actually see that one of them is 03:00 and the other is 01:00, so there are 2 “wall hours” between them (even though only one of those hours actually existed).

This is somewhat counter-intuitive on short timescales, but I find these dynamics much less counter-intuitive on the time scale of days and hours. For example, noon the day before a transition is “1 day” before noon the day after a transition, even though the actual elapsed time is either 23 or 25 hours.

I think the “same zone” and “between zone” dynamics are similar to what I wrote about in this blog post (though in that case I think the result is somewhat absurd, whereas in this case I understand why it makes sense).

pganssle on Feb 20, 2018

Thanks for reporting this!

Yes, there is definitely something wrong here.

I will take a look and get back to you.

sdispater on Feb 20, 2018