pendulum: Period length calculation problem
I discovered something odd when experimenting with period length calculations around DST change time. Please consider this example:
import pendulum
tz = pendulum.timezone('America/Los_Angeles')
start_time = pendulum.parse('2018-03-11 09:00:00+00:00')
end_time = pendulum.parse('2018-03-11 10:00:00+00:00')
print((end_time - start_time).hours) # 1
print((end_time.in_tz(tz) - start_time).hours) # 1
print((end_time - start_time.in_tz(tz)).hours) # 1
print((end_time.in_tz(tz) - start_time.in_tz(tz)).hours) # 2 ?!
p1 = end_time - start_time
print(start_time + p1 == end_time) # True
print(p1.total_hours()) # 1.0
p2 = end_time.in_tz(tz) - start_time.in_tz(tz)
print(start_time + p2 == end_time) # False
print(p2.total_hours()) # 1.0
About this issue
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 17 (12 by maintainers)
@pganssle
The idea is we should always have:
which Pendulum actually does right when adding/subtracting units of variable length (like days):
but not for other units:
This is more or less what this issue is about.
And the thing is the standard library does not get this right:
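For comparison, here is a minimal standard-library sketch (not the snippets originally posted with this comment) of the round-trip property checked in the report, `start + (end - start) == end`. It assumes Python 3.9+ for `zoneinfo` and reuses the two instants from the report; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumes Python 3.9+

LA = ZoneInfo("America/Los_Angeles")
UTC = timezone.utc

# The two instants from the report, converted to Los Angeles wall time.
start = datetime(2018, 3, 11, 9, 0, tzinfo=UTC).astimezone(LA)   # 01:00 PST
end = datetime(2018, 3, 11, 10, 0, tzinfo=UTC).astimezone(LA)    # 03:00 PDT

# Both operands share one tzinfo, so the offsets are ignored (wall-clock delta).
wall_delta = end - start
# Converting to UTC first yields the time that actually elapsed.
elapsed = end.astimezone(UTC) - start.astimezone(UTC)

print(wall_delta)                 # 2:00:00
print(elapsed)                    # 1:00:00
print(start + wall_delta == end)  # True
print(start + elapsed == end)     # False
```

Same-zone arithmetic in the standard library works on wall-clock values, so the round trip holds for the 2-hour wall delta but not for the 1 hour that actually elapsed.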
@pganssle
And you can: I left the standard API methods and properties alone when subclassing the datetime class to avoid unexpected surprises. If this is not the case, it’s a bug and you can file an issue.
And don’t get me wrong here, I understand your point of view. But you have to admit that it’s far from trivial to handle DST transitions with the standard library.
I mean, when I get gettz('America/Los_Angeles') it’s because I want to work in this particular timezone, and as such I also want it to respect the DST transitions that occur, which the standard library does not.
This is especially true in Python 3.6 and the introduction of the fold attribute. This attribute has been introduced to disambiguate repeated times and yet we have this inconsistency:
As you can see, it should be 1:00:00.
And, to me, the main problem is that Python is somewhat the exception regarding timezone-aware datetime arithmetic compared to other languages, which can be really confusing for newcomers (and even for seasoned developers):
In PHP:
In Ruby:
In Go:
it will output:
And I think this is the behavior that most people expect when working in a timezone because that’s the one that makes most sense.
This is why I started Pendulum: neither the standard library nor the other datetime libraries out there handle it this way.
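The `fold` inconsistency mentioned in this comment can be reproduced with the standard library alone. The sketch below is an assumption about what was meant, not the original snippet: it uses `zoneinfo` (Python 3.9+) instead of `gettz`, and the 2017-11-05 fall-back date is picked for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumes Python 3.9+

LA = ZoneInfo("America/Los_Angeles")

# 01:30 occurs twice on 2017-11-05, when Los Angeles clocks fall back.
first = datetime(2017, 11, 5, 1, 30, tzinfo=LA, fold=0)   # first pass (PDT)
second = datetime(2017, 11, 5, 1, 30, tzinfo=LA, fold=1)  # second pass (PST)

# Same-zone subtraction ignores both the offsets and the fold attribute.
wall_diff = second - first
# Subtracting after converting to UTC gives the real gap between the two.
elapsed_diff = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)

print(wall_diff)     # 0:00:00
print(elapsed_diff)  # 1:00:00
```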
@dekoza Yes, you’re right, precise_diff() was the culprit 😃
If both datetimes were in the same timezone, it would not use the offsets to get the proper time difference. But this is only true for variable-length units (year, month, day).
Commit 5b06241 should fix the issue.
It will land in the next bugfix release.
@sdispater It’s not about wrong or right here, it’s a choice between two equally valid conceptions of the semantics. If you say “add 3 hours to midnight”, you get 3AM because you’ve advanced the clock 3 hours. If you say “start at midnight and jump to the time after which 3 real-time hours have elapsed”, you will end up at 4AM, not 3AM in this case.
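The two readings of “add 3 hours to midnight” can be sketched with the standard library. This assumes Python 3.9+ for `zoneinfo` and uses the 2018-03-11 spring-forward date from the original report purely for illustration; the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumes Python 3.9+

LA = ZoneInfo("America/Los_Angeles")

# Illustrative date: Los Angeles clocks jump from 02:00 to 03:00 on 2018-03-11.
midnight = datetime(2018, 3, 11, 0, 0, tzinfo=LA)

# Reading 1: advance the wall clock by 3 hours.
wall_result = midnight + timedelta(hours=3)

# Reading 2: let 3 real hours elapse by doing the arithmetic in UTC.
elapsed_result = (midnight.astimezone(timezone.utc) + timedelta(hours=3)).astimezone(LA)

print(wall_result.strftime("%H:%M"))     # 03:00
print(elapsed_result.strftime("%H:%M"))  # 04:00
```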
The reason I bring up the “1 day” case is that there’s no “right” intuition that covers all the cases, because generally on the scale of 1 day you tend to be doing operations where you want wall clock offsets, whereas for offsets of around 1-2 hours you tend to be thinking about elapsed time, so your intuition tends to prefer absolute elapsed time.
One major problem in designing an API around this is that I can’t think of a way that would be “obvious” here, other than simply disallowing arithmetical operations entirely (since they are overloaded) and encoding the semantics you want into the operator itself. But that would be cumbersome, and that ship has sailed long ago.
If you want pendulum to continue to be a (drop-in) datetime replacement, I think the right thing to do is to maintain the standard library’s conventions for the semantics and (as it seems you’ve done with the in_days and in_hours functions) define extensions that provide a more obvious and intuitive way to be explicit about the semantics you care about.
@pganssle
My mistake, I thought gettz() returned the same tzinfo object.
But still, your examples seem counter-intuitive to me:
This one is actually right, but you lose the information that it’s also 23 hours. I know you can get the 23 hours by going through UTC, but that’s far from being intuitive.
This is actually wrong: only 2 hours have passed in the America/Los_Angeles timezone, not 3.
And this is where I wanted Pendulum to be right:
I don’t think there’s anything wrong here. If you use total_hours() you’ll get the behavior you expect.
The problem is that in the America/Los_Angeles time zone, there are 2 wall hours between those two UTC times. In the first example you’re subtracting two times both in UTC, so there’s both 1 wall hour and 1 actual hour.
In the next two examples, you have “mixed zone” subtractions, where “wall time” doesn’t make a lot of sense, so it’s very intuitive to me that if you allow these operations to exist, you’d return the total elapsed amount of time between them.
In the final example, both are in the same “wall time” and you can actually see that one of them is 3:00 and the other one is 01:00, so there are 2 “wall hours” between them (even though only one actually existed).
This is somewhat counter-intuitive on short timescales, but I find these dynamics much less counter-intuitive on the time scale of days and hours. For example, noon on the day before a transition is “1 day” before noon on the day of the transition, even though the actual elapsed time is either 23 or 25 hours.
I think the “same zone” and “between zone” dynamics are similar to what I wrote about in this blog post (though in that case I think the result is somewhat absurd, whereas in this case I understand why it makes sense).
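The day-scale case and the “same zone” versus “between zone” behavior described in this comment can be sketched with the standard library. This is an illustration under assumptions (Python 3.9+ `zoneinfo`, the 2018-03-11 transition from the report, hypothetical variable names), not code from the thread.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumes Python 3.9+

LA = ZoneInfo("America/Los_Angeles")
UTC = timezone.utc

# Noon on the day before and on the day of the 2018-03-11 spring-forward transition.
noon_before = datetime(2018, 3, 10, 12, 0, tzinfo=LA)
noon_of = datetime(2018, 3, 11, 12, 0, tzinfo=LA)

# Same zone: wall-clock difference, exactly one calendar day.
same_zone = noon_of - noon_before
# Mixed zones: operands are compared via their UTC offsets, giving elapsed time.
mixed_zone = noon_of - noon_before.astimezone(UTC)

print(same_zone)   # 1 day, 0:00:00
print(mixed_zone)  # 23:00:00
```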
Thanks for reporting this!
Yes, there is definitely something wrong here.
I will take a look and get back to you.