-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Support POSIX TZ environment variable strings #15792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support POSIX TZ environment variable strings #15792
Conversation
src/time/location.cr
Outdated
# Local time transition rules derived from a POSIX TZ string, used for times | ||
# past the last defined transition. | ||
@tz : TZ? | ||
|
||
# The original TZ string that produced `@tz`, if any. | ||
protected getter tz_string : String? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: TZ
is a pretty large struct. Inlining it increases the size of Time::Location
significantly from 64 bytes to 152 bytes.
Considering that POSIX TZ strings are usually a quite rare occurrence, we might want to try to reduce the memory impact on all other location instances.
We could consider allocating TZ
separately and storing only a pointer (or turn it into a class).
As an alternative, subclassing Time::Location
could also work. We could keep the inlined struct and the memory layout of regular Time::Location
instances wouldn't change.
Another option could be to reduce the size of TZ
instances. For example, MonthWeekDay
could easily fit into 3 bytes instead of 12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My MSYS2 installation has 1759 TZif2 database files and 36 TZif3 files, whereas my WSL installation has 479 and 8 respectively. All of them have a POSIX TZ string in the footer, so I wouldn't call them rare.
In both cases there are only 94 unique strings though, 63 of which have no transitions. We could use a global Hash(String, Time::TZ)
cache that maps strings to their parsed instances, at the expense of even more indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time::TZ
and Time::Location
are now 32 bytes smaller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, we're not using POSIX TZ anywhere (because it's not implemented). And with this implementation, I don't think it would be used much either (depends on runtime config).
That might change if we implement the suggested enhancement to include the POSIX TZ transitions in locations loaded from tzdb.
But these use cases are not that significant. Time::Location
instances loaded from tzdb are cached, so even when all available locations are loaded, the total overhead is probably negligible.
Locations with fixed offsets, however, are not cached. Every time you create a time instance with a fixed offset, it instantiates a new Time::Location
. This can add up easily if there is lots of time deserialization, for example.
Perhaps we could reconsider this and cache fixed offsets as well. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone decides to create a Time::Location
for every single fixed offset then there would be around 180,000 location objects in the cache with no obvious way to evict them. In theory the same could happen with the POSIX TZ string cache I suggested. Maybe WeakRef
resolves this, but I only found something like that in Log::Builder#@logs
.
It looks like non-abstract inheritance is indeed a viable solution here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would even be more possibilities if you account for different names for these zones.
But I'm pretty sure it would be sufficient to cache some common offsets. That would be primarily the full and half hours, maybe quarter hour offsets. Only with default names. That should cover the vast majority of use cases sufficiently.
All exact 15 minute offsets between -14 and +14 hours should be quite manageable and work well with linear lookup.
This points `ZONEINFO` to `spec/std/data/zoneinfo.zip` and uses the new version built in #15792, meaning the script now works consistently across all platforms and does not depend on any system definitions.
This PR adds support for POSIX TZ strings such as
EST5EDT,M3.2.0,M11.1.0
, as defined in POSIX.1-2024, Section 8.3. When theTZ
environment variable contains such a string,Time::Location.load_local
, and by extension.local
, will return a location with the correct transition datetimes across the entire proleptic Gregorian calendar.A convenience method
Time::Location.posix_tz
is also added for creating those locations without going through an environment variable.The implementation-defined aspects of the standard are not supported, and can be added later at will. These include strings prefixed by a colon (
:/etc/localtime
), and TZ strings specifying a DST time zone without transition rules (EST5EDT
, although the US ones are available in most time zone databases by those names too).Time::Location::Zone
's allowed offset range is slightly extended to accommodate for POSIX TZ strings.The work here can be followed up in at least two ways:
Time::TZ
from this PR instead would greatly reduce the memory consumption ofTime::Location
objects on Windows.