Skip to content

Support POSIX TZ environment variable strings #15792

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

HertzDevil
Copy link
Contributor

@HertzDevil HertzDevil commented May 18, 2025

This PR adds support for POSIX TZ strings such as EST5EDT,M3.2.0,M11.1.0, as defined in POSIX.1-2024, Section 8.3. When the TZ environment variable contains such a string, Time::Location.load_local, and by extension .local, will return a location with the correct transition datetimes across the entire proleptic Gregorian calendar.

A convenience method Time::Location.posix_tz is also added for creating those locations without going through an environment variable.

The implementation-defined aspects of the standard are not supported, and can be added later at will. These include strings prefixed by a colon (:/etc/localtime), and TZ strings specifying a DST time zone without transition rules (EST5EDT, although the US ones are available in most time zone databases by those names too).

Time::Location::Zone's allowed offset range is slightly extended to accommodate for POSIX TZ strings.

The work here can be followed up in at least two ways:

  • TZif database files, as defined in IETF RFC 9636, also contain POSIX TZ strings in their footers, used to compute time zone transitions beyond the last manually defined transition time. Crystal does not read these at the moment, which means transitions after year 2038 do not work, since most tzdata packages only include entries fitting into 32-bit Unix times, even those using TZif version 2 or above. This would depend also on time/unix: Crystal fails to parse slim zoneinfo files #11907 as the footer is located after the 64-bit Unix time body.
  • Windows system time zones in Crystal are defined by somewhat arbitrarily instantiating all individual transitions 100 years within the current year, but they use month + week + day of week under the hood, so they are actually compatible with POSIX TZ strings. Internally using Time::TZ from this PR instead would greatly reduce the memory consumption of Time::Location objects on Windows.

Comment on lines 218 to 223
# Local time transition rules derived from a POSIX TZ string, used for times
# past the last defined transition.
@tz : TZ?

# The original TZ string that produced `@tz`, if any.
protected getter tz_string : String?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: TZ is a pretty large struct. Inlining it increases the size of Time::Location significantly from 64 bytes to 152 bytes.
Considering that POSIX TZ strings are usually a quite rare occurrence, we might want to try to reduce the memory impact on all other location instances.
We could consider allocating TZ separately and storing only a pointer (or turn it into a class).
As an alternative, subclassing Time::Location could also work. We could keep the inlined struct and the memory layout of regular Time::Location instances wouldn't change.
Another option could be to reduce the size of TZ instances. For example, MonthWeekDay could easily fit into 3 bytes instead of 12.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My MSYS2 installation has 1759 TZif2 database files and 36 TZif3 files, whereas my WSL installation has 479 and 8 respectively. All of them have a POSIX TZ string in the footer, so I wouldn't call them rare.

In both cases there are only 94 unique strings though, 63 of which have no transitions. We could use a global Hash(String, Time::TZ) cache that maps strings to their parsed instances, at the expense of even more indirection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time::TZ and Time::Location are now 32 bytes smaller

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we're not using POSIX TZ anywhere (because it's not implemented). And with this implementation, I don't think it would be used much either (depends on runtime config).
That might change if we implement the suggested enhancement to include the POSIX TZ transitions in locations loaded from tzdb.

But these use cases are not that significant. Time::Location instances loaded from tzdb are cached, so even when all available locations are loaded, the total overhead is probably negligible.

Locations with fixed offsets, however, are not cached. Every time you create a time instance with a fixed offset, it instantiates a new Time::Location. This can add up easily if there is lots of time deserialization, for example.
Perhaps we could reconsider this and cache fixed offsets as well. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone decides to create a Time::Location for every single fixed offset then there would be around 180,000 location objects in the cache with no obvious way to evict them. In theory the same could happen with the POSIX TZ string cache I suggested. Maybe WeakRef resolves this, but I only found something like that in Log::Builder#@logs.

It looks like non-abstract inheritance is indeed a viable solution here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would even be more possibilities if you account for different names for these zones.

But I'm pretty sure it would be sufficient to cache some common offsets. That would be primarily the full and half hours, maybe quarter hour offsets. Only with default names. That should cover the vast majority of use cases sufficiently.

All exact 15 minute offsets between -14 and +14 hours should be quite manageable and work well with linear lookup.

@straight-shoota straight-shoota added this to the 1.17.0 milestone May 26, 2025
@straight-shoota straight-shoota merged commit d733333 into crystal-lang:master May 29, 2025
36 checks passed
@HertzDevil HertzDevil deleted the feature/posix-tz-string branch May 29, 2025 13:47
straight-shoota pushed a commit that referenced this pull request May 29, 2025
This points `ZONEINFO` to `spec/std/data/zoneinfo.zip` and uses the new version built in #15792, meaning the script now works consistently across all platforms and does not depend on any system definitions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants