Locale-invariant parsing and formatting #251

Closed
105 commits
e517b81
WIP: first draft of locale-invariant parsing and formatting
dkhalanskyjb Feb 26, 2023
9bb6ca8
Refactor: reify the string builder directives
dkhalanskyjb Mar 15, 2023
324fd58
Refactoring: move the parser-combining logic to `Parser`
dkhalanskyjb Mar 20, 2023
99a05a2
Fix parsing of years exceeding 4 digits
dkhalanskyjb Mar 21, 2023
93391c2
Add some tests for common formats
dkhalanskyjb Mar 21, 2023
aab6f23
Fix UtcOffset parsing
dkhalanskyjb Mar 22, 2023
273031f
Fix using complex string formats on Kotlin/Native
dkhalanskyjb Mar 22, 2023
2cb10c4
Support LocalDateFormat.appendMonthName
dkhalanskyjb Mar 27, 2023
d78fdb1
Implement the Unicode format strings
dkhalanskyjb Mar 30, 2023
ab60f61
Implement 12-hour clock formats
dkhalanskyjb Apr 13, 2023
6df67fc
Implement == for ValueBag
dkhalanskyjb Apr 13, 2023
444d310
Implement formats with day of the week
dkhalanskyjb Apr 13, 2023
0c5f27d
Implement the RFC 1123 format
dkhalanskyjb Apr 13, 2023
cc35b13
Redo the model of numeric signs to support multi-field values
dkhalanskyjb Apr 14, 2023
7186e33
Work around some JS bug
dkhalanskyjb Apr 14, 2023
35d867f
Implement pretty-printing for formatters
dkhalanskyjb Apr 25, 2023
aa76138
Rework the parser infrastructure for performance a bit
dkhalanskyjb Apr 25, 2023
ddb90aa
Introduce a common interface for date/time formats
dkhalanskyjb Apr 27, 2023
9f44bc7
Support `find` and `findAll` on formats
dkhalanskyjb Apr 27, 2023
2225bf2
Make "-" a non-special character in format strings
dkhalanskyjb Apr 27, 2023
fe1f03e
Refactor: gather all code for string representation of formatters
dkhalanskyjb May 8, 2023
92c966f
Small refactoring
dkhalanskyjb May 17, 2023
c058b81
Introduce changes due to the first round of reviews
dkhalanskyjb Jun 9, 2023
d5ed029
Create entry points for formatting API like `LocalDate.Format.build`
dkhalanskyjb Jun 9, 2023
dc2098d
Implement upper bounds for ValueBag values
dkhalanskyjb Jul 11, 2023
dee428a
Remove the no-longer-needed code for the LRU cache
dkhalanskyjb Jul 11, 2023
7c3096d
Remove the find and findAll functions
dkhalanskyjb Jul 11, 2023
1158788
Add formatTo and parseOrNull
dkhalanskyjb Jul 11, 2023
1161367
Add tests for assigning out-of-bounds values to ValueBag
dkhalanskyjb Jul 11, 2023
7711596
Add tests for error handling on parsing
dkhalanskyjb Jul 11, 2023
51e0c8a
Implement month and day-of-week name classes
dkhalanskyjb Jul 11, 2023
0a46e36
Implement formats for two-digit year
dkhalanskyjb Jul 12, 2023
c05e34b
Reimplement padding as space/zero/none
dkhalanskyjb Jul 14, 2023
2e23dab
Make the UTC offset hour contain the sign
dkhalanskyjb Jul 14, 2023
a9cd48b
*.Format.build { } -> *.Format { }
dkhalanskyjb Jul 14, 2023
1849bb7
Disable configuring whether the year is output with a sign on padding…
dkhalanskyjb Jul 14, 2023
49a277a
Refactoring to clear up the internal models
dkhalanskyjb Jul 14, 2023
1b96711
Replace a single 'appendAlternatives' with clearer, orthogonal 'alter…
dkhalanskyjb Jul 15, 2023
3450c85
Address the review
dkhalanskyjb Jul 24, 2023
6cd875b
Work around a bug
dkhalanskyjb Jul 24, 2023
37ad875
Implement predefined constants for popular formats
dkhalanskyjb Jul 26, 2023
a845d81
Clarify the semantics of formatting fractional values
dkhalanskyjb Jul 26, 2023
b0a076e
Make all the fields in a ValueBag independent from one another
dkhalanskyjb Jul 27, 2023
86d912a
Implement appending whole formats in builders
dkhalanskyjb Jul 27, 2023
d049afc
Document everything
dkhalanskyjb Jul 28, 2023
a73e4aa
Add tests for timezone ID parsing and formatting
dkhalanskyjb Jul 28, 2023
2529d49
Fix the code representation of the UTC offset hour
dkhalanskyjb Jul 28, 2023
f78a90d
Add a missing @SharedImmutable
dkhalanskyjb Jul 28, 2023
ac5d388
Typo: toLocaldate -> toLocalDate
dkhalanskyjb Sep 8, 2023
cd0841d
First stage of renaming
dkhalanskyjb Sep 8, 2023
81a660f
Reorganize the entry points for the formatting API
dkhalanskyjb Sep 8, 2023
4185fe8
Refactoring: remove a redundant class
dkhalanskyjb Sep 11, 2023
e9ec023
Second stage of renaming
dkhalanskyjb Sep 18, 2023
3b573ea
Fixups
dkhalanskyjb Sep 20, 2023
5d38ead
Remove a redundant annotation
dkhalanskyjb Oct 17, 2023
f3a9980
appendLiteral -> chars, char
dkhalanskyjb Oct 24, 2023
f751b9f
Add an OptIn for appendUnicodeFormatString
dkhalanskyjb Oct 24, 2023
905b453
appendOptional -> optional
dkhalanskyjb Oct 24, 2023
ae1d624
Rename the parameters of alternativeParsing
dkhalanskyjb Oct 24, 2023
97e8ff9
Reorganize the hierarchy of builders
dkhalanskyjb Oct 24, 2023
d8a1396
ValueBag -> DateTimeComponents
dkhalanskyjb Oct 24, 2023
1ec4777
Implement formatAsKotlinBuilderDsl properly
dkhalanskyjb Oct 24, 2023
edad7f4
Print seconds in ISO constants for time and datetime
dkhalanskyjb Oct 24, 2023
19a03a8
Rename DateTimeComponents functions
dkhalanskyjb Oct 24, 2023
1a9ec34
Work around a segfault
dkhalanskyjb Oct 24, 2023
0a0cf6c
Performance improvements
dkhalanskyjb Nov 9, 2023
da45897
Introduce benchmarks module with basic formatting benchmark
qwwdfsad Nov 9, 2023
acbb44d
Add README.md for benchmarks
qwwdfsad Nov 9, 2023
603a483
Rename the remaining functions
dkhalanskyjb Nov 14, 2023
288e976
Ensure that `optional` sets the fields to their default values
dkhalanskyjb Nov 14, 2023
c9c609d
Rework parsing to ensure a specific traversal order
dkhalanskyjb Nov 15, 2023
90c9d37
Avoid copying in branches that never get entered
dkhalanskyjb Nov 15, 2023
1d391f8
Hide the functionality to add extra zeros to the second's fraction
dkhalanskyjb Nov 16, 2023
f269b7a
Refactor the builder code
dkhalanskyjb Nov 16, 2023
21a511d
Refactor
dkhalanskyjb Nov 16, 2023
59fd382
Add tests for the Unicode patterns
dkhalanskyjb Nov 16, 2023
e5ff04e
Remove an unnecessary intermediate class
dkhalanskyjb Nov 16, 2023
4047c07
Update the docs for byUnicodePattern
dkhalanskyjb Nov 16, 2023
528aafe
Improve the error messages for incompatible Unicode directives
dkhalanskyjb Nov 16, 2023
7b73212
Ensure that the value-is-reassigned error is not an exception
dkhalanskyjb Nov 17, 2023
40542f5
Final touches to the docs
dkhalanskyjb Nov 17, 2023
0af8ed5
Test that 60 seconds and 24 hours are not parsed for Instant
dkhalanskyjb Nov 29, 2023
f64f120
Adapt the code to the new Kotlin version
dkhalanskyjb Dec 5, 2023
6bed380
Refactor a bit
dkhalanskyjb Dec 5, 2023
d6301ca
Remove the parser from Native
dkhalanskyjb Dec 5, 2023
5bd918f
Add diagnostic messages
dkhalanskyjb Dec 5, 2023
84e7d81
Support string constants that begin or end with numbers
dkhalanskyjb Dec 5, 2023
122dd9c
Address the review
dkhalanskyjb Dec 18, 2023
db723fa
Address more review points
dkhalanskyjb Dec 18, 2023
d552935
Do not add trailing zeros to the fractions of seconds in standard for…
dkhalanskyjb Dec 19, 2023
a13e228
Mention the overloads of secondFraction() in the docs
dkhalanskyjb Jan 15, 2024
24d8ae8
Don't support DecimalFraction.hashCode
dkhalanskyjb Jan 15, 2024
3f5095b
Convert `y` to `u` automatically, but emit a comment about it
dkhalanskyjb Jan 18, 2024
853ddc7
Mention the rounding mode in `secondFraction` docs
dkhalanskyjb Jan 18, 2024
1ce9d14
Check that we can replicate java.time.Instant.parse
dkhalanskyjb Jan 18, 2024
b5236ed
Update README to include parsing and formatting
dkhalanskyjb Jan 26, 2024
f07c80e
Remove String.toSomething
dkhalanskyjb Jan 26, 2024
70fb30c
Only allow the ISO extended format in `UtcOffset.parse`
dkhalanskyjb Feb 1, 2024
7f73ac2
Use a single `parse` overload with a default parameter
dkhalanskyjb Feb 1, 2024
a6350ac
Fix a bug in byUnicodePattern that made optional sections mandatory
dkhalanskyjb Feb 14, 2024
071867d
Fix how `formatAsKotlinBuilderDsl` formats pre-defined formats.
dkhalanskyjb Feb 14, 2024
7d2a8fc
Add byUnicodePattern instructions to the README
dkhalanskyjb Feb 14, 2024
e270480
Change 'secondFraction' to truncate instead of rounding
dkhalanskyjb Feb 19, 2024
825ccde
Make ISO_DATE_TIME_OFFSET more consistent with other ISO formats, doc…
dkhalanskyjb Feb 19, 2024
e0598c1
Refactor a test
dkhalanskyjb Feb 19, 2024
3 changes: 2 additions & 1 deletion .gitignore
@@ -4,4 +4,5 @@
*.iml
target
build
/local.properties
/local.properties
benchmarks.jar
135 changes: 124 additions & 11 deletions README.md
@@ -172,34 +172,145 @@ val hourMinute = LocalTime(hour = 12, minute = 13)
An `Instant` can be converted to a number of milliseconds since the Unix/POSIX epoch with the `toEpochMilliseconds()` function.
To convert back, use the companion object function `Instant.fromEpochMilliseconds(Long)`.

### Converting instant and local date/time to and from string
### Converting instant and local date/time to and from the ISO 8601 string

Currently, `Instant`, `LocalDateTime`, `LocalDate` and `LocalTime` only support ISO-8601 format.
`Instant`, `LocalDateTime`, `LocalDate` and `LocalTime` provide shortcuts for
parsing and formatting them using the extended ISO-8601 format.
The `toString()` function is used to convert the value to a string in that format, and
the `parse` function in companion object is used to parse a string representation back.


```kotlin
val instantNow = Clock.System.now()
instantNow.toString() // returns something like 2015-12-31T12:30:00Z
val instantBefore = Instant.parse("2010-06-01T22:19:44.475Z")
```

Alternatively, the `String.to...()` extension functions can be used instead of `parse`,
where it feels more convenient:

`LocalDateTime` uses a similar format, but without `Z` UTC time zone designator in the end.

`LocalDate` uses a format with just year, month, and date components, e.g. `2010-06-01`.

`LocalTime` uses a format with just hour, minute, second and (if non-zero) nanosecond components, e.g. `12:01:03`.

```kotlin
"2010-06-01T22:19:44.475Z".toInstant()
"2010-06-01T22:19:44".toLocalDateTime()
"2010-06-01".toLocalDate()
"12:01:03".toLocalTime()
"12:0:03.999".toLocalTime()
LocalDateTime.parse("2010-06-01T22:19:44")
LocalDate.parse("2010-06-01")
LocalTime.parse("12:01:03")
LocalTime.parse("12:00:03.999")
LocalTime.parse("12:0:03.999") // fails with an IllegalArgumentException
```

### Working with other string formats

When some data needs to be formatted in some format other than ISO-8601, one
can define their own format or use some of the predefined ones:

```kotlin
// import kotlinx.datetime.format.*

val dateFormat = LocalDate.Format {
    monthNumber(padding = Padding.SPACE)
    char('/')
    dayOfMonth()
    char(' ')
    year()
}

val date = dateFormat.parse("12/24 2023")
println(date.format(LocalDate.Formats.ISO_BASIC)) // "20231224"
```
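
The same builder style applies to the other classes. The following is a minimal sketch, assuming the `LocalDateTime.Format` entry point and the `parseOrNull` function introduced in this change set; the name `timestampFormat` and the sample values are purely illustrative:

```kotlin
import kotlinx.datetime.*
import kotlinx.datetime.format.*

// Hypothetical format for illustration: an ISO date, a space, then hours and minutes.
val timestampFormat = LocalDateTime.Format {
    date(LocalDate.Formats.ISO)
    char(' ')
    hour()
    char(':')
    minute()
}

fun main() {
    val value = LocalDateTime(2023, 12, 24, 9, 5)

    // Formatting and parsing go through the same format object.
    val text = timestampFormat.format(value)
    println(text)
    println(timestampFormat.parse(text) == value)

    // parseOrNull returns null instead of throwing on malformed input.
    println(timestampFormat.parseOrNull("not a date"))
}
```

`parseOrNull` may be preferable to `parse` when malformed input is expected and does not warrant an exception.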

#### Using Unicode format strings (like `yyyy-MM-dd`)

A constant format string like the ones used by Java's
[DateTimeFormatter.ofPattern](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) can be
converted to Kotlin code using the following invocation:

```kotlin
// import kotlinx.datetime.format.*

println(DateTimeFormat.formatAsKotlinBuilderDsl(DateTimeComponents.Format {
    byUnicodePattern("uuuu-MM-dd'T'HH:mm:ss[.SSS]Z")
}))

// will print:
/*
date(LocalDate.Formats.ISO)
char('T')
hour()
char(':')
minute()
char(':')
second()
alternativeParsing({
}) {
    char('.')
    secondFraction(3)
}
offset(UtcOffset.Formats.FOUR_DIGITS)
*/
```

When your format string is not constant, with the `FormatStringsInDatetimeFormats` opt-in,
you can use the format without converting it to Kotlin code beforehand:

```kotlin
val formatPattern = "yyyy-MM-dd'T'HH:mm:ss[.SSS]"

@OptIn(FormatStringsInDatetimeFormats::class)
val dateTimeFormat = LocalDateTime.Format {
    byUnicodePattern(formatPattern)
}

dateTimeFormat.parse("2023-12-24T23:59:59")
```

### Parsing and formatting partial, compound or out-of-bounds data

Sometimes, the required string format doesn't fully correspond to any of the
classes `kotlinx-datetime` provides. In these cases, `DateTimeComponents`, a
collection of all date-time fields, can be used instead.

```kotlin
// import kotlinx.datetime.format.*

val yearMonth = DateTimeComponents.Format { year(); char('-'); monthNumber() }
    .parse("2024-01")
println(yearMonth.year)
println(yearMonth.monthNumber)

val dateTimeOffset = DateTimeComponents.Formats.ISO_DATE_TIME_OFFSET
    .parse("2023-01-07T23:16:15.53+02:00")
println(dateTimeOffset.toUtcOffset()) // +02:00
println(dateTimeOffset.toLocalDateTime()) // 2023-01-07T23:16:15.53
```

Occasionally, one can encounter strings where the values are slightly off:
for example, `23:59:60`, where `60` is an invalid value for the second.
`DateTimeComponents` allows parsing such values as well and then mutating them
before conversion.

```kotlin
val time = DateTimeComponents.Format { time(LocalTime.Formats.ISO) }
    .parse("23:59:60").apply {
        if (second == 60) second = 59
    }.toLocalTime()
println(time) // 23:59:59
```

Because `DateTimeComponents` is provided specifically for parsing and
formatting, there is no way to construct it normally. If one needs to format
partial, complex or out-of-bounds data, the `format` function allows building
`DateTimeComponents` specifically for formatting it:

```kotlin
DateTimeComponents.Formats.RFC_1123.format {
    // the receiver of this lambda is DateTimeComponents
    setDate(LocalDate(2023, 1, 7))
    hour = 23
    minute = 59
    second = 60
    setOffset(UtcOffset(hours = 2))
} // Sat, 7 Jan 2023 23:59:60 +0200
```

### Instant arithmetic
@@ -388,3 +499,5 @@ For local builds, you can use a later version of JDK if you don't have that
version installed. Specify the version of this JDK with the `java.mainToolchainVersion` Gradle property.

After that, the project can be opened in IDEA and built with Gradle.

For building and running benchmarks, see [README.md](benchmarks/README.md)
28 changes: 28 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,28 @@
### Benchmarks utility module

Module that provides benchmarking infrastructure for kotlinx-datetime.
Please note that these benchmarks are typically written with a specific target, hypothesis and effect in mind.

They provide numbers, not insights, and shouldn't be used for generic comparisons or statements like
"X implementation or format is faster/slower than Y".


#### Usage

```
// Build `benchmarks.jar` into the project's root
./gradlew :benchmarks:jmhJar

// Run all benchmarks
java -jar benchmarks.jar

// Run dedicated benchmark(s)
java -jar benchmarks.jar Formatting
java -jar benchmarks.jar FormattingBenchmark.formatIso

// Run with the specified number of warmup iterations, measurement iterations, timeunit and mode
java -jar benchmarks.jar -wi 5 -i 5 -tu us -bm thrpt Formatting

// Extensive help
java -jar benchmarks.jar -help
```
34 changes: 34 additions & 0 deletions benchmarks/build.gradle.kts
@@ -0,0 +1,34 @@
/*
* Copyright 2019-2023 JetBrains s.r.o. and contributors.
* Use of this source code is governed by the Apache 2.0 License that can be found in the LICENSE.txt file.
*/

plugins {
    id("kotlin")
    id("me.champeau.jmh")
}


val mainJavaToolchainVersion by ext(project.property("java.mainToolchainVersion"))
val modularJavaToolchainVersion by ext(project.property("java.modularToolchainVersion"))

sourceSets {
    dependencies {
        implementation(project(":kotlinx-datetime"))
        implementation("org.openjdk.jmh:jmh-core:1.35")
    }
}

// Publish benchmarks to the root for an easier `java -jar benchmarks.jar`
tasks.named<Jar>("jmhJar") {
    val nullString: String? = null
    archiveBaseName.set("benchmarks")
    archiveClassifier.set(nullString)
    archiveVersion.set(nullString)
    archiveVersion.convention(nullString)
    destinationDirectory.set(file("$rootDir"))
}

repositories {
    mavenCentral()
}
27 changes: 27 additions & 0 deletions benchmarks/src/jmh/kotlin/FormattingBenchmark.kt
@@ -0,0 +1,27 @@
/*
* Copyright 2019-2023 JetBrains s.r.o. and contributors.
* Use of this source code is governed by the Apache 2.0 License that can be found in the LICENSE.txt file.
*/

package kotlinx.datetime

import org.openjdk.jmh.annotations.*
import java.util.concurrent.*

@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Fork(1)
open class FormattingBenchmark {

    private val localDateTime = LocalDateTime(2023, 11, 9, 12, 21, 31, 41)
    private val formatted = LocalDateTime.Formats.ISO.format(localDateTime)

    @Benchmark
    fun formatIso() = LocalDateTime.Formats.ISO.format(localDateTime)

    @Benchmark
    fun parseIso() = LocalDateTime.Formats.ISO.parse(formatted)
Comment on lines +22 to +26

Member:

Is it useful to commit the benchmark set in this state?
Also, I'd prefer if we dogfood kotlinx-benchmark here.

Collaborator, Author:

I do run them occasionally.

> kotlinx-benchmark

Maybe we could do that, but I think it's a problem for after we've published a release with this PR.

lppedd, Jan 30, 2024:

Just know that kotlinx-benchmark is not ready outside of JVM. Too many issues still, especially in JS.
But I'd be happy to see it adopted here, it may speed up the resolution of such issues.

}
20 changes: 4 additions & 16 deletions core/common/src/DateTimePeriod.kt
@@ -298,14 +298,9 @@ public sealed class DateTimePeriod {
}

/**
* Parses the ISO-8601 duration representation as a [DateTimePeriod].
*
* See [DateTimePeriod.parse] for examples.
*
* @throws IllegalArgumentException if the text cannot be parsed or the boundaries of [DateTimePeriod] are exceeded.
*
* @see DateTimePeriod.parse
* @suppress
*/
@Deprecated("Removed to support more idiomatic code. See https://github.com/Kotlin/kotlinx-datetime/issues/339", ReplaceWith("DateTimePeriod.parse(this)"), DeprecationLevel.WARNING)
public fun String.toDateTimePeriod(): DateTimePeriod = DateTimePeriod.parse(this)

/**
@@ -358,16 +353,9 @@ public class DatePeriod internal constructor(
}

/**
* Parses the ISO-8601 duration representation as a [DatePeriod].
*
* This function is equivalent to [DateTimePeriod.parse], but will fail if any of the time components are not
* zero.
*
* @throws IllegalArgumentException if the text cannot be parsed, the boundaries of [DatePeriod] are exceeded,
* or any time components are not zero.
*
* @see DateTimePeriod.parse
* @suppress
*/
@Deprecated("Removed to support more idiomatic code. See https://github.com/Kotlin/kotlinx-datetime/issues/339", ReplaceWith("DatePeriod.parse(this)"), DeprecationLevel.WARNING)
public fun String.toDatePeriod(): DatePeriod = DatePeriod.parse(this)

private class DateTimePeriodImpl(