fix assert violation found in #40 by canonicalizing to closed-closed intervals #41
base: main
Conversation
I'm still struggling to understand what's actually going on here. I would have thought that canonicalizing the use of Close-Closed intervals would fix this bug, but it doesn't seem to have done that. Any idea why?
```diff
-last_index, error = index_and_error_from_time(sample_rate, last(interval), mode)
-if is_stop_exclusive(interval) && mode == RoundDown && iszero(error)
-    last_index -= 1
-end
+last_index, _ = index_and_error_from_time(sample_rate, last(interval), mode)
```
want to double check my understanding here.
previously, `interval` could be left-open. in that case, if the stop of the interval is actually sample-aligned, `index_and_error_from_time` would return the index of the first excluded sample and 0 error. by canonicalizing interval handling to be only right-closed, we avoid this special case.
the thing I'm still puzzled about is why we then need to add one ns to the end of the interval, but I'll see if I can make sense of that...
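To make that special case concrete, here is a toy version of the arithmetic (a simplified sketch, not the real `index_and_error_from_time`), assuming 1-based samples with sample `i` sitting at time `(i - 1)/sample_rate`:

```julia
sample_rate = 1                    # Hz
stop_ns = 3 * 10^9                 # an exclusive stop at exactly 3s, in nanoseconds

ns_per_sample = 10^9 / sample_rate
last_index = floor(Int, stop_ns / ns_per_sample) + 1     # == 4
err_ns = stop_ns - (last_index - 1) * ns_per_sample      # == 0.0

# sample 4 sits exactly at the excluded stop, so with zero error the old code had
# to do `last_index -= 1`; with the stop canonicalized to closed at (3s - 1ns),
# that special case goes away.
```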
src/interop.jl
```julia
# let's add a check to verify. Note here we add 1ns to make it an open span again, since we've converted to closed
if !(t <= last(interval) + Nanosecond(1))
```
this is the part that I'm stuck on... shouldn't this be strictly less than (or `<=` if the original check is used)? seems like +1ns is really the first time that's not in the span...
the new tests DO fail with that more stringent check though, so it's not just a matter of converting intervals to closed-closed...
I'm wondering now whether we should be doing `ceil(Int, ...)` in `time_from_index`, since I suspect that's why we're getting 1ns too high in the assert... other tests fail when I switch it to `floor` though (although interestingly they're all in the "round trip TimeSpans" set, and only a few...)
```
Expression: as == rt
Evaluated: AlignedSpan(128.33, 5, 20) == AlignedSpan(128.33, 4, 20)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
TimeSpans roundtripping: Test Failed at /Users/dkleinschmidt/work/AlignedSpans.jl/test/interop.jl:57
Expression: as == rt
Evaluated: AlignedSpan(128.33, 5, 20) == AlignedSpan(128.33, 4, 19)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
TimeSpans roundtripping: Test Failed at /Users/dkleinschmidt/work/AlignedSpans.jl/test/interop.jl:57
Expression: as == rt
Evaluated: AlignedSpan(128.33, 3, 6) == AlignedSpan(128.33, 2, 6)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
TimeSpans roundtripping: Test Failed at /Users/dkleinschmidt/work/AlignedSpans.jl/test/interop.jl:57
Expression: as == rt
Evaluated: AlignedSpan(128.33, 3, 6) == AlignedSpan(128.33, 2, 5)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
TimeSpans roundtripping: Test Failed at /Users/dkleinschmidt/work/AlignedSpans.jl/test/interop.jl:57
Expression: as == rt
Evaluated: AlignedSpan(128.33, 78, 79) == AlignedSpan(128.33, 77, 79)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
TimeSpans roundtripping: Test Failed at /Users/dkleinschmidt/work/AlignedSpans.jl/test/interop.jl:57
Expression: as == rt
Evaluated: AlignedSpan(128.33, 78, 79) == AlignedSpan(128.33, 77, 78)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
TimeSpans roundtripping: Test Failed at /Users/dkleinschmidt/work/AlignedSpans.jl/test/interop.jl:57
Expression: as == rt
Evaluated: AlignedSpan(128.33, -10, 20) == AlignedSpan(128.33, -11, 20)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
TimeSpans roundtripping: Test Failed at /Users/dkleinschmidt/work/AlignedSpans.jl/test/interop.jl:57
Expression: as == rt
Evaluated: AlignedSpan(128.33, -10, 20) == AlignedSpan(128.33, -11, 19)
Stacktrace:
[1] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[2] macro expansion
@ ~/work/AlignedSpans.jl/test/interop.jl:57 [inlined]
[3] macro expansion
@ ~/.julia/juliaup/julia-1.10.8+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[4] top-level scope
@ ~/work/AlignedSpans.jl/test/interop.jl:37
(aligned, sample_rate, dur) = (AlignedSpan(1.0, 1, 100), 1, Second(1))
(aligned, sample_rate, dur) = (AlignedSpan(1.0, 1, 100), 1, Second(9))
(aligned, sample_rate, dur) = (AlignedSpan(1.0, 1, 100), 1, Millisecond(2500))
(aligned, sample_rate, dur) = (AlignedSpan(1.0, 1, 100), 1, Millisecond(1111))
Test Summary: | Pass Fail Total Time
AlignedSpans.jl | 6058 8 6066 10.3s
Aqua | 6 6 1.7s
Construction from indices directly | 5 5 0.0s
RoundInward | 14 14 0.0s
RoundSpanDown | 11 11 0.0s
ConstantSamplesRoundingMode | 505 505 0.0s
StepRange roundtripping | 60 60 0.0s
TimeSpans roundtripping | 52 8 60 1.3s
Samples indexing | 36 36 0.0s
JSON3 roundtripping | 1 1 1.0s
Arrow roundtripping | 1 1 3.7s
index_and_error_from_time | 320 320 0.8s
RoundFullyContainedSampleSpans | 8 8 0.0s
time_from_index | 2 2 0.0s
`n_samples` | 3871 3871 0.0s
consecutive_subspans | 1154 1154 0.0s
consecutive_overlapping_subspans | 11 11 0.2s
```
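For what it's worth, those 128.33 failures look like plain float rounding on a non-integer rate. A standalone sketch of the arithmetic (hypothetical helper names, not the package's actual functions) showing why a `floor`-based `time_from_index` breaks the round trip while `ceil` survives it:

```julia
sample_rate = 128.33
ns_per_sample = 1e9 / sample_rate        # ≈ 7.7924e6 ns, not exactly representable

i = 5                                    # 1-based sample index
t_ceil  = ceil(Int,  (i - 1) * ns_per_sample)   # ceil-based time_from_index
t_floor = floor(Int, (i - 1) * ns_per_sample)   # hypothetical floor-based version

# naive RoundDown conversion back from time to index
index_from_time(t) = floor(Int, t / ns_per_sample) + 1

@assert index_from_time(t_ceil)  == 5    # round trip preserved
@assert index_from_time(t_floor) == 4    # off by one, like the failures above
```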
this is however consistent with how TimeSpans does the conversion...
I think the underlying problem here is that 180 Hz leads to some nasty floating point business:

```julia
julia> sample_rate = 180
180
julia> sample_time = Nanosecond(1000000000)
1000000000 nanoseconds
julia> mode = RoundDown
RoundingMode{:Down}()
# float div here
julia> nanoseconds_per_sample(sample_rate)
5.555555555555556e6
julia> sample_index = 181
181
julia> (sample_index - 1) * nanoseconds_per_sample(sample_rate)
1.0000000000000001e9
julia> ceil(Int, (sample_index - 1) * nanoseconds_per_sample(sample_rate))
1000000001
```

Using rationals for the ...
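For comparison, a minimal check of what exact rational arithmetic gives for the same case (using `NS_IN_SEC = 10^9`):

```julia
julia> NS_IN_SEC = 10^9
1000000000

julia> (181 - 1) * (NS_IN_SEC // 180)     # exact: the factors of 180 cancel
1000000000//1

julia> ceil(Int, (181 - 1) * (NS_IN_SEC // 180))
1000000000
```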
hm, I think:
src/time_index_conversions.jl

```julia
if isinteger(sample_rate)
    # avoid floating-point rounding issues
    # https://github.com/beacon-biosignals/AlignedSpans.jl/pull/42
    return Nanosecond(ceil(Int, (sample_index - 1) * NS_IN_SEC // Int(sample_rate)))
else
    return Nanosecond(ceil(Int, (sample_index - 1) * (NS_IN_SEC / sample_rate)))
end
```
Okay, now that I'm thinking about this again, I'm kinda second-guessing my suggestion in #42... I think that actually just doing the multiplication first is sufficient here. The use of the rational is just a way to ensure that we can do that if we have to return one thing out of a `nanoseconds_per_sample` function. I'd suggest doing something like the following (with extra parens to emphasize that the multiplication is meant to happen first, which normal operator precedence gives you):
```diff
-if isinteger(sample_rate)
-    # avoid floating-point rounding issues
-    # https://github.com/beacon-biosignals/AlignedSpans.jl/pull/42
-    return Nanosecond(ceil(Int, (sample_index - 1) * NS_IN_SEC // Int(sample_rate)))
-else
-    return Nanosecond(ceil(Int, (sample_index - 1) * (NS_IN_SEC / sample_rate)))
-end
+# avoid floating-point rounding issues by multiplying first
+return Nanosecond(ceil(Int, ((sample_index - 1) * NS_IN_SEC) / sample_rate))
```
if we're concerned about integer overflow with `(sample_index - 1) * NS_IN_SEC`, we have a few options:

- convert both to float ahead of time and warn when the total is outside the range of what Float64 can represent exactly (`2^53`, according to https://stackoverflow.com/a/72729449)
- check if the total would overflow (e.g., if `sample_index > typemax(Int) / NS_IN_SEC`) and convert to float

I think that converting to float ahead of time is not great, given that `2^53` corresponds to just

```julia
julia> 2^53 / 1e9 / 500 / 3600
5.003999585967217
```

hours of data at 500Hz.
HOWEVER, with my suggested change, we'd still be potentially truncating precision of anything over this threshold, because `/(::Int, ::Int)` promotes to float before executing. And we don't want (I think) to use `div` (integer division) because it truncates, whereas we're committed to using `ceil` here for compatibility with TimeSpans...

The advantage of using a rational, I've now discovered, is that it will attempt to simplify the ratio, and you might end up with a num/den that can be exactly represented as floats. But that requires finding the gcd of the two on each of these operations, which seems... less than ideal to me, especially if we're in a regime where we can exactly represent `(sample_index - 1) * NS_IN_SEC` and `sample_rate` as floats already...
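A minimal sketch of the second option above (the function name is hypothetical, not AlignedSpans' actual implementation): multiply first in `Int`, and only fall back to `Float64` when the product would overflow:

```julia
using Dates

const NS_IN_SEC = 1_000_000_000

function time_from_index_sketch(sample_rate, sample_index)
    offset = sample_index - 1
    if offset > typemax(Int64) ÷ NS_IN_SEC
        # (offset * NS_IN_SEC) would overflow Int64, so do it all in Float64
        return Nanosecond(ceil(Int, (Float64(offset) * NS_IN_SEC) / sample_rate))
    else
        return Nanosecond(ceil(Int, (offset * NS_IN_SEC) / sample_rate))
    end
end

time_from_index_sketch(180, 181)  # exactly Nanosecond(1000000000), i.e. 1 s
```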
uhm hm, I'm not sure; doing the multiplication first does concern me vis-à-vis overflow. Rationals also use `checked_div` and `checked_mul` to avoid silent overflow, IIRC.
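A quick sanity check of that claim (plain Julia, nothing package-specific):

```julia
# Int arithmetic wraps around silently on overflow...
@assert typemax(Int) + 1 == typemin(Int)

# ...whereas Rational arithmetic goes through checked multiplication and throws
overflowed = try
    (typemax(Int) // 2) * (3 // 1)
    false
catch e
    e isa OverflowError
end
@assert overflowed
```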
Maybe doing the multiplication with Int128s could be fast enough and avoid overflow issues?
It's pretty easy to check whether an overflow would occur and convert to float first. I still think it's better, numerical-stability-wise, to do the multiplication before the division, even if we have to coerce to floats first... At least it fixes the edge case that spawned this, and doesn't introduce any new ones as far as our tests can detect...
```diff
-if isinteger(sample_rate)
-    # avoid floating-point rounding issues
-    # https://github.com/beacon-biosignals/AlignedSpans.jl/pull/42
-    return Nanosecond(ceil(Int, (sample_index - 1) * NS_IN_SEC // Int(sample_rate)))
-else
-    return Nanosecond(ceil(Int, (sample_index - 1) * (NS_IN_SEC / sample_rate)))
-end
+# use Int128's to avoid floating-point rounding issues
+return Nanosecond(ceil(Int, (Int128(sample_index - 1) * Int128(NS_IN_SEC)) / sample_rate))
```
I'm really mystified by the different behavior of `cld` and `ceil( / )`. `cld` just uses `div(a, b, RoundUp)` under the hood...
```julia
julia> x = 1/30
0.03333333333333333
julia> 1/x
30.0
julia> div(1, x)
30.0
julia> div(1, x, RoundUp)
31.0
julia> ceil(1/x)
30.0
julia> ceil(Int, 1/x)
30
```
this has gotta be a bug... I will report on the Julia issue tracker
I guess it's possible that it's like float weirdness ...
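One way to check that hypothesis (exact arithmetic via `big`, nothing package-specific):

```julia
x = 1/30

# Float64(1/30) is slightly below the true 1/30, so the *exact* quotient 1/x
# is slightly above 30...
@assert big(1) / big(x) > 30

# ...but Float64 division rounds that quotient back to exactly 30.0
@assert 1 / x == 30.0

# div(1, x, RoundUp) rounds the exact quotient up (-> 31.0), while ceil(1/x)
# rounds the already-rounded 30.0 (-> 30.0)
@assert div(1, x, RoundUp) == 31.0
@assert ceil(1/x) == 30.0
```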
so `cld` is more accurate here, but it is not giving the answer we want for this. We also can't use higher precision for the sampling rate, since that is user-supplied as `Float64`. AlignedSpans could be relaxed to allow other data types for the sampling rate, which would help in some cases (not when pulling the rate directly from Onda `Samples`, but it can help in the hypnogram-computation case when the user is writing down 1/30 by hand; we can ask them to supply `1//30` instead).
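For instance (a sketch, not current package behavior), with a rational rate the `cld`-based conversion is exact:

```julia
using Dates

NS_IN_SEC = 10^9
rate = 1 // 30                 # e.g. one hypnogram epoch every 30 seconds
sample_index = 3

# sample 3 starts at exactly 60 s; no float rounding involved
Nanosecond(cld(Int128(sample_index - 1) * NS_IN_SEC, rate))  # == Nanosecond(60000000000)
```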
I did some testing:

```julia
using DataFrames, Dates, Test
table = DataFrame()
function tryinexact(f)
try
f()
catch e
e isa InexactError && rethrow
Nanosecond(-1)
end
end
for sample_index in [1, 2, 5, 10, 10^8, 10^12],
rat_sample_rate in [1, 1 // 10, 180, 4 // 3, 1 // 3, 1 // 30, 500, 1000, 1024, 44_000, 1 // 5]
NS_IN_SEC = 10^9
exact =
try
Nanosecond(ceil(Int, (BigInt(sample_index - 1) * BigInt(NS_IN_SEC)) / rat_sample_rate))
catch e
# answer cannot fit into nanoseconds
e isa InexactError && continue
rethrow()
end
v0 = exact_cld = Nanosecond(cld(BigInt(sample_index - 1) * BigInt(NS_IN_SEC), rat_sample_rate))
sample_rate = Float64(rat_sample_rate)
exact64 = Nanosecond(ceil(Int, (BigInt(sample_index - 1) * BigInt(NS_IN_SEC)) / sample_rate))
v1 = tryinexact() do
Nanosecond(ceil(Int, (Int64(sample_index - 1) * Int64(NS_IN_SEC)) / sample_rate))
end
v2 = tryinexact() do
Nanosecond(ceil(Int, Int64(sample_index - 1) * (Int64(NS_IN_SEC) / sample_rate)))
end
v3 = Nanosecond(ceil(Int, (Int128(sample_index - 1) * Int128(NS_IN_SEC)) / sample_rate))
v4 = Nanosecond(ceil(Int, (Float64(sample_index - 1) * Float64(NS_IN_SEC)) / sample_rate))
v5 = Nanosecond(cld(Float64(sample_index - 1) * Float64(NS_IN_SEC), sample_rate))
push!(table, (; sample_index, rat_sample_rate, exact, exact64, v0, v1, v2, v3, v4, v5))
end
transform!(table, (["exact", "v$i"] => ByRow(==) => "exact_matches_v$i" for i in 0:5)...)
transform!(table, (["exact64", "v$i"] => ByRow(==) => "exact64_matches_v$i" for i in 0:5)...)
@test table.exact_matches_v3 == table.exact_matches_v4
@test table.exact64_matches_v3 == table.exact64_matches_v4
```

```julia
julia> tally(df, cols) = sort!(combine(groupby(df, cols), nrow => :count), :count; rev=true)
tally (generic function with 1 method)
julia> tally(table, r"exact_matches")
5×7 DataFrame
Row │ exact_matches_v0 exact_matches_v1 exact_matches_v2 exact_matches_v3 exact_matches_v4 exact_matches_v5 count
│ Bool Bool Bool Bool Bool Bool Int64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ true true true true true true 45
2 │ true true true true true false 9
3 │ true false false false false false 4
4 │ true false true true true true 1
5 │ true true false true true true 1
```

so in 45 cases they all match `exact`, in 9 cases all but v5 (Float64 `cld`) match, in 4 cases only v0 matches (which we don't have access to, as it uses the rational sampling rate), and there's 1 case each where only v1 or v2 (the Int64 variations) fails. So that suggests v3 (Int128s) and v4 (Float64s) are equally good here, and I guess Float64 should be more performant, so let's go with that.
oh wait, I had a horrible idea:

```julia
v6 = Nanosecond(cld(Int128(sample_index - 1) * Int128(NS_IN_SEC), rationalize(sample_rate)))
```

This matches the exact bigfloat/rational version for every test case in that loop. Full script:

```julia
using DataFrames, Dates, Test
table = DataFrame()
function tryinexact(f)
try
f()
catch e
e isa InexactError && rethrow
Nanosecond(-1)
end
end
for sample_index in [1, 2, 5, 10, 10^8, 10^12],
rat_sample_rate in [1, 1 // 10, 180, 4 // 3, 1 // 3, 1 // 30, 500, 1000, 1024, 44_000, 1 // 5]
NS_IN_SEC = 10^9
exact =
try
Nanosecond(ceil(Int, (BigInt(sample_index - 1) * BigInt(NS_IN_SEC)) / rat_sample_rate))
catch e
# answer cannot fit into nanoseconds
e isa InexactError && continue
rethrow()
end
v0 = exact_cld = Nanosecond(cld(BigInt(sample_index - 1) * BigInt(NS_IN_SEC), rat_sample_rate))
sample_rate = Float64(rat_sample_rate)
exact64 = Nanosecond(ceil(Int, (BigInt(sample_index - 1) * BigInt(NS_IN_SEC)) / sample_rate))
v1 = tryinexact() do
Nanosecond(ceil(Int, (Int64(sample_index - 1) * Int64(NS_IN_SEC)) / sample_rate))
end
v2 = tryinexact() do
Nanosecond(ceil(Int, Int64(sample_index - 1) * (Int64(NS_IN_SEC) / sample_rate)))
end
v3 = Nanosecond(ceil(Int, (Int128(sample_index - 1) * Int128(NS_IN_SEC)) / sample_rate))
v4 = Nanosecond(ceil(Int, (Float64(sample_index - 1) * Float64(NS_IN_SEC)) / sample_rate))
v5 = Nanosecond(cld(Float64(sample_index - 1) * Float64(NS_IN_SEC), sample_rate))
v6 = Nanosecond(cld(Int128(sample_index - 1) * Int128(NS_IN_SEC), rationalize(sample_rate)))
push!(table, (; sample_index, rat_sample_rate, exact, exact64, v0, v1, v2, v3, v4, v5, v6))
end
transform!(table, (["exact", "v$i"] => ByRow(==) => "exact_matches_v$i" for i in 0:6)...)
transform!(table, (["exact64", "v$i"] => ByRow(==) => "exact64_matches_v$i" for i in 0:6)...)
@test table.exact_matches_v3 == table.exact_matches_v4
@test table.exact64_matches_v3 == table.exact64_matches_v4
@test all(table.exact_matches_v6)
```

This feels gross since it's very do-what-I-mean-not-what-I-say, but practically it might be a good solution, especially for stuff like hypnograms with their 1/30 sampling rates:

```julia
julia> @benchmark rationalize(x) setup=(x=100*rand())
BenchmarkTools.Trial: 10000 samples with 301 evaluations per sample.
Range (min … max): 214.425 ns … 1.146 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 662.515 ns ┊ GC (median): 0.00%
Time (mean ± σ): 670.993 ns ± 114.796 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▅ ▇ ▅▃ █ ▅ ▂
▂▁▂▁▁▂▂▁▁▂▂▂▂▂▂▂▂▃▂▂▅▃▂▆▅▂▆█▃▅█▄▃██▃██▃▇█▄▅█▅▃█▆▃▅█▃▃▆▃▂▄▃▂▃▃ ▄
214 ns Histogram: frequency by time 947 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
```
```julia
julia> @benchmark TimeSpans.time_from_index(sample_rate, sample_index) setup=(sample_rate=rand()*100; sample_index=rand(1:10^6))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 3.541 ns … 33.250 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.625 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.639 ns ± 0.331 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▆ █ ▃ ▂ ▃ ▁ ▁
█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁▇▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▇ █
3.54 ns Histogram: log(frequency) by time 3.92 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
```

vs v6:

```julia
julia> @benchmark AlignedSpans.time_from_index(sample_rate, sample_index) setup=(sample_rate=rand()*100; sample_index=rand(1:10^6))
BenchmarkTools.Trial: 10000 samples with 194 evaluations per sample.
Range (min … max): 461.985 ns … 1.750 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 853.521 ns ┊ GC (median): 0.00%
Time (mean ± σ): 856.295 ns ± 117.477 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▂▁▃▅▄▅▄▆▅▆█▅▄▇▆▆▅▄▃▃▂▃▁
▂▂▁▂▂▂▂▂▃▃▂▃▃▃▄▄▄▄▆▆▇▇█▇████████████████████████▇▇▆▆▆▅▄▄▄▃▃▃▃ ▅
462 ns Histogram: frequency by time 1.14 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
```

So it takes 100x-200x longer, but the median time is still less than a microsecond. If I do

```julia
timespans = [TimeSpan(rand(1:10^5), rand(10^5:10^7)) for _ in 1:10^6]
f(timespans) = [AlignedSpan(100*rand(), ts, RoundSpanDown) for ts in timespans]
```

then `@time f(timespans);`, I get

```
0.126508 seconds (4.00 M allocations: 114.441 MiB, 5.39% gc time) # v4
2.354494 seconds (4.00 M allocations: 114.441 MiB, 2.41% gc time) # v6
```

It's a bit hard to know if this is ever the bottleneck in some workload, but the slowdown to v6 is enough that it feels like it could affect things. I think one thing we could do is switch on `isinteger(sample_rate)`.
if I do

```julia
if isinteger(sample_rate)
v7 = Nanosecond(cld(Int128(sample_index - 1) * Int128(NS_IN_SEC), Int128(sample_rate)))
else
v7 = v6
end
```

then with

```julia
g(timespans) = [AlignedSpan(round(Int, 100*rand() + 1), ts, RoundSpanDown) for ts in timespans];
```

I get

```
0.168244 seconds (2.00 M allocations: 83.923 MiB, 4.34% gc time) # v7
```

so only slightly slower than v4 on integer sampling rates, while still matching `exact` in all cases.

Full script:

```julia
using DataFrames, Dates, Test
table = DataFrame()
function tryinexact(f)
try
f()
catch e
e isa InexactError && rethrow
Nanosecond(-1)
end
end
for sample_index in [1, 2, 5, 10, 10^8, 10^12],
rat_sample_rate in [1, 1 // 10, 180, 4 // 3, 1 // 3, 1 // 30, 500, 1000, 1024, 44_000, 1 // 5]
NS_IN_SEC = 10^9
exact =
try
Nanosecond(ceil(Int, (BigInt(sample_index - 1) * BigInt(NS_IN_SEC)) / rat_sample_rate))
catch e
# answer cannot fit into nanoseconds
e isa InexactError && continue
rethrow()
end
v0 = exact_cld = Nanosecond(cld(BigInt(sample_index - 1) * BigInt(NS_IN_SEC), rat_sample_rate))
sample_rate = Float64(rat_sample_rate)
exact64 = Nanosecond(ceil(Int, (BigInt(sample_index - 1) * BigInt(NS_IN_SEC)) / sample_rate))
v1 = tryinexact() do
Nanosecond(ceil(Int, (Int64(sample_index - 1) * Int64(NS_IN_SEC)) / sample_rate))
end
v2 = tryinexact() do
Nanosecond(ceil(Int, Int64(sample_index - 1) * (Int64(NS_IN_SEC) / sample_rate)))
end
v3 = Nanosecond(ceil(Int, (Int128(sample_index - 1) * Int128(NS_IN_SEC)) / sample_rate))
v4 = Nanosecond(ceil(Int, (Float64(sample_index - 1) * Float64(NS_IN_SEC)) / sample_rate))
v5 = Nanosecond(cld(Float64(sample_index - 1) * Float64(NS_IN_SEC), sample_rate))
v6 = Nanosecond(cld(Int128(sample_index - 1) * Int128(NS_IN_SEC), rationalize(sample_rate)))
if isinteger(sample_rate)
v7 = Nanosecond(cld(Int128(sample_index - 1) * Int128(NS_IN_SEC), Int128(sample_rate)))
else
v7 = v6
end
push!(table, (; sample_index, rat_sample_rate, exact, exact64, v0, v1, v2, v3, v4, v5, v6, v7))
end
transform!(table, (["exact", "v$i"] => ByRow(==) => "exact_matches_v$i" for i in 0:7)...)
transform!(table, (["exact64", "v$i"] => ByRow(==) => "exact64_matches_v$i" for i in 0:7)...)
@test table.exact_matches_v3 == table.exact_matches_v4
@test table.exact64_matches_v3 == table.exact64_matches_v4
@test all(table.exact_matches_v6)
tally(df, cols) = sort!(combine(groupby(df, cols), nrow => :count), :count; rev=true)
tally(table, r"exact_matches")
## timing
timespans = [TimeSpan(rand(1:10^5), rand(10^5:10^7)) for _ in 1:10^6]
f(timespans) = [AlignedSpan(100*rand(), ts, RoundSpanDown) for ts in timespans];
g(timespans) = [AlignedSpan(round(Int, 100*rand() + 1), ts, RoundSpanDown) for ts in timespans];
# after revising change into AlignedSpans.jl
@time f(timespans);
@time g(timespans);
```
closes #40
with the new tests, before 1c3a7db:
i.e. we can repro the issue, and found a counting error.
After:
i.e. we have fixed the issue.
notes:

- `stop_index_from_time`: somehow I thought it was being tested in that gnarly loop of tough cases, but we were only testing `index_and_error_from_time` instead. We should add tests for `start_index_from_time` too.
- removed the `if is_stop_exclusive(interval) && mode == RoundDown && iszero(error)` bit, which I think was a crude and sometimes incorrect version of canonicalizing to closed-closed intervals. By doing the latter in `to_interval`, it is much simpler to do all the logic afterwards. However, it does mean we need +1ns and -1ns here and there, which obviously can be a source of bugs.
- we no longer need the `err` return from `index_and_error_from_time`, so we could refactor that function, but I don't want to do that in this PR necessarily.
- the assert violation from #40 (bad stacktrace w/ asserts on) is in `stop_index_from_time`.
- added `maxlog=100` to try to not spam warnings, and improved the warning/error message to make it easier to reproduce if something goes wrong (see the sketch below).
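For reference, `maxlog` is the standard Julia logging keyword for rate-limiting a message; a minimal illustration (the message and variables here are made up, not the package's actual warning):

```julia
for i in 1:1_000
    # only the first 100 occurrences of this warning get printed
    @warn "stop time is not exactly sample-aligned; rounding" i maxlog=100
end
```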