Introduce lazyAsync #4423

Open
CLOVIS-AI opened this issue Apr 23, 2025 · 10 comments

@CLOVIS-AI
Contributor

CLOVIS-AI commented Apr 23, 2025

Use case

I often happen to want to declare a suspending computation meant to be executed later, but do not have access to a scope at that moment.

If I had access to a scope, I could use async(start = LAZY) (or a replacement as described in #4202). However, in these situations, I do not have access to a scope.

As a first example, see this thread in KotlinLang. The user is attempting to initialize a data structure, which requires an async operation. However, that async operation doesn't need to happen right now; it could happen on first use. A simplified version of the problem is:

val health = Health(
    runBlocking {  // ⚠ probably shouldn't use runBlocking here
        redis.setUpPing()
    }
)

This example could be rewritten:

val health = Health(
    lazyAsync {
        redis.setUpPing()
    }
)

Another example can be found in the declaration of database indexes or other such metadata. It would be great to be able to write:

class MyRepository(
    private val database: Database
) {
    init {
        database.ensureIndex("a") { … }  // ⚠ can't suspend here
        database.ensureIndex("b") { … }
    }

    suspend fun findOneById(id: String) { … }
}

With this proposal, this example could be rewritten as:

class MyRepository(
    private val database: Database
) {
    val indexes = lazyAsync {  // will execute in the scope of the first coroutine to await it
        database.ensureIndex("a") { … }
        database.ensureIndex("b") { … }
    }

    suspend fun findOneById(id: String) {
        indexes.await()
        //
    }
}

Over the years I've seen many examples that boil down to one of the two scenarios described above.

The Shape of the API

Simple implementation:

// Identical to CoroutineScope.async, but:
//  • doesn't have a receiver
//  • doesn't have a coroutineStart parameter, since it is always lazy
fun <T> lazyAsync(
    coroutineContext: CoroutineContext = EmptyCoroutineContext,
    block: suspend CoroutineScope.() -> T,
): Deferred<T> = LazyDeferredImpl(coroutineContext, block)

private class LazyDeferredImpl<T>(
    private val additionalContext: CoroutineContext,
    private val initializer: suspend CoroutineScope.() -> T,
) : Deferred<T> {
    private var value: Any? = null
    private var initialized: Boolean = false
    private val lock = Mutex()

    @Suppress("UNCHECKED_CAST")
    override suspend fun await(): T {
        // Short path
        if (initialized) return value as T

        lock.withLock {
            if (initialized) return value as T
            // Runs in the context of the first caller, plus additionalContext
            value = withContext(additionalContext) { initializer() }
            initialized = true
        }

        return value as T
    }

    //
}

I'm sure there are many ways to optimize the implementation; this one is merely an example. Looking at the existing codebase, likely candidates are null-ing the initializer after it has run, and replacing the Mutex with some kind of atomic reference that holds a val UNINITIALIZED = Any() special value until the result is available.

I don't have a particular attachment to the name lazyAsync. Maybe another name can be more useful, I don't know.

Prior Art

The Shared concept from the Prepared library (I'm the author) is essentially the same thing. A Shared value is a test fixture that is declared once globally, and can be used within multiple tests; its initializer is suspend and only runs the first time the shared value is awaited, after which its result is cached for all further usages.

This class allows declaring suspending test fixtures (e.g. shared { MongoClient.connect() }) and reusing them between many tests (even though they have different CoroutineContext which we cannot access at declaration-time) with the guarantee that it will only be initialized once.

@JakeWharton
Contributor

This is basically a variant of memoization, and if considered for inclusion I'd prefer a timeout parameter also be introduced (which can default to infinity). I'm surprised there isn't a memoization feature request already.

I, too, wrote one a while ago (sans context param): https://github.com/JakeWharton/SdkSearch/blob/33e4b60af539aef108d1d3193f99ae151ce14733/backend/dac-proxy/src/main/java/com/jakewharton/sdksearch/proxy/memoize.kt.

@CLOVIS-AI
Contributor Author

(additionally, if this is deemed useful, I can submit the PR.)

@dkhalanskyjb
Collaborator

This looks like a specialized version of https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines.flow/state-in.html, for which Jake's request for the timeout parameter is fulfilled by https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines.flow/-sharing-started/-companion/-while-subscribed.html . The notable difference is that for stateIn, a coroutine scope is required, and I am not convinced that inline execution without a coroutine scope is beneficial.

  • Threads have states and special purposes (for example, UI threads should not connect to databases). For example, code running in an EmptyCoroutineContext uses Dispatchers.Default instead of using the current thread in kotlinx.coroutines. A specific thread could be ensured by specifying additionalContext, but why is the context "additional"? I feel that it should be both mandatory (at least to specify the dispatcher) and also the only context available to the computation, to make the results more predictable and less dependent on the caller of await().
  • Cancellation exceptions during the computation of the value behave in a way I don't find intuitive.
class MyRepository(
    private val database: Database
) {
    val indexes = lazyAsync {
        withContext(NonCancellable) {
            database.ensureIndex("a") { … }
            database.ensureIndex("b") { … }
        }
    }

    suspend fun findOneById(id: String) {
        indexes.await()
        //
    }
}

Here, let's call findOneById and cancel it in parallel. I'd expect an await() call to throw a CancellationException immediately, but instead, we will proceed with the attempt to populate indexes (and then fail anyway once the withContext(NonCancellable) exits, rendering the whole computation meaningless). More realistically, instead of withContext(NonCancellable), there can be simply a long-running computation that rarely checks for cancellation. For me, this is one more point in favor of the idea that a coroutine scope would be welcome for this use case: I feel that dropping the attempt to compute a value should be done immediately.
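The first half of this scenario can be reproduced without lazyAsync. The following is a minimal sketch, assuming kotlinx.coroutines on the JVM; the function name, the 100 ms delay and the flag are illustrative stand-ins for the index creation, not part of the proposal. It only shows that the NonCancellable block keeps running after the caller has been cancelled.

```kotlin
import kotlinx.coroutines.*

// Returns true if the NonCancellable block ran to completion even though
// the coroutine that entered it was cancelled while it was suspended.
fun nonCancellableBlockFinishes(): Boolean = runBlocking {
    var finished = false
    val caller = launch {
        withContext(NonCancellable) {
            delay(100)       // stand-in for database.ensureIndex(...)
            finished = true
        }
    }
    yield()          // let the caller start and suspend inside delay()
    caller.cancel()  // cancel it "in parallel"
    caller.join()    // completes only once the NonCancellable block is done
    finished
}
```

In this sketch join() has to wait for the whole delay, which is the "await() does not return immediately" effect described above.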

Some other thoughts:

@CLOVIS-AI
Contributor Author

CLOVIS-AI commented May 7, 2025

Threads have states and special purposes (for example, UI threads should not connect to databases). For example, code running in an EmptyCoroutineContext uses Dispatchers.Default instead of using the current thread in kotlinx.coroutines.

Users can ensure a specific context either by using withContext() before .await(), or by using the optional coroutineContext parameter, which is consistent with every other coroutine builder.

A specific thread could be ensured by specifying additionalContext, but why is the context "additional"? I feel that it should be both mandatory (at least to specify the dispatcher) and also the only context available to the computation, to make the results more predictable and less dependent on the caller of await().

The name 'additional' doesn't appear in the public API. The coroutineContext parameter behaves exactly in the same way as the coroutineContext parameter of launch and async.

Making the context mandatory defeats the entire purpose of this feature: declaring code to run later. With a mandatory context, this is basically just CoroutineScope.async().

Cancellation exceptions during the computation of the value behave in a way I don't find intuitive. […] I'd expect an await() call to throw a CancellationException immediately […] I feel that dropping the attempt to compute a value should be done immediately.

Indeed, this is not the behavior of my implementation. I agree that it would be better if it behaved the way you described, but that is a matter of writing a better implementation, I don't think this is a rebuttal of the feature itself.

Additionally, this looks similar to https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/lazy.html, except it's suspend and it does not support different strategies for synchronization.

Yes, this is on purpose. lazy can't be used for suspending initialization; this feature provides that ability. I don't think having strategies for synchronization makes sense here, which is why I didn't include them.

Another thing to keep in mind is that lambdas can and will throw exceptions. That the operation can fail on one attempt but then somehow succeed on the next one makes me wary, but stdlib's lazy behaves the same way.

Uh, I never realized you could semi-initialize a lazy multiple times if the first one fails. Honestly, I don't like that behavior, and it wasn't intentional in my proposal. I'm sure I've written broken code that assumes the lazy block can never run multiple times :/
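For reference, the stdlib behavior under discussion can be checked with a few lines of plain Kotlin. This is a small sketch with made-up names; it relies only on `lazy` and `runCatching` from the standard library.

```kotlin
// Counts how many times the initializer of a stdlib `lazy` runs when the
// first attempt throws.
fun lazyInitializerRuns(): Int {
    var runs = 0
    val value: String by lazy {
        runs++
        if (runs == 1) throw IllegalStateException("first attempt fails")
        "ok"
    }
    val first = runCatching { value }   // initializer throws, nothing is cached
    val second = runCatching { value }  // initializer runs again and succeeds
    check(first.isFailure && second.getOrNull() == "ok")
    return runs
}
```

Here the initializer runs twice: a failed attempt leaves the delegate uninitialized, so the next access retries.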

The simple implementation is insufficient, as it does not account for multithreading: if a CPU observed initialized == true, it does not imply that it will also observe the initialized value.

Doesn't the lock guarantee that it does? Anyway—as I mentioned, the implementation isn't normative, it was just a quick draft to demonstrate the intent.

If anything, the number of flaws you found in my implementation is probably a sign that this should be provided by the Coroutines authors directly and not left to users to implement for themselves… As mentioned by both Jake and me, there are multiple implementations in the wild already.

@dkhalanskyjb
Collaborator

either by using withContext() before .await(), or by using the optional coroutineContext parameter [...]

Sure, this is fixable, my point is that it looks like a foot gun.

[...], which is consistent with every other coroutine builder.

The other coroutine builders do not inherit the context of the caller. Calling a suspend fun f() { scope.launch { } } in different contexts will not affect the context of launch, and suspend fun f() { myDeferred.await() } certainly won't. That's what's inconsistent and possibly surprising.
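A minimal sketch of that point, assuming kotlinx.coroutines on the JVM: CoroutineName is used here only as a visible marker, and both function names are invented for the illustration. The coroutine started with scope.launch sees the name stored in the scope, not the one in the context of the suspend function that called launch.

```kotlin
import kotlinx.coroutines.*

// Launches into `scope` from a suspend function and reports which
// CoroutineName the launched coroutine actually sees.
suspend fun nameSeenByLaunch(scope: CoroutineScope): String? {
    val seen = CompletableDeferred<String?>()
    scope.launch { seen.complete(coroutineContext[CoroutineName]?.name) }
    return seen.await()
}

fun launchIgnoresCallerContext(): String? = runBlocking {
    val scope = CoroutineScope(Dispatchers.Default + CoroutineName("scope"))
    try {
        // The caller's context carries a different name than the scope.
        withContext(CoroutineName("caller")) { nameSeenByLaunch(scope) }
    } finally {
        scope.cancel()
    }
}
```

The result is "scope": the caller's name never reaches the launched coroutine, whereas the draft lazyAsync would run its block under whatever context the first await() happens to have.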

that is a matter of writing a better implementation, I don't think this is a rebuttal of the feature itself.

How do you propose this could be fixed in a better implementation? I think the issue is fundamental. Cancellation is cooperative, so if the code doesn't notice cancellation exceptions, the thread running that code gets stuck. It is only possible to return from await immediately if the code of lazyAsync executes on some other thread, and to do that by default, we would need that code to know what threads it's allowed to run on—which means having to supply a dispatcher to it, which, according to you, renders the feature useless.

Doesn't the lock guarantee that it does?

Nope. The rules for establishing when a CPU sees which changes another CPU has made are called "happens-before", and actions do not happen-before setting a non-volatile variable (that is, initialized = true can happen before value = /* what got computed */).

Though this, indeed, is just a bug and not a critical issue.
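One way the publication bug could be avoided is sketched below, assuming the JVM and kotlinx.coroutines. SuspendLazy and Uninitialized are hypothetical names, not an existing API, and the context parameter of the draft is left out. The result lives in a single @Volatile field guarded by a sentinel, so a reader that sees a non-sentinel value is guaranteed to see the fully written result.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import java.util.concurrent.atomic.AtomicInteger

private object Uninitialized

// Hypothetical helper, not part of kotlinx.coroutines.
class SuspendLazy<T>(initializer: suspend () -> T) {
    // One volatile field holds either the sentinel or the result, so the
    // lock-free fast path never observes a half-published state.
    @Volatile private var value: Any? = Uninitialized
    private var initializer: (suspend () -> T)? = initializer // only touched under the lock
    private val lock = Mutex()

    @Suppress("UNCHECKED_CAST")
    suspend fun await(): T {
        val fast = value
        if (fast !== Uninitialized) return fast as T
        return lock.withLock {
            val current = value
            if (current !== Uninitialized) {
                current as T
            } else {
                val computed = initializer!!()
                value = computed
                initializer = null // release whatever the lambda captured
                computed
            }
        }
    }
}

// Eight concurrent awaiters on different threads; the initializer runs once.
fun demoSuspendLazy(): Pair<Int, List<Int>> = runBlocking {
    val runs = AtomicInteger()
    val lazyValue = SuspendLazy { delay(10); runs.incrementAndGet(); 42 }
    val results = List(8) { async(Dispatchers.Default) { lazyValue.await() } }.awaitAll()
    runs.get() to results
}
```

Like the draft, this version retries on the next await() if the initializer throws or is cancelled, and it does nothing about the cancellation concern raised earlier.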

the number of flaws you found in my implementation is probably a sign that this should be provided by the Coroutines authors directly

If this is an error-prone pattern, including it will only increase the chance that someone writes buggy code because of it instead of having to reach for the more appropriate tools (like stateIn, it looks like). This could be just a question of documentation, explaining that stateIn could also be idiomatically used for memoization.
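A rough sketch of what memoization through stateIn might look like, assuming kotlinx.coroutines on the JVM. The dedicated scope, the null initial value and the "pong" result are illustrative choices, not something prescribed in this thread.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*
import java.util.concurrent.atomic.AtomicInteger

fun stateInMemoizes(): Pair<Int, List<String>> = runBlocking {
    val computations = AtomicInteger()
    // stateIn needs a scope; this one exists only for the cached value.
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
    val cached: StateFlow<String?> = flow<String?> {
        delay(10)                        // stand-in for the suspending work
        computations.incrementAndGet()
        emit("pong")
    }.stateIn(scope, SharingStarted.Lazily, initialValue = null)

    try {
        // null means "not computed yet"; wait for the first real value.
        val first = cached.filterNotNull().first()
        val second = cached.filterNotNull().first()
        computations.get() to listOf(first, second)
    } finally {
        scope.cancel()
    }
}
```

SharingStarted.Lazily starts the upstream flow when the first collector appears and keeps the value afterwards. Swapping it for SharingStarted.WhileSubscribed with a stop timeout is presumably how the expiry mentioned earlier would be expressed.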

@CLOVIS-AI
Contributor Author

CLOVIS-AI commented May 7, 2025

Can you clarify how stateIn solves this issue?

// The method we want to cache
suspend fun foo() { … }

// Declaration-time, no access to a CoroutineScope
val fooCached =?

// Usage-time, some kind of .await() or suspend invoke or whatever to access the value
suspend fun bar() {
    println("Cached: ${fooCached.await()}")
}

runBlocking {
    repeat {
        bar()
    }
}

@dkhalanskyjb
Collaborator

The way the issue is stated, it doesn't: if you can't use any CoroutineScope, the proposed design of lazyAsync is the only one possible, for better or worse. That said, what's wrong with defining a val scopeForComputingMyValue = CoroutineScope(SupervisorJob() + CoroutineExceptionHandler { _, e -> /* handle your exceptions as your logic demands */ } + Dispatchers.Default) or something similar and using that?

Searching around, I found further examples of attempts to implement this:

So, the need for this is obvious, and the analogy with lazy is clear. Whatever the resolution, we have to at least document it prominently.

@CLOVIS-AI
Contributor Author

So something like,

fun <T> lazyAsync(
    coroutineContext: CoroutineContext = EmptyCoroutineContext,
    block: suspend CoroutineScope.() -> T,
): Deferred<T> = LazyDeferredImpl(coroutineContext, block)

private class LazyDeferredImpl<T>(
    private val additionalContext: CoroutineContext,
    private val initializer: suspend CoroutineScope.() -> T,
) : Deferred<T> {
    // Private scope: the computation no longer depends on the caller of await()
    private val scope = CoroutineScope(SupervisorJob() + additionalContext)
    private val deferred = scope.async(start = CoroutineStart.LAZY) { initializer() }

    override suspend fun await(): T =
        deferred.await()

    //
}

Having a local scope seems a bit strange to me, but if you say it's a good pattern I don't particularly have an issue with it. I guess a difference is that it will not try to re-initialize itself if the first attempt fails/is cancelled, but I don't think that's a deal-breaker (I didn't even know my version did this).

A big difference, however, is that this version will not be able to access the coroutine context of the running application. While that's a double-edged sword (it's easy to create a lazy that has a different value based on the call-site order), I think there is value in being able to access the context from there.
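The "no re-initialization after a failed first attempt" difference can be observed with a plain lazy async. This is a minimal sketch, assuming kotlinx.coroutines on the JVM, with invented names and a deliberately failing block.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Returns (number of times the block ran, number of await() calls that failed).
fun lazyAsyncFailureIsCached(): Pair<Int, Int> = runBlocking {
    val runs = AtomicInteger()
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
    val deferred: Deferred<String> = scope.async(start = CoroutineStart.LAZY) {
        runs.incrementAndGet()
        throw IllegalStateException("initialization failed")
    }
    var failures = 0
    repeat(2) {
        try {
            deferred.await()   // the first call starts the block, the second does not
        } catch (e: IllegalStateException) {
            failures++
        }
    }
    scope.cancel()
    runs.get() to failures
}
```

The block runs once and both await() calls fail, so the failure is cached along with the Deferred, unlike the Mutex-based draft, which would run the initializer again.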

@dkhalanskyjb
Collaborator

I think there is value in being able to access the context from [the caller].

Do you have some specific examples in mind?

@CLOVIS-AI
Contributor Author

@dkhalanskyjb I thought about it and reviewed my existing usages, and I can't manage to convince myself either way.

  • Having an auth marker in the context: in some projects, we enforce that methods can only be called when the user's identity is in the context, to ensure we can check for roles in any layer of the application. Accessing it from a lazy is dangerous, because the data could accidentally be reused with other users. But to be fair, it is also a risk when using a regular async {}.
  • Having a TestDispatcher in the context: clearly, we want the lazyAsync to automatically run in the same dispatcher as the test, otherwise we lose delay-skipping. We don't want explicit code needed to pass through the test dispatcher each time lazyAsync is used, because people won't remember to write it. If the same lazyAsync is used multiple times within a single test, that seems perfectly reasonable to me.
  • However, sharing a lazyAsync between multiple tests seems dangerous to me: if it contains a delay, only the first test to execute will delay for it, meaning all tests that access it may have a different timeline depending on their execution order.
