Skip to content

fix: crash on OTA reload #7196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 12, 2025
Merged

fix: crash on OTA reload #7196

merged 5 commits into from
Mar 12, 2025

Conversation

tjzel
Copy link
Collaborator

@tjzel tjzel commented Mar 6, 2025

Summary

Fixes #7182
Requires #7197

Currently on iOS React Native can go into a non-fatal race condition on a double reload. Double reload can happen during an OTA update, when an app is reloaded immediately after evaluating the bundle.

The flow goes more or less like this. I haven't explored it to the deepest details.

  1. On the first reload a new JavaScript runtime is created.
  2. The new runtime evaluates the bundle, starts to populate RCTModuleRegistry.
  3. On the second reload a new JavaScript runtime is created. The old runtime remains and keeps executing. Both runtimes execute on the same thread, on the same queue.
  4. WorkletsModule.installTurboModule() is called from the first runtime. WorkletsModule is fetched from the RCTModuleRegistry.
  5. The second runtime overwrites (clears) the RCTModuleRegistry and starts to populate it. I don't know why both runtimes target the same instance of RCTModuleRegistry.
  6. WorkletsModule.installTurboModule() finishes successfully on the first runtime.
  7. ReanimatedModule.installTurboModule() is invoked in quick succession on the first runtime.
  8. ReanimatedModule tries to fetch WorkletsModule from RCTModuleRegistry. However, RCTModuleRegistry was wiped and it doesn't have the fully installed instance of WorkletsModule anymore. Therefore a new instance of WorkletsModule is created and inserted into the RCTModuleRegistry.
  9. In JavaScript we guarantee that ReanimatedModule.installTurboModule() is invoked only after WorkletsModule.installTurboModule(). We follow that heuristic in ObjC and therefore try operating on a non-installed instance of WorkletsModule resulting in a NullPtrException.

We can effectively detect that flow and ignore it, since that race condition is fatal only for the first runtime which is torn down soon after anyway.

See the provided screenshots which illustrate the problems.

Screenshot 2025-03-07 at 10 34 00 Screenshot 2025-03-07 at 12 25 23
Setting moduleRegistry for WorkletsModule during a healthy execution of WorkletsModule.installTurboModule() Setting moduleRegistry for WorkletsModule during race-condition execution. Note that the list of the modules is much shorter and that WorkletsModule isn't there.

Test plan

To reproduce this issue, reload twice on iOS in fabric-example. For best effect try to reload after ~0.5s after the first time, but it's easily reproducible if the App isn't slowed heavily by debugger instructions.

With these changes Reanimated doesn't crash, but the race condition continues to manifest.

@tjzel tjzel requested a review from tomekzaw March 7, 2025 11:28
@tjzel tjzel marked this pull request as ready for review March 7, 2025 11:28
@tjzel tjzel changed the base branch from main to @tjzel/worklets/fix-android-invalidation March 7, 2025 11:28
@tjzel
Copy link
Collaborator Author

tjzel commented Mar 7, 2025

cc @lukmccall

@tjzel tjzel requested a review from piaskowyk March 7, 2025 15:36
@tjzel tjzel mentioned this pull request Mar 7, 2025
13 tasks
@tjzel tjzel added this pull request to the merge queue Mar 12, 2025
Merged via the queue into main with commit 87781d8 Mar 12, 2025
13 checks passed
@tjzel tjzel deleted the @tjzel/fix-OTA-crash branch March 12, 2025 17:47
tjzel added a commit that referenced this pull request Mar 12, 2025
Fixes #7182
Requires #7197

Currently on iOS React Native can go into a non-fatal race condition on
a double reload. Double reload can happen during an OTA update, when an
app is reloaded immediately after evaluating the bundle.

The flow goes more or less like this. I haven't explored it to the
deepest details.
1. On the first reload a new JavaScript runtime is created.
2. The new runtime evaluates the bundle, starts to populate
`RCTModuleRegistry`.
3. On the second reload a new JavaScript runtime is created. The old
runtime remains and keeps executing. Both runtimes execute on the same
thread, on the same queue.
4. `WorkletsModule.installTurboModule()` is called from the first
runtime. `WorkletsModule` is fetched from the `RCTModuleRegistry`.
5. The second runtime overwrites (clears) the `RCTModuleRegistry` and
starts to populate it. I don't know why both runtimes target the same
instance of `RCTModuleRegistry`.
6. `WorkletsModule.installTurboModule()` finishes successfully on the
first runtime.
7. `ReanimatedModule.installTurboModule()` is invoked in quick
succession on the first runtime.
8. `ReanimatedModule` tries to fetch `WorkletsModule` from
`RCTModuleRegistry`. However, `RCTModuleRegistry` was wiped and it
doesn't have the fully installed instance of `WorkletsModule` anymore.
Therefore a new instance of `WorkletsModule` is created and inserted
into the `RCTModuleRegistry`.
9. In JavaScript we guarantee that
`ReanimatedModule.installTurboModule()` is invoked only after
`WorkletsModule.installTurboModule()`. We follow that heuristic in ObjC
and therefore try operating on a non-installed instance of
`WorkletsModule` resulting in a `NullPtrException`.

We can effectively detect that flow and ignore it, since that race
condition is fatal only for the first runtime which is torn down soon
after anyway.

See the provided screenshots which illustrate the problems.

| <img width="1018" alt="Screenshot 2025-03-07 at 10 34 00"
src="https://github.com/user-attachments/assets/263b9f0d-9573-48a8-99e8-82099221c3e3"
/> | <img width="1014" alt="Screenshot 2025-03-07 at 12 25 23"
src="https://github.com/user-attachments/assets/7dde0eda-526b-43fa-8181-1bec4ad7c38e"
/> |
|:---|:---|
| Setting `moduleRegistry` for `WorkletsModule` during a healthy
execution of `WorkletsModule.installTurboModule()` | Setting
`moduleRegistry` for `WorkletsModule` during race-condition execution.
Note that the list of the modules is much shorter and that
`WorkletsModule` isn't there. |

To reproduce this issue, reload twice on iOS in `fabric-example`. For
best effect try to reload after ~0.5s after the first time, but it's
easily reproducible if the App isn't slowed heavily by debugger
instructions.

With these changes Reanimated doesn't crash, but the race condition
continues to manifest.
@RSNara
Copy link

RSNara commented Mar 12, 2025

I don't know why both runtimes target the same instance of RCTModuleRegistry.

So, I initially made this change. The rationale: RCTHost can leak a reference to the module registry to the application. We didn't want that reference to break across reloads. That's why we had the host own the module registry.

But, yeah. This is a little bit problematic. Now, it means that two different interleaving react instances can share the same module registry, leading to undefined behaviour. In this case, the first react instance is accidentally using modules from the second. The module registry should probably be owned by the react instance.

@tjzel
Copy link
Collaborator Author

tjzel commented Mar 13, 2025

@RSNara Yeah, I encountered this problem before but didn't pay it much attention since it happened only on multiple reloads. It came back at me when it turned out that OTA updates are done via a double reload 😿 It's an easy fix on our side since we only obtain a TurboModule once, but libraries which pull them multiple times in runtime could be in more trouble.

tjzel added a commit that referenced this pull request Mar 16, 2025
Fixes #7182
Requires #7197

Currently on iOS React Native can go into a non-fatal race condition on
a double reload. Double reload can happen during an OTA update, when an
app is reloaded immediately after evaluating the bundle.

The flow goes more or less like this. I haven't explored it to the
deepest details.
1. On the first reload a new JavaScript runtime is created.
2. The new runtime evaluates the bundle, starts to populate
`RCTModuleRegistry`.
3. On the second reload a new JavaScript runtime is created. The old
runtime remains and keeps executing. Both runtimes execute on the same
thread, on the same queue.
4. `WorkletsModule.installTurboModule()` is called from the first
runtime. `WorkletsModule` is fetched from the `RCTModuleRegistry`.
5. The second runtime overwrites (clears) the `RCTModuleRegistry` and
starts to populate it. I don't know why both runtimes target the same
instance of `RCTModuleRegistry`.
6. `WorkletsModule.installTurboModule()` finishes successfully on the
first runtime.
7. `ReanimatedModule.installTurboModule()` is invoked in quick
succession on the first runtime.
8. `ReanimatedModule` tries to fetch `WorkletsModule` from
`RCTModuleRegistry`. However, `RCTModuleRegistry` was wiped and it
doesn't have the fully installed instance of `WorkletsModule` anymore.
Therefore a new instance of `WorkletsModule` is created and inserted
into the `RCTModuleRegistry`.
9. In JavaScript we guarantee that
`ReanimatedModule.installTurboModule()` is invoked only after
`WorkletsModule.installTurboModule()`. We follow that heuristic in ObjC
and therefore try operating on a non-installed instance of
`WorkletsModule` resulting in a `NullPtrException`.

We can effectively detect that flow and ignore it, since that race
condition is fatal only for the first runtime which is torn down soon
after anyway.

See the provided screenshots which illustrate the problems.

| <img width="1018" alt="Screenshot 2025-03-07 at 10 34 00"
src="https://github.com/user-attachments/assets/263b9f0d-9573-48a8-99e8-82099221c3e3"
/> | <img width="1014" alt="Screenshot 2025-03-07 at 12 25 23"
src="https://github.com/user-attachments/assets/7dde0eda-526b-43fa-8181-1bec4ad7c38e"
/> |
|:---|:---|
| Setting `moduleRegistry` for `WorkletsModule` during a healthy
execution of `WorkletsModule.installTurboModule()` | Setting
`moduleRegistry` for `WorkletsModule` during race-condition execution.
Note that the list of the modules is much shorter and that
`WorkletsModule` isn't there. |

To reproduce this issue, reload twice on iOS in `fabric-example`. For
best effect try to reload after ~0.5s after the first time, but it's
easily reproducible if the App isn't slowed heavily by debugger
instructions.

With these changes Reanimated doesn't crash, but the race condition
continues to manifest.
tjzel added a commit that referenced this pull request Mar 17, 2025
Fixes #7182
Requires #7197

Currently on iOS React Native can go into a non-fatal race condition on
a double reload. Double reload can happen during an OTA update, when an
app is reloaded immediately after evaluating the bundle.

The flow goes more or less like this. I haven't explored it to the
deepest details.
1. On the first reload a new JavaScript runtime is created.
2. The new runtime evaluates the bundle, starts to populate
`RCTModuleRegistry`.
3. On the second reload a new JavaScript runtime is created. The old
runtime remains and keeps executing. Both runtimes execute on the same
thread, on the same queue.
4. `WorkletsModule.installTurboModule()` is called from the first
runtime. `WorkletsModule` is fetched from the `RCTModuleRegistry`.
5. The second runtime overwrites (clears) the `RCTModuleRegistry` and
starts to populate it. I don't know why both runtimes target the same
instance of `RCTModuleRegistry`.
6. `WorkletsModule.installTurboModule()` finishes successfully on the
first runtime.
7. `ReanimatedModule.installTurboModule()` is invoked in quick
succession on the first runtime.
8. `ReanimatedModule` tries to fetch `WorkletsModule` from
`RCTModuleRegistry`. However, `RCTModuleRegistry` was wiped and it
doesn't have the fully installed instance of `WorkletsModule` anymore.
Therefore a new instance of `WorkletsModule` is created and inserted
into the `RCTModuleRegistry`.
9. In JavaScript we guarantee that
`ReanimatedModule.installTurboModule()` is invoked only after
`WorkletsModule.installTurboModule()`. We follow that heuristic in ObjC
and therefore try operating on a non-installed instance of
`WorkletsModule` resulting in a `NullPtrException`.

We can effectively detect that flow and ignore it, since that race
condition is fatal only for the first runtime which is torn down soon
after anyway.

See the provided screenshots which illustrate the problems.

| <img width="1018" alt="Screenshot 2025-03-07 at 10 34 00"
src="https://github.com/user-attachments/assets/263b9f0d-9573-48a8-99e8-82099221c3e3"
/> | <img width="1014" alt="Screenshot 2025-03-07 at 12 25 23"
src="https://github.com/user-attachments/assets/7dde0eda-526b-43fa-8181-1bec4ad7c38e"
/> |
|:---|:---|
| Setting `moduleRegistry` for `WorkletsModule` during a healthy
execution of `WorkletsModule.installTurboModule()` | Setting
`moduleRegistry` for `WorkletsModule` during race-condition execution.
Note that the list of the modules is much shorter and that
`WorkletsModule` isn't there. |

To reproduce this issue, reload twice on iOS in `fabric-example`. For
best effect try to reload after ~0.5s after the first time, but it's
easily reproducible if the App isn't slowed heavily by debugger
instructions.

With these changes Reanimated doesn't crash, but the race condition
continues to manifest.
tjzel added a commit that referenced this pull request Mar 18, 2025
Fixes #7182
Requires #7197

Currently on iOS React Native can go into a non-fatal race condition on
a double reload. Double reload can happen during an OTA update, when an
app is reloaded immediately after evaluating the bundle.

The flow goes more or less like this. I haven't explored it to the
deepest details.
1. On the first reload a new JavaScript runtime is created.
2. The new runtime evaluates the bundle, starts to populate
`RCTModuleRegistry`.
3. On the second reload a new JavaScript runtime is created. The old
runtime remains and keeps executing. Both runtimes execute on the same
thread, on the same queue.
4. `WorkletsModule.installTurboModule()` is called from the first
runtime. `WorkletsModule` is fetched from the `RCTModuleRegistry`.
5. The second runtime overwrites (clears) the `RCTModuleRegistry` and
starts to populate it. I don't know why both runtimes target the same
instance of `RCTModuleRegistry`.
6. `WorkletsModule.installTurboModule()` finishes successfully on the
first runtime.
7. `ReanimatedModule.installTurboModule()` is invoked in quick
succession on the first runtime.
8. `ReanimatedModule` tries to fetch `WorkletsModule` from
`RCTModuleRegistry`. However, `RCTModuleRegistry` was wiped and it
doesn't have the fully installed instance of `WorkletsModule` anymore.
Therefore a new instance of `WorkletsModule` is created and inserted
into the `RCTModuleRegistry`.
9. In JavaScript we guarantee that
`ReanimatedModule.installTurboModule()` is invoked only after
`WorkletsModule.installTurboModule()`. We follow that heuristic in ObjC
and therefore try operating on a non-installed instance of
`WorkletsModule` resulting in a `NullPtrException`.

We can effectively detect that flow and ignore it, since that race
condition is fatal only for the first runtime which is torn down soon
after anyway.

See the provided screenshots which illustrate the problems.

| <img width="1018" alt="Screenshot 2025-03-07 at 10 34 00"
src="https://github.com/user-attachments/assets/263b9f0d-9573-48a8-99e8-82099221c3e3"
/> | <img width="1014" alt="Screenshot 2025-03-07 at 12 25 23"
src="https://github.com/user-attachments/assets/7dde0eda-526b-43fa-8181-1bec4ad7c38e"
/> |
|:---|:---|
| Setting `moduleRegistry` for `WorkletsModule` during a healthy
execution of `WorkletsModule.installTurboModule()` | Setting
`moduleRegistry` for `WorkletsModule` during race-condition execution.
Note that the list of the modules is much shorter and that
`WorkletsModule` isn't there. |

To reproduce this issue, reload twice on iOS in `fabric-example`. For
best effect try to reload after ~0.5s after the first time, but it's
easily reproducible if the App isn't slowed heavily by debugger
instructions.

With these changes Reanimated doesn't crash, but the race condition
continues to manifest.
kligarski added a commit to software-mansion/react-native-screens that referenced this pull request Apr 3, 2025
## Description

Fixes CI on both platforms.

## Changes

- change Example to use tab bar button test ID option available in
react-navigation 6 (previous one was correct for react-navigation 7 but
example apps use version 6)
- bump to `[email protected]` in order to fix the issue
with Android CI
(software-mansion/react-native-reanimated#7196)

## Test code and steps to reproduce

CI

## Checklist

- [x] Ensured that CI passes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants