Skip to content

[epoxy-paging] Crash in the PagedListEpoxyController when trying to invoke addTo on a null model #567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rohandhruva opened this issue Oct 9, 2018 · 23 comments

Comments

@rohandhruva
Copy link

While trying to implement paged list + paged controller, I run into a crash that's not reliably reproducible. The crash seems to happen around 1 in 7 times, completely anecdotally.

I'm attaching the stack trace: note how my custom controller (that extends PagedListEpoxyController) is not in the stack trace because the crash happens in PagedListEpoxyController internals.

I will try to get a small sample app with a reliable reproducible test case. In the meantime, some information that could be useful:

  • In MyImplementationPagedListController, I have these two params:
    • modelBuildingHandler = EpoxyAsyncUtil.getAsyncBackgroundHandler()
    • diffingHandler = EpoxyAsyncUtil.getAsyncBackgroundHandler()
  • I'm using an RxPagedListBuilder with setFetchScheduler(Schedulers.io())
  • The problem reproduces more frequently with a very small page size: I used size 4 on a total list size 22

I noticed that PagedListModelCache holds null models, I am guessing one of those is likely mistakenly being passed through in getModels().

java.lang.NullPointerException: Attempt to invoke virtual method 'void com.airbnb.epoxy.EpoxyModel.addTo(com.airbnb.epoxy.EpoxyController)' on a null object reference
at com.airbnb.epoxy.EpoxyController.add(EpoxyController.java:465)
at com.airbnb.epoxy.paging.PagedListEpoxyController.addModels(PagedListEpoxyController.kt:78)
at com.airbnb.epoxy.paging.PagedListEpoxyController.buildModels(PagedListEpoxyController.kt:70)
at com.airbnb.epoxy.EpoxyController$1.run(EpoxyController.java:267)
at android.os.Handler.handleCallback(Handler.java:751)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:154)
at android.os.HandlerThread.run(HandlerThread.java:61)
@rohandhruva
Copy link
Author

@yigit do you think it's a good idea to filter out nulls from the getModels()? I noticed that the method return type is already List<EpoxyModel<*>>, so it should not return nulls, so I'm a bit surprised to see how they're coming through.

@rohandhruva
Copy link
Author

Putting this code in MyImplementation of the controller helped, I did not see the crash after that:

override fun addModels(models: List<EpoxyModel<*>>) {
    super.add(models.filterNotNull())
}

@yigit
Copy link
Contributor

yigit commented Oct 9, 2018

that is weird, i sample to reproduce would be nice.

I'm guessing the threading model is not working such that models are being modified in another thread.

It is important that the notify thread of PagedList is the same as the buildModels thread of EpoxyController.

The test uses LivePagedListBuilder so maybe something is not matching in the rx builder.
Can you provide a sample that reproduces it? Does not need to be reliable since i can always assert threads if my guess is correct (of course, a reliable repro is much better)

@rohandhruva
Copy link
Author

It is important that the notify thread of PagedList is the same as the buildModels thread of EpoxyController.

Uh oh, I think I missed that. This is already not true for me because I have modelBuildingHandler = EpoxyAsyncUtil.getAsyncBackgroundHandler(), but I am not overriding RxPagedListBuilder.setNotifyScheduler. That might cause the problem right?

I'll try putting them on the same thread to see if that fixes the issue.

@rohandhruva
Copy link
Author

After using setNotifyScheduler(AndroidSchedulers.from(EpoxyAsyncUtil.getAsyncBackgroundHandler().looper)), I don't see any crashes. Might be worth also calling this out in the documentation.

Thank you so much for the debugging pointer, @yigit !

@eboudrant
Copy link
Contributor

Looks like this crash is not trivial to troubleshoot. May be the framework could fail-fast in case notify scheduler is not on the same thread than the model building ?

@elihart
Copy link
Contributor

elihart commented Oct 9, 2018

Thanks Yigit for the idea, glad that fixed it.

I can update the documentation to make this requirement clear. It would be great to auto detect this issue, but from what I can tell there is no public way for Epoxy to see what notifyExecutor is set on the paged list.

I'll reopen this as a reminder to make these changes

@elihart elihart reopened this Oct 9, 2018
@elihart elihart closed this as completed Oct 18, 2018
@kakai248
Copy link
Contributor

I'm still seeing this crash. My screen is a vertical paged list that comes from the db. I prepend an item that has a horizontal epoxy controller. The information of this inner controller comes from the server. To show these new server items, im passing the list to the main epoxy controller and then I do requestModelBuild().

I'm using the async epoxy handler everywhere except in the subscriber of the server call. This means that requestModelBuild() is called from the main thread. I thought this could be the reason so I tried EpoxyAsyncUtil.getAsyncBackgroundHandler().post { requestModelBuild() }. Doesn't solve it.

Any ideas? Thanks

@TonyTangAndroid
Copy link

For the record, I download the latest epoxy source code and run the epoxy-pagingsample app, I could stably reproduce this crash with the following stack trace by scrolling it fast before it generates 3000 records.

java.lang.NullPointerException: Attempt to invoke virtual method 'void com.airbnb.epoxy.EpoxyModel.addTo(com.airbnb.epoxy.EpoxyController)' on a null object reference
at com.airbnb.epoxy.EpoxyController.add(EpoxyController.java:468)
at com.airbnb.epoxy.paging.PagedListEpoxyController.addModels(PagedListEpoxyController.kt:81)
at com.airbnb.epoxy.pagingsample.TestController.addModels(PagingSampleActivity.kt:64)
at com.airbnb.epoxy.paging.PagedListEpoxyController.buildModels(PagedListEpoxyController.kt:73)
at com.airbnb.epoxy.EpoxyController$1.run(EpoxyController.java:268)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:193)
at android.os.HandlerThread.run(HandlerThread.java:65)

@TonyTangAndroid
Copy link

For these who is interested in checking out the fix using Rxjava 2, please refer to this pull request.
https://github.com/TonyTangAndroid/epoxy/pull/3/files

@elihart
Copy link
Contributor

elihart commented Nov 21, 2018

I noticed the sample issue too a little while back, looks like LivePagedListBuilder doesn't have a way to specify the notify scheduler. I'll change the sample to use the main thread instead for now so it doesn't crash.

Thanks for sharing the rxjava sample @TonyTangAndroid !

@TonyTangAndroid
Copy link

Yeah, that's the reason why I made the attempt to use RxJava to check out whether it will crash or not after applying the hint from @rohandhruva . And it indeed turns out to be good.

@TonyTangAndroid
Copy link

Wow~ the fix has already been merged into master. That's quick. Thank you for your quick response.

@kakai248
Copy link
Contributor

Everyone is saying the fix worked. But I'm still seeing this crash even using the same thread. I haven't been able to pinpoint the problem yet.

I'm using the Rx builder and using the epoxy async thread in both the builder and the two epoxy handlers (model build and diff). It's not frequent, but it happens.

@AdamMc331
Copy link
Contributor

I've run into this same problem, and what we believed it to be is because of how I handle network state.

I have two live data: one that has a state with my pagedlist of items, and another livedata that represents the network state.

Because submitList() could be called really close to my setNetworkState() method that requests a model build, I occasionally saw this crash. The closest thing I could attribute it to was a race condition.

I did some research and found requestDelayedModelBuild(), and I used that for my network state with a slight delay so that if the two things were called close together, they would still build without breaking.

Perhaps that's what @kakai248 is still seeing?

@kakai248
Copy link
Contributor

@AdamMc331 That's very likely as we only see crashes in controllers where we add additional models. However your proposed solution won't work for us as the additional models we add can't be delayed otherwise the screens get weird.

We are delaying the submitList instead. But this is just a band-aid. I'm not sure where to start to solve the race condition.

@AdamMc331
Copy link
Contributor

Yeah, I feel like our solution is a band aid too and isn't 💯 fool proof to preventing this race condition, but in practice and with some manual stress testing I haven't been able to break it. That's not to say it won't happen once it's out in the wild, though. :/

@kakai248
Copy link
Contributor

kakai248 commented Jan 9, 2019

https://drive.google.com/open?id=19j5cmY3jgYUWhPiVGvZEyec7LS1LDVTn

This was the best I could do to reproduce the issue. It happens sometimes (keep in mind it's a race condition). Keep closing and opening the app until it happens.

I notice that when it crashes I can see the first two gray items. I believe the delay(20, TimeUnit.MILLISECONDS) it's what's helping the crash happen more frequently.

Heads up: The code was adapted from another epoxy bug test project. It's quite confusing and doing weird things. The equals and hashCode implementations with Thread.sleep was a poor mans approach at delaying the diffing 😄

@yigit
Copy link
Contributor

yigit commented Jan 12, 2019

actually i added a log to addModels. It receives 1 call on the main thread.

Looks like it is happening because of this code:

 public void requestModelBuild() {
    if (isBuildingModels()) {
      throw new IllegalEpoxyUsage("Cannot call `requestModelBuild` from inside `buildModels`");
    }
    if (hasBuiltModelsEver) {
      requestDelayedModelBuild(0);
    } else {
      buildModelsRunnable.run();
    }
  }

It does not check the current thread before calling buildModels. @elihart is this intended?

2019-01-12 11:54:00.122 5473-5473/com.kakai.android.epoxyjumptest E/EX: [main] add models
    java.lang.Exception
        at com.kakai.android.epoxyjumptest.LogKt.log(Log.kt:6)
        at com.kakai.android.epoxyjumptest.TestController.addModels(MainActivity.kt:83)
        at com.airbnb.epoxy.paging.PagedListEpoxyController.buildModels(PagedListEpoxyController.kt:73)
        at com.airbnb.epoxy.EpoxyController$1.run(EpoxyController.java:268)
        at com.airbnb.epoxy.EpoxyController.requestModelBuild(EpoxyController.java:161)
        at com.kakai.android.epoxyjumptest.TestController.submitItem2List1(MainActivity.kt:119)
        at com.kakai.android.epoxyjumptest.MainActivity$onCreate$1.onChanged(MainActivity.kt:42)
        at com.kakai.android.epoxyjumptest.MainActivity$onCreate$1.onChanged(MainActivity.kt:25)
        at androidx.lifecycle.LiveData.considerNotify(LiveData.java:113)
        at androidx.lifecycle.LiveData.dispatchingValue(LiveData.java:131)
        at androidx.lifecycle.LiveData.setValue(LiveData.java:289)
        at androidx.lifecycle.MutableLiveData.setValue(MutableLiveData.java:33)
        at androidx.lifecycle.LiveData$1.run(LiveData.java:91)
        at android.os.Handler.handleCallback(Handler.java:873)
        at android.os.Handler.dispatchMessage(Handler.java:99)
        at android.os.Looper.loop(Looper.java:193)
        at android.app.ActivityThread.main(ActivityThread.java:6669)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
2019-01-12 11:54:00.228 5473-5496/com.kakai.android.epoxyjumptest E/EX: [epoxy] add models
    java.lang.Exception
        at com.kakai.android.epoxyjumptest.LogKt.log(Log.kt:6)
        at com.kakai.android.epoxyjumptest.TestController.addModels(MainActivity.kt:83)
        at com.airbnb.epoxy.paging.PagedListEpoxyController.buildModels(PagedListEpoxyController.kt:73)
        at com.airbnb.epoxy.EpoxyController$1.run(EpoxyController.java:268)
        at android.os.Handler.handleCallback(Handler.java:873)
        at android.os.Handler.dispatchMessage(Handler.java:99)
        at android.os.Looper.loop(Looper.java:193)
        at android.os.HandlerThread.run(HandlerThread.java:65)
2019-01-12 11:54:00.632 5473-5496/com.kakai.android.epoxyjumptest E/AndroidRuntime: FATAL EXCEPTION: epoxy
    Process: com.kakai.android.epoxyjumptest, PID: 5473
    java.lang.NullPointerException: Attempt to invoke virtual method 'void com.airbnb.epoxy.EpoxyModel.addTo(com.airbnb.epoxy.EpoxyController)' on a null object reference
        at com.kakai.android.epoxyjumptest.TestController.addModels(MainActivity.kt:113)
        at com.airbnb.epoxy.paging.PagedListEpoxyController.buildModels(PagedListEpoxyController.kt:73)
        at com.airbnb.epoxy.EpoxyController$1.run(EpoxyController.java:268)
        at android.os.Handler.handleCallback(Handler.java:873)
        at android.os.Handler.dispatchMessage(Handler.java:99)
        at android.os.Looper.loop(Looper.java:193)
        at android.os.HandlerThread.run(HandlerThread.java:65)

@elihart
Copy link
Contributor

elihart commented Jan 13, 2019

@yigit thanks for pointing that out - it is intended for the first call to requestModelBuild to run on the main thread so that the UI can be shown asap. If it is delayed then state can not be restored (the recyclerview needs to have all of it's items added before onViewStateRestored so that scroll position can be restored correctly)

Beyond state restoration, doing the first model build synchronously prevents briefly flashing a white screen when the fragment is first displayed.

I had forgotten about this in the discussion of Paged List synchronization and NPE issues though - this does seem like it would require manual synchronization to solve like in (#656)

@yigit
Copy link
Contributor

yigit commented Jan 13, 2019

The problem is not the access to the model though. We build the model based on the PagedList provided by the async diff util. That PagedList updates its contents only on the notify thread (for thread safety).
On the epoxy side, we don't control that so even if we add locking for the model, PagedList will still be changing that backing list w/o acquiring the lock.
In other words, we can add the lock to block the notifications but the backing PagedList has already been changed. When PagedListModelCache#getOrBuildModel tries to access it, it will be accessing potentially changed data, risking for index out of bounds exceptions etc (especially because new paging lib has page unloading).

I'm thinking, maybe we can figure out a way to special case initial load and for other callbacks, we can consider checking the notify thread and if it is not the right looper, throwing a descriptive exception. This way, if the developer made a mistake of giving them different threads, they'll easily see the problem and fix it.

Another, more expensive, option is to copy the PagedList contents into the builder thread automatically as notifications arrive, with some message passing. It will be yet another copy of the list (and we already have 1 because of the models) but that would also make building use its own data structure that it can safely access.

wdyt? I like the first option better if we can figure out a way to special case initial call and ensure everything else will be on the right thread.

@elihart
Copy link
Contributor

elihart commented Jan 14, 2019

Thanks for explaining, I see the issues now.

I like your ideas for the first option. I updated #656 with a potential solution.

Basically it skips using the model cache when the looper isn't the expected one for model building.

Another edge case I noticed is that submitList can trigger a synchronous update callback which I think we also need to handle, and to do that I kept the @synchronization annotations on all methods that access the model cache

@elihart elihart reopened this Jan 14, 2019
@elihart
Copy link
Contributor

elihart commented Jan 22, 2019

This will hopefully be fixed with #656

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants