-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix(dcd): Fixed race condition on device disconnect #3109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
TinyUSB does not provide any locking means to protect the DCD variables. This can lead to race conditions when the user is trying to submit a transfer while the device is being disconnected. This can cause the device to be in an inconsistent state, leading to a crash or undefined behavior. This commit adds a spin-lock to protect the DCD variables during device disconnect. Closes espressif/esp-idf#9691 Also reported in espressif/esp-usb#131
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the osal_mutex instead https://github.com/hathach/tinyusb/blob/master/src/osal/osal_freertos.h#L152, there is an osal API for that ?
I decided for critical section instead of OS primitives because many of these sections are in ISRs
|
thank you, that make sense. I think it is about time to add osal_critical_enter/exit() for this purpose. Since I cannot push to your fork, here is how it would be used. 53a83e1 . Let me know if this makes sense to you. |
@@ -989,8 +1011,10 @@ void dcd_int_handler(uint8_t rhport) { | |||
|
|||
if (gintsts & GINTSTS_USBRST) { | |||
// USBRST is start of reset. | |||
DCD_ENTER_CRITICAL(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is only relevant on SMP mcu like P4 right ? since when app enter critical, this isr shoud not be running ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same race condition can happen on single core MCUs too.
For example
- Code from main app is modifying
xfer_ctl_t
and is interrupted - Content of
xfer_ctl_t
is now inconsistent - The ISR modifies
xfer_ctl_t
again - Main app continues
In steps 3 and 4 we can access inconsistent xfer_ctl_t
which can lead to undefined behavior (division by zero in the cases I've seen)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but when application modify xfer_ctl_t it already disable irq with enter_critical() the irq shouldn't be invoked. since usb irq is the only irq making change to this xfer ctl, it isn't be preempted by other irq. I guess it would be safe for signle core CPU ? I am trying to see if we could optional omit this for s2/s3 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. Although S3 is also dual-core, and the problem was reported and reproduced on S3 too.
EDIT: Of course, this comment applies only for critical section in dcd_int_handler()
. Other accesses to xfer_ctl_t
must still be protected in single CPU systems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for confirmation, we can add an macro to only include this in multi core CPU. since many dwc2 stm32 is single core only. I move the critical API to osal layer here #3127 let me know what you think
ok I made more changes to #3127 to have osal spinlock (SMP), which I think better name than critical section (single core). Would you mind testing it to see if it fix the race condition. In the future we can move the spinlock to usbd to make sure it works with other ports as well. PS: spinlock is moved to usbd, all driver can use usbd_spin_lock/unlock() for convenience, it will also be used for class driver later as wel. Note: when called in_isr with single core CPU, osal implementation just do nothing since it has no chance to conflict in that case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you very much, this PR initiate a new osal spin lock API, which is implemented and merged #3127, which should fix this race issue on esp32 chip. I did a rebase there and github couldn't detect and mark this as merged. So we will need to close this manually.
PS: I need to wrap this up to use spinlock API to replace atomic to fix an linking issue with arduino framework adafruit/Adafruit_TinyUSB_Arduino#513 . Feel free to let me know if your testing still has issue with race condition.
TinyUSB does not provide any locking means to protect the DCD variables. This can lead to race conditions when the user is trying to submit a transfer while the device is being disconnected. This can cause the device to be in an inconsistent state, leading to a crash or undefined behavior.
This commit adds a spin-lock to protect the DCD variables during device disconnect.