Skip to content

fix(dcd): Fixed race condition on device disconnect #3109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

tore-espressif
Copy link
Contributor

@tore-espressif tore-espressif commented May 2, 2025

TinyUSB does not provide any locking means to protect the DCD variables. This can lead to race conditions when the user is trying to submit a transfer while the device is being disconnected. This can cause the device to be in an inconsistent state, leading to a crash or undefined behavior.

This commit adds a spin-lock to protect the DCD variables during device disconnect.

TinyUSB does not provide any locking means to protect the DCD variables.
This can lead to race conditions when the user is trying to submit
a transfer while the device is being disconnected. This can cause
the device to be in an inconsistent state, leading to a crash or
undefined behavior.

This commit adds a spin-lock to protect the DCD variables during
device disconnect.

Closes espressif/esp-idf#9691
Also reported in espressif/esp-usb#131
Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the osal_mutex instead https://github.com/hathach/tinyusb/blob/master/src/osal/osal_freertos.h#L152, there is an osal API for that ?

@tore-espressif
Copy link
Contributor Author

can we use the osal_mutex instead https://github.com/hathach/tinyusb/blob/master/src/osal/osal_freertos.h#L152, there is an osal API for that ?

I decided for critical section instead of OS primitives because many of these sections are in ISRs

  • ISR must not block. So calling osal_mutex_lock() is not acceptable
  • With critical section we get minimal latency and overhead
  • Intention and simplicity: We really just need an atomic access to xfer_ctl_t.

@hathach
Copy link
Owner

hathach commented May 16, 2025

thank you, that make sense. I think it is about time to add osal_critical_enter/exit() for this purpose. Since I cannot push to your fork, here is how it would be used. 53a83e1 . Let me know if this makes sense to you.

@@ -989,8 +1011,10 @@ void dcd_int_handler(uint8_t rhport) {

if (gintsts & GINTSTS_USBRST) {
// USBRST is start of reset.
DCD_ENTER_CRITICAL();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only relevant on SMP mcu like P4 right ? since when app enter critical, this isr shoud not be running ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same race condition can happen on single core MCUs too.
For example

  1. Code from main app is modifying xfer_ctl_t and is interrupted
  2. Content of xfer_ctl_t is now inconsistent
  3. The ISR modifies xfer_ctl_t again
  4. Main app continues

In steps 3 and 4 we can access inconsistent xfer_ctl_t which can lead to undefined behavior (division by zero in the cases I've seen)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but when application modify xfer_ctl_t it already disable irq with enter_critical() the irq shouldn't be invoked. since usb irq is the only irq making change to this xfer ctl, it isn't be preempted by other irq. I guess it would be safe for signle core CPU ? I am trying to see if we could optional omit this for s2/s3 ?

Copy link
Contributor Author

@tore-espressif tore-espressif May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. Although S3 is also dual-core, and the problem was reported and reproduced on S3 too.

EDIT: Of course, this comment applies only for critical section in dcd_int_handler(). Other accesses to xfer_ctl_t must still be protected in single CPU systems

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for confirmation, we can add an macro to only include this in multi core CPU. since many dwc2 stm32 is single core only. I move the critical API to osal layer here #3127 let me know what you think

@hathach
Copy link
Owner

hathach commented May 20, 2025

ok I made more changes to #3127 to have osal spinlock (SMP), which I think better name than critical section (single core). Would you mind testing it to see if it fix the race condition. In the future we can move the spinlock to usbd to make sure it works with other ports as well.

PS: spinlock is moved to usbd, all driver can use usbd_spin_lock/unlock() for convenience, it will also be used for class driver later as wel. Note: when called in_isr with single core CPU, osal implementation just do nothing since it has no chance to conflict in that case.

Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you very much, this PR initiate a new osal spin lock API, which is implemented and merged #3127, which should fix this race issue on esp32 chip. I did a rebase there and github couldn't detect and mark this as merged. So we will need to close this manually.

PS: I need to wrap this up to use spinlock API to replace atomic to fix an linking issue with arduino framework adafruit/Adafruit_TinyUSB_Arduino#513 . Feel free to let me know if your testing still has issue with race condition.

@hathach hathach closed this May 21, 2025
@tore-espressif tore-espressif deleted the fix/dcd_race_condition branch May 23, 2025 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants