Skip to content

[trackerm/p2] Fix USART/DMA deadlock #2603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 16, 2022

Conversation

scott-brust
Copy link
Member

@scott-brust scott-brust commented Dec 15, 2022

Problem

TrackerM devices eventually deadlock after running for a few hours. The issue was traced to the cellular USART RX DMA transfer. On some transfers, the DMA controller does not report the final block of bytes was transferred from the USART to memory. As a result, we wait in a loop that will never complete.

Solution

The solution is to workaround the DMA controller behavior by checking if the RX DMA transfer channel becomes disabled while waiting for the transfer to complete. According to the data sheet in these circumstances the DMA transfer has actually completed and the data should be available. It should be safe to continue with the assumption that we have not lost any data from the modem. So far this appears to be true, but will need additional testing to explicitly confirm all bytes are accounted for.

Steps to Test

Run tracker edge app with this branch
or run serial_loopback2 with SERIAL_06_p2_serial2_stress_test test
loopback.cpp.txt

Example App

I modified serial_loopback2 to test a DMA transfer with 32 byte usart buffers (ie DMA block sizes).
I intentionally send 30 bytes of data, which is just less than a block size.
I can see that the final 2 bytes are not transferred until we call flushDmaRxFiFo()
HOWEVER, I havent confirmed that when we dont see GDMA_GetDstAddr() increment as expected that when we break that the bytes will be there. Working on that next....

.References

Log output of loopback test sending 30 bytes and reading it out via DMA, byte by byte and logging full RX buffer each time

Running tests
Avail: 30 SerialAvailable: 28.    <---- Serial.available() calls data() and based of DMA addr, we have only xfered 28 of 30 total bytes at this time! I think Serial.flush() is not working correctly on P2/Trackerm in addition to DMA addresses being short at the end of the block. 
00 01 02 03 04
05 06 07 08 09
0A 0B 0C 0D 0E
0F 10 11 12 13
14 15 16 17 18
19 1A 1B FF FF
Avail: 29 SerialAvailable: 29. <---- Subsequent poll of data() shows DMA xfer should be complete, but our final 2 bytes are not present
00 01 02 03 04
05 06 07 08 09
0A 0B 0C 0D 0E
0F 10 11 12 13
14 15 16 17 18
19 1A 1B FF FF
Avail: 28 SerialAvailable: 28
00 01 02 03 04
05 06 07 08 09
0A 0B 0C 0D 0E
0F 10 11 12 13
14 15 16 17 18
19 1A 1B FF FF
...... SNIP........
Avail: 2 SerialAvailable: 2
00 01 02 03 04
05 06 07 08 09
0A 0B 0C 0D 0E
0F 10 11 12 13
14 15 16 17 18
19 1A 1B FF FF
Avail: 1 SerialAvailable: 1
00 01 02 03 04
05 06 07 08 09
0A 0B 0C 0D 0E
0F 10 11 12 13
14 15 16 17 18
19 1A 1B 1C 1D        <---- The final 2 bytes appear when we call flushDmaRxFiFo()
Avail: 0 SerialAvailable: 0
00 01 02 03 04
05 06 07 08 09
0A 0B 0C 0D 0E
0F 10 11 12 13
14 15 16 17 18
19 1A 1B 1C 1D
Test SERIAL_06_p2_serial2_stress_test passed.
Test summary: 1 passed, 0 failed, and 0 skipped, out of 1 test(s).

See actual useful datasheet for DMA controller documentation
Infineon-xmc4100_xmc4200_rm_v1.6_2016-UM-v01_06-EN.pdf


Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • Added documentation
  • Added to CHANGELOG.md after merging (add links to docs and issues)

@scott-brust scott-brust requested review from XuGuohui and avtolstoy and removed request for XuGuohui December 15, 2022 01:04
@scott-brust scott-brust force-pushed the fix/sc-112840/usart_dma_lockup branch from 28b319d to c359f83 Compare December 15, 2022 01:13
while (GDMA_GetDstAddr(rxDmaInitStruct_.GDMA_Index, rxDmaInitStruct_.GDMA_ChNum) < expectedDstAddr) {
// XXX: spin around, this should be pretty fast
while (GDMA_GetDstAddr(rxDmaInitStruct_.GDMA_Index, rxDmaInitStruct_.GDMA_ChNum) < expectedDstAddr &&
(GDMA_BASE->ChEnReg & (1 << rxDmaInitStruct_.GDMA_ChNum))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it lose bytes? for example, if we re goinng to consum 3 more bytes according to RX_BYTE_CNT, but here we just ignore that if the channel is closed, will the 3 bytes get missed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is that the bytes have already been transferred.
We enter this loop with knowing that the UART reported RX_BYTE_CNT should be processed by the DMA controller. For unknown reasons, sometimes GDMA_GetDstAddr() does not increment by this count. In those cases we look at this bit.
According to the datasheet, this bit is cleared by the DMA controller when the transfer is complete. So if this is complete we assume that the data must be present, regardless of if GDMA_GetDstAddr() indicates the correct address:
Screen Shot 2022-12-15 at 1 12 41 PM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some comments then, in case of forgetting why we're doing in this way over time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's definitely add some comments here.

@scott-brust scott-brust marked this pull request as ready for review December 15, 2022 21:16
Copy link
Member

@avtolstoy avtolstoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure to add a test app or a new test case for serial_loopback2 to make sure that we are not losing any data.

while (GDMA_GetDstAddr(rxDmaInitStruct_.GDMA_Index, rxDmaInitStruct_.GDMA_ChNum) < expectedDstAddr) {
// XXX: spin around, this should be pretty fast
while (GDMA_GetDstAddr(rxDmaInitStruct_.GDMA_Index, rxDmaInitStruct_.GDMA_ChNum) < expectedDstAddr &&
(GDMA_BASE->ChEnReg & (1 << rxDmaInitStruct_.GDMA_ChNum))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's definitely add some comments here.

@scott-brust scott-brust merged commit e8f7600 into develop Dec 16, 2022
@scott-brust scott-brust deleted the fix/sc-112840/usart_dma_lockup branch December 16, 2022 18:10
@scott-brust scott-brust added this to the 5.2.0 milestone Dec 16, 2022
@scott-brust scott-brust added 5.x-only and removed 5.x labels Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants