-
Notifications
You must be signed in to change notification settings - Fork 21
Accelerate STM32 QSPI read/write operations by DMA #456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Support all STM32 Families with QSPI/OSPI HAL API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this! Really appreciate this getting added! Just a few comments here and there.
/// Mapping from QSPI/OSPI index to DMA link info | ||
#if defined(OCTOSPI1) | ||
static const DMALinkInfo OSPIDMALinks[] = { | ||
{4, 0, MDMA_REQUEST_OCTOSPI1_FIFO_TH}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DMA 4 means MDMA, right? Could we define this as a constant in stm_dma_utils.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I think I can define the constant as MDMA_IDX.
stm_dma_ip_v1.h seems to be a better place to put the constant in, because MDMA is only available in STM32H7 and STM32H7 is treated as a special case of DMA IP v1.
} | ||
else { | ||
// wait until transfer complete or timeout | ||
while(obj->handle.State == HAL_OSPI_STATE_BUSY_TX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, is it possible to do this wait in a non spinlocking way, so other threads can execute? Like, you could create a semaphore, give it from the transfer complete callback, and wait on it here
obj->handle.State = HAL_OSPI_STATE_READY; | ||
} | ||
#if defined (__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U) | ||
SCB_InvalidateDCache_by_Addr((uint32_t*)data, *length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to be careful with this. If the data buffer is not cache aligned, this can actually be a destructive oprtation, because it will clobber the values of any other variables within the cache blocks in question. This actually happened to me during testing of DMA SPI, and it caused a HardFault.
The only way to do this truly safely is for the data buffer to start on a cache line and have a size that's an integer number of cache lines. You'd probably have to allocate a new buffer on the heap, recieve into it, and then copy into the user supplied buffer. Or, alternately, you could do this at the QSPI class layer by changing the read operation to use a CacheAlignedBuffer class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned the cache before DMA reception, so it would be safe if other variables within the cache block are not modified by other threads or interrupt handlers during DMA reception. Yes it's not 100% safe, but allocating additional memory or changing the public interface of QSPI class also have their own drawbacks. I'll try to split the read buffer into cache-aligned parts and the cache-unaligned parts, and use the non-DMA read function on the unaligned parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah... it's just a little risky in my book the way it is now because you can't prove that another thread or ISR isn't using memory in those blocks while this code is running
dmaHandle->Init.Endianness = MDMA_LITTLE_ENDIANNESS_PRESERVE; | ||
|
||
dmaHandle->Init.SourceInc = periphInc ? MDMA_SRC_INC_BYTE: MDMA_SRC_INC_DISABLE; | ||
dmaHandle->Init.DestinationInc = memInc ? MDMA_DEST_INC_BYTE: MDMA_DEST_INC_DISABLE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to consider periphInc or the mem/periph alignment here?
dmaLink = &OSPIDMALinks[0]; | ||
#endif | ||
// Initialize DMA channel | ||
DMAHandlePointer dmaHandle = stm_init_dma_link(dmaLink, DMA_PERIPH_TO_MEMORY, false, true, 1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... This DMA is actually used for mem to periph and periph to mem transfers... does that mean the direction argument is meaningless for MDMA? I feel like this may need a little refactoring later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MDMA does not distinguish between memory and peripherals. The source and destination addresses can be both memory and peripheral addresses.
Summary of changes
Added STM32H7 MDMA support to stm32_dma_utils.
Faster QSPI read/write speed using DMA.
The acceleration ratio is between 5x and 10x, depending on the microcontroller chip.
No modification is needed on user code.
This is suitable for filesystem access on QSPI Flash chips, or for driving some LCDs with QSPI interface.
Tested on at least one board of the following STM32 Families: F4, F7, L4, L4+, L5, G4, WB, U5, H7 with QSPI, H7 with OSPI.
Impact of changes
No.
Migration actions required
No.
Documentation
The QSPI Flash uses one DMA channel on the microcontroller.
There's no confict with existing SPI DMA links.
Some sequential read performance data:
(NUCLEO boards are tested with an external W25Q128FV module)
Pull request type
Test results
All boards are tested by filling the flash with "55 AA" pattern and then read out.
Discovery boards and custom boards are additionally tested by creating a FAT filesystem on the QSPI Flash and writing a 3.1MB binary file, then read it out and check the CRC.