Support unaligned DMA buffer address #2704

yuyichao · 2025-03-17T18:57:39Z

ARTIQ Pull Request

Description of Changes

Split the base address into an aligned part and an offset. Add one cycle to the start of the DMA playback to skip ahead to the real start of the playback.
This makes it significantly easier to run DMA from a user provided buffer. (All the rest are easy compiler stuff).

Currently this is not tested at all, hence the draft status (I'm still figuring out how to setup the proper environment...) and I'm not even 100% sure if I'm using the correct syntax. However, I do want to post this to get some feedback (especially to see if I understands the handshake signals and the FSM correctly...). The main reasons I went for this rather than adding support to the compiler are that

I think this can be supported with minimum to no overhead.
This can actually make recording faster since the firmware doesn't have to realign the buffer anymore (not implemented in this PR yet).

Related Issue

Ref #2697.

Type of Changes

	Type
✓	✨ New feature

Steps (Choose relevant, delete irrelevant before submitting)

All Pull Requests

Use correct spelling and grammar.
Update RELEASE_NOTES.rst if there are noteworthy changes, especially if there are changes to existing APIs.
Close/update issues.
Check the copyright situation of your changes and sign off your patches (git commit --signoff, see copyright).

Code Changes

Run flake8 to check code style (follow PEP-8 style). flake8 has issues with parsing Migen/gateware code, ignore as necessary.
Test your changes or have someone test them. Mention what was tested and how.
Add and check docstrings and comments
Check, test, and update the unittests in /artiq/test/ or gateware simulations in /artiq/gateware/test

Git Logistics

Split your contribution into logically separate changes (git rebase --interactive). Merge/squash/fixup commits that just fix or amend previous commits. Remove unintended changes & cleanup. See tutorial.
Write short & meaningful commit messages. Review each commit for messages (git show). Format:
```
topic: description. < 50 characters total.

Longer description. < 70 characters per line
```

Licensing

See copyright & licensing for more info.
ARTIQ files that do not contain a license header are copyrighted by M-Labs Limited and are licensed under LGPLv3+.

Split the base address into an aligned part and an offset. Add one cycle to the start of the DMA playback to skip ahead to the real start of the playback.

sbourdeauducq · 2025-03-17T22:18:56Z

This makes it significantly easier to run DMA from a user provided buffer.

How so?

This can actually make recording faster since the firmware doesn't have to realign the buffer anymore

Why not just allocate an aligned buffer in the first place?

yuyichao · 2025-03-18T01:19:51Z

Why not just allocate an aligned buffer in the first place?

Well, that was my question when I looked at the firmware code. I just figured that either no one care about that performance or that there's no easy way to do so in rust.

How so?

The direct answer to this question is that this solves one of the three requirements I've listed in #2697 . Since it reduce the number of missing pieces from 3 to 2 this change obviously made it easier.

But I assume you are asking about comparing to other approaches. And here's a comparison for all the approaches/aspects I've considered.

First of all, I'm assuming the alignment requirement can be removed without significant performance hit so I'm only considering the software/firmware side. With that out of the way, the usecases I've considered are,

Normal user of rpc
Normal user of array allocated on kernel
Normal use of DMA
Use rpc result for DMA
Use array allocated on kernel for DMA

The first three are mainly to make sure there's no regression and the latter two are usecases I'd like to support (though I personally only need 4 for now). Now comparing the approaches with or without the alignment requirements.

If the DMA alignment requirement is to be kept, the main question is how to make 4 and 5 works.
- The solution that does not require any compiler change (in terms of alignment) is to over allocate the buffer and realign the pointer. This is basically how DMA buffers is allocated now normally. However, I feel like this is incredibly stupid since the LLVM stack allocation code is clearly capable to allocate a property aligned buffer and the buffer doesn't even need to be resized here.
- Now if we require the allocation to be properly aligned from the get go, AFAICT, the only solution that does not involve adding completely new metadata to the compiler is to create a type with 64bytes alignment (or bump max alignment to 64 for a lesser version that only support rpc). However, this would also mean that all dynamically sized rpc return allocation would be 64bytes aligned (again, without deeper change to the compiler) which could be a significant regression for people that returns a list of short arrays from RPC.
- Alternatively, if call/allocation site specific metadata can be added to make sure only specific rpc/array allocation gets 64 bytes alignment that should make sure there's no regression in normal code.
Now if the DMA alignment requirement is removed, then everything becomes significantly easier on the software side.
- There's no need for additional alignment, so no compiler changes necessary. No need for realignment copy. No need for site-specific metadata and no need for unnecessary over-allocation.
- In addition, DMA buffers (either the normal ones or user created ones from RPC or normal array allocation) can be stored more compactly, saving some memory for short traces.

sbourdeauducq · 2025-03-18T03:17:42Z

there's no easy way to do so in rust.

Obviously it can be done: allocate a buffer with extra size (alignment-1), then discard the beginning.

For a dynamic buffer like Vec then of course the reallocation has to be implemented manually perhaps with the principle above.

I don't see any valid reason to deal with misalignment in gateware. It wastes FPGA resources and potentially makes timing closure more difficult. When receiving a host-compiled DMA buffer, the device could simply allocate an aligned buffer to store the data coming from the network.

yuyichao · 2025-03-18T19:17:21Z

Obviously it can be done: allocate a buffer with extra size (alignment-1), then discard the beginning.
For a dynamic buffer like Vec then of course the reallocation has to be implemented manually perhaps with the principle above.

Sure. I did not write the code to do the extra realignment copy. I'm only commenting based on what you currently do and I'll take your word for what could be fixed.

I don't see any valid reason to deal with misalignment in gateware. It wastes FPGA resources and potentially makes timing closure more difficult.

So how much more resource would adding one more state to the FSM cost?

When receiving a host-compiled DMA buffer, the device could simply allocate an aligned buffer to store the data coming from the network.

I'm certainly fine with supporting this. However, this must be done without bumping the alignment for all rpc dynamic allocation. Ideally it should also support kernel side allocation as well though I don't need it so I don't care about it very much. If you have something in mind that's easier than touching gateware and can be supported in both the current compiler version as well as all future version I'll probably be fine to implement that.

Support unaligned DMA buffer address

a894f9a

Split the base address into an aligned part and an offset. Add one cycle to the start of the DMA playback to skip ahead to the real start of the playback.

yuyichao force-pushed the unaligned-dma branch from 2bb2629 to a894f9a Compare March 17, 2025 19:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support unaligned DMA buffer address #2704

Support unaligned DMA buffer address #2704

Uh oh!

yuyichao commented Mar 17, 2025 •

edited

Loading

Uh oh!

sbourdeauducq commented Mar 17, 2025

Uh oh!

yuyichao commented Mar 18, 2025

Uh oh!

sbourdeauducq commented Mar 18, 2025 •

edited

Loading

Uh oh!

yuyichao commented Mar 18, 2025

Uh oh!

Uh oh!

Support unaligned DMA buffer address #2704

Are you sure you want to change the base?

Support unaligned DMA buffer address #2704

Uh oh!

Conversation

yuyichao commented Mar 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ARTIQ Pull Request

Description of Changes

Related Issue

Type of Changes

Steps (Choose relevant, delete irrelevant before submitting)

All Pull Requests

Code Changes

Git Logistics

Licensing

Uh oh!

sbourdeauducq commented Mar 17, 2025

Uh oh!

yuyichao commented Mar 18, 2025

Uh oh!

sbourdeauducq commented Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yuyichao commented Mar 18, 2025

Uh oh!

Uh oh!

yuyichao commented Mar 17, 2025 •

edited

Loading

sbourdeauducq commented Mar 18, 2025 •

edited

Loading