Skip to content

Support unaligned DMA buffer address #2704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yuyichao
Copy link
Contributor

@yuyichao yuyichao commented Mar 17, 2025

ARTIQ Pull Request

Description of Changes

Split the base address into an aligned part and an offset. Add one cycle to the start of the DMA playback to skip ahead to the real start of the playback.
This makes it significantly easier to run DMA from a user provided buffer. (All the rest are easy compiler stuff).

Currently this is not tested at all, hence the draft status (I'm still figuring out how to setup the proper environment...) and I'm not even 100% sure if I'm using the correct syntax. However, I do want to post this to get some feedback (especially to see if I understands the handshake signals and the FSM correctly...). The main reasons I went for this rather than adding support to the compiler are that

  1. I think this can be supported with minimum to no overhead.
  2. This can actually make recording faster since the firmware doesn't have to realign the buffer anymore (not implemented in this PR yet).

Related Issue

Ref #2697.

Type of Changes

Type
✨ New feature

Steps (Choose relevant, delete irrelevant before submitting)

All Pull Requests

  • Use correct spelling and grammar.
  • Update RELEASE_NOTES.rst if there are noteworthy changes, especially if there are changes to existing APIs.
  • Close/update issues.
  • Check the copyright situation of your changes and sign off your patches (git commit --signoff, see copyright).

Code Changes

  • Run flake8 to check code style (follow PEP-8 style). flake8 has issues with parsing Migen/gateware code, ignore as necessary.
  • Test your changes or have someone test them. Mention what was tested and how.
  • Add and check docstrings and comments
  • Check, test, and update the unittests in /artiq/test/ or gateware simulations in /artiq/gateware/test

Git Logistics

  • Split your contribution into logically separate changes (git rebase --interactive). Merge/squash/fixup commits that just fix or amend previous commits. Remove unintended changes & cleanup. See tutorial.
  • Write short & meaningful commit messages. Review each commit for messages (git show). Format:
    topic: description. < 50 characters total.
    
    Longer description. < 70 characters per line
    

Licensing

See copyright & licensing for more info.
ARTIQ files that do not contain a license header are copyrighted by M-Labs Limited and are licensed under LGPLv3+.

Split the base address into an aligned part and an offset.
Add one cycle to the start of the DMA playback to skip ahead
to the real start of the playback.
@sbourdeauducq
Copy link
Member

This makes it significantly easier to run DMA from a user provided buffer.

How so?

This can actually make recording faster since the firmware doesn't have to realign the buffer anymore

Why not just allocate an aligned buffer in the first place?

@yuyichao
Copy link
Contributor Author

Why not just allocate an aligned buffer in the first place?

Well, that was my question when I looked at the firmware code. I just figured that either no one care about that performance or that there's no easy way to do so in rust.

How so?

The direct answer to this question is that this solves one of the three requirements I've listed in #2697 . Since it reduce the number of missing pieces from 3 to 2 this change obviously made it easier.

But I assume you are asking about comparing to other approaches. And here's a comparison for all the approaches/aspects I've considered.

First of all, I'm assuming the alignment requirement can be removed without significant performance hit so I'm only considering the software/firmware side. With that out of the way, the usecases I've considered are,

  1. Normal user of rpc
  2. Normal user of array allocated on kernel
  3. Normal use of DMA
  4. Use rpc result for DMA
  5. Use array allocated on kernel for DMA

The first three are mainly to make sure there's no regression and the latter two are usecases I'd like to support (though I personally only need 4 for now). Now comparing the approaches with or without the alignment requirements.

  • If the DMA alignment requirement is to be kept, the main question is how to make 4 and 5 works.

    • The solution that does not require any compiler change (in terms of alignment) is to over allocate the buffer and realign the pointer. This is basically how DMA buffers is allocated now normally. However, I feel like this is incredibly stupid since the LLVM stack allocation code is clearly capable to allocate a property aligned buffer and the buffer doesn't even need to be resized here.
    • Now if we require the allocation to be properly aligned from the get go, AFAICT, the only solution that does not involve adding completely new metadata to the compiler is to create a type with 64bytes alignment (or bump max alignment to 64 for a lesser version that only support rpc). However, this would also mean that all dynamically sized rpc return allocation would be 64bytes aligned (again, without deeper change to the compiler) which could be a significant regression for people that returns a list of short arrays from RPC.
    • Alternatively, if call/allocation site specific metadata can be added to make sure only specific rpc/array allocation gets 64 bytes alignment that should make sure there's no regression in normal code.
  • Now if the DMA alignment requirement is removed, then everything becomes significantly easier on the software side.

    • There's no need for additional alignment, so no compiler changes necessary. No need for realignment copy. No need for site-specific metadata and no need for unnecessary over-allocation.
    • In addition, DMA buffers (either the normal ones or user created ones from RPC or normal array allocation) can be stored more compactly, saving some memory for short traces.

@sbourdeauducq
Copy link
Member

sbourdeauducq commented Mar 18, 2025

there's no easy way to do so in rust.

Obviously it can be done: allocate a buffer with extra size (alignment-1), then discard the beginning.

For a dynamic buffer like Vec then of course the reallocation has to be implemented manually perhaps with the principle above.

I don't see any valid reason to deal with misalignment in gateware. It wastes FPGA resources and potentially makes timing closure more difficult. When receiving a host-compiled DMA buffer, the device could simply allocate an aligned buffer to store the data coming from the network.

@yuyichao
Copy link
Contributor Author

Obviously it can be done: allocate a buffer with extra size (alignment-1), then discard the beginning.
For a dynamic buffer like Vec then of course the reallocation has to be implemented manually perhaps with the principle above.

Sure. I did not write the code to do the extra realignment copy. I'm only commenting based on what you currently do and I'll take your word for what could be fixed.

I don't see any valid reason to deal with misalignment in gateware. It wastes FPGA resources and potentially makes timing closure more difficult.

So how much more resource would adding one more state to the FSM cost?

When receiving a host-compiled DMA buffer, the device could simply allocate an aligned buffer to store the data coming from the network.

I'm certainly fine with supporting this. However, this must be done without bumping the alignment for all rpc dynamic allocation. Ideally it should also support kernel side allocation as well though I don't need it so I don't care about it very much. If you have something in mind that's easier than touching gateware and can be supported in both the current compiler version as well as all future version I'll probably be fine to implement that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants