Wishbone burst access #693

Open
@stokdam

Description

I'm opening this issue because I could not find a dedicated thread on this topic.
It has been discussed in #573 (comment).

I'll summarize that discussion below.

As stated by @stnolting in #573 (comment), bus accesses are currently the main performance bottleneck of the architecture. This is because a new request is asserted only after the previous one has been acknowledged. With high-latency memory in particular, this behavior severely hurts performance. For example, SDRAMs have a certain CAS latency (CL), which is added to each bus transaction. However, most SDRAMs offer burst accesses, which means that you wait CL only once for the entire burst (e.g. 8 words) and then receive one word per clock cycle. This behavior suits caches well, in particular the instruction cache.
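To put rough numbers on the argument above, here is a minimal sketch of the cycle counts involved. It assumes a deliberately simplified model in which each classic access costs the full latency plus one transfer cycle, while a burst pays the latency once. The function names and the example values (8 words, 3-cycle latency) are hypothetical and not taken from the discussion; real SDRAM timing also involves row activation, precharge and refresh.

```python
def classic_cycles(words: int, latency: int) -> int:
    """Cycles when every word is a separate request that pays the full latency."""
    return words * (latency + 1)


def burst_cycles(words: int, latency: int) -> int:
    """Cycles when the latency is paid once, then one word arrives per clock."""
    return latency + words


if __name__ == "__main__":
    words, latency = 8, 3  # hypothetical: 8-word cache block, 3-cycle latency
    print("classic:", classic_cycles(words, latency), "cycles")
    print("burst:  ", burst_cycles(words, latency), "cycles")
```

Under this model the gap grows with both the latency and the block size, which is why cache line refills are the natural place to use bursts.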

@NikLeberg, in #573 (reply in thread), proposed two ways in which burst accesses can be implemented:

  • Registered feedback mode:

This mode adds two signals: CTI_IO() (cycle type identifier) and BTE_IO() (burst type extension). In CTI, a value of 000 indicates a classic access, 001 a constant-address burst, and 010 an incrementing-address burst. The BTE would only be used for incrementing bursts, to indicate after how many beats the burst should wrap around to the start of the aligned block.

The advantage of this approach is that wrapping is natively supported by SDRAM bursts. It can be used to reduce first-word access latency after a miss. For example, if address 0x5 is required, you start an 8-word wrapping burst at address 0x5; after the word at address 0x7 has been received, the burst wraps around and continues with address 0x0 up to 0x4, which is the last one. This allows the icache to always perform bursts that are aligned with cache blocks, while still minimizing the miss penalty. The disadvantage is the need to implement the above-mentioned control signals.
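The wrapping behavior in the example above can be sketched as follows. This is an illustration only, assuming word addresses and a power-of-two burst length. The helper names are hypothetical. The CTI constants for classic, constant-address and incrementing bursts come from the description above; the end-of-burst value 111 is taken from the Wishbone specification rather than from this thread and should be checked against it.

```python
from __future__ import annotations

CTI_CLASSIC = 0b000
CTI_CONST_ADDR_BURST = 0b001
CTI_INCR_BURST = 0b010
CTI_END_OF_BURST = 0b111  # assumption: per the Wishbone spec, not stated above


def wrap_burst_addresses(start: int, beats: int) -> list[int]:
    """Word addresses of a wrapping burst that begins at `start`.

    The burst stays inside the `beats`-aligned block containing `start`
    and wraps to the beginning of that block after its last word.
    """
    if beats <= 0 or beats & (beats - 1):
        raise ValueError("beats must be a power of two")
    base = start & ~(beats - 1)  # first word of the aligned block
    return [base | ((start + i) & (beats - 1)) for i in range(beats)]


def burst_beats(start: int, beats: int) -> list[tuple[int, int]]:
    """Pair each address with the CTI value the master would drive for it."""
    addrs = wrap_burst_addresses(start, beats)
    ctis = [CTI_INCR_BURST] * (beats - 1) + [CTI_END_OF_BURST]
    return list(zip(addrs, ctis))


if __name__ == "__main__":
    for addr, cti in burst_beats(0x5, 8):
        print(f"addr=0x{addr:x} cti={cti:03b}")
```

Because the requested word comes first, the cache can forward it to the CPU while the rest of the block is still being filled.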

  • Using the real pipelined mode:

Instead of asserting STB essentially in lockstep with CYC and requiring that the address and data bits do not change, new valid data and address are presented on the bus on each clock cycle in which STB is asserted. The slave is expected to buffer them internally (if it needs wait states) and respond with an ACK for each request, decoupled from the request timing. The slave also gets a new output signal, STALL, with which it tells the master that the current STB cycle will be ignored and that the request must be kept or repeated.
Slaves could always assume an incrementing access and simply stall the master if the address is not the expected increment (and load the correct value on the next clock).

This approach may be easier to implement on the CPU side.
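The STB/STALL/ACK handshake described above can be illustrated with a small cycle-based model. This is a sketch under strong assumptions, not a description of any existing implementation: the slave has a fixed latency, can buffer a limited number of outstanding requests, and frees a buffer slot in the same cycle in which it sends the ACK. The function name and the parameters `latency` and `max_outstanding` are hypothetical, and the address-mismatch stall mentioned above is not modeled.

```python
from collections import deque


def simulate_pipelined_read(addresses, latency=3, max_outstanding=2):
    """Return (cycles, ack_order) for a simplified pipelined read sequence."""
    pending = deque()  # requests buffered in the slave: (address, ACK due cycle)
    acked = []
    next_req = 0
    cycle = 0
    while len(acked) < len(addresses):
        # Slave: ACK the oldest buffered request once its latency has elapsed.
        if pending and pending[0][1] <= cycle:
            acked.append(pending.popleft()[0])
        # Slave: drive STALL while it cannot buffer another request.
        stall = len(pending) >= max_outstanding
        # Master: STB stays asserted while requests remain; it only moves on
        # to the next address in a cycle where STALL is low.
        stb = next_req < len(addresses)
        if stb and not stall:
            pending.append((addresses[next_req], cycle + latency))
            next_req += 1
        cycle += 1
    return cycle, acked


if __name__ == "__main__":
    addrs = [0, 1, 2, 3]
    for depth in (1, 2, 8):
        cycles, order = simulate_pipelined_read(addrs, max_outstanding=depth)
        print(f"max_outstanding={depth}: {cycles} cycles, ACK order {order}")
```

With `max_outstanding=1` the model degenerates to the current behavior, where each request waits for the previous ACK; larger values let requests overlap with the slave's latency.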

Labels: HW (hardware-related), enhancement (new feature or request), stale (no updates for a long time)