
CachePlugin with wayCount=1 gets eventually stuck loading the same data #131


Open
7FM opened this issue Feb 4, 2025 · 12 comments

Comments

@7FM

7FM commented Feb 4, 2025

I am using a patched version of the NaxAsicGen target (mainly to generate AXI memory interfaces and to change memDataWidth and fetchDataWidth to 32 bits).
Note that I am using the inferred ramBlocks.
I tried running a simple FreeRTOS binary on the core and noticed that the core was getting stuck during two load word instructions:

[Waveform screenshot]

Essentially, the redo bit is never cleared and the core keeps sending read requests for the same two addresses in the load queue.

If I set wayCount=2 for both FetchCachePlugin and DataCachePlugin, the core no longer gets stuck and continues to execute just fine.
(I have not tested whether it is actually necessary to change wayCount for both plugins.)
Thus, I suspect that there is a bug that only shows up with wayCount=1.

@Dolu1990
Member

Dolu1990 commented Feb 4, 2025

Hi,

It isn't a hardware bug, but rather a deadlock by design.
Currently, a cache line refill has no "grace period" during which it is protected from eviction. So, if you have 2 memory accesses to the same set but to different memory blocks (and both miss), there is a chance that they enter the regime you observe in the wave.
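To illustrate the regime described above, here is a toy cycle model of a single cache set, written in plain Scala rather than SpinalHDL. It is not derived from the NaxRiscv sources; the timing rules are assumptions chosen for illustration: the victim way is picked round-robin and invalidated as soon as a refill starts, the line becomes valid when the refill completes, and the load that missed is woken up `wakeupDelay` cycles later to retry. The names `SetLivelockModel`, `Load`, and `simulate`, and the default latencies, are hypothetical.

```scala
// Toy cycle model of ONE cache set (hypothetical, not NaxRiscv code).
// Assumed rules: round-robin victim, invalidated when the refill starts;
// line valid when the refill completes; retry `wakeupDelay` cycles later.
object SetLivelockModel {
  final class Load(val tag: Int, var retryAt: Int) {
    var refillDoneAt: Int = -1 // cycle at which this load's refill completes
    var refillWay: Int = -1    // way reserved for that refill
    var done: Boolean = false  // true once the load hits (forward progress)
  }

  /** Number of conflicting loads (distinct tags, same set) completed within `maxCycles`. */
  def simulate(ways: Int, loadCount: Int, refillLatency: Int = 4,
               wakeupDelay: Int = 2, maxCycles: Int = 1000): Int = {
    val set = Array.fill[Option[Int]](ways)(None) // tag held by each way
    var victim = 0                                 // round-robin pointer
    // Load i uses tag i and first issues at cycle i.
    val loads = Vector.tabulate(loadCount)(i => new Load(tag = i, retryAt = i))
    for (t <- 0 until maxCycles) {
      // 1) Refill completions: write the tag into the reserved way, schedule the wakeup.
      for (l <- loads if !l.done && l.refillDoneAt == t) {
        set(l.refillWay) = Some(l.tag)
        l.retryAt = t + wakeupDelay
      }
      // 2) Issues and retries, in program order.
      for (l <- loads if !l.done && l.retryAt == t) {
        if (set.contains(Some(l.tag))) l.done = true // hit
        else {                                       // miss: start a refill
          l.refillWay = victim
          set(victim) = None                         // victim evicted immediately
          victim = (victim + 1) % ways
          l.refillDoneAt = t + refillLatency
        }
      }
    }
    loads.count(_.done)
  }
}
```

Under these assumptions, two loads to different blocks of the same set keep evicting each other with one way (`simulate(ways = 1, loadCount = 2)` completes neither load), while with two ways both complete.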

Fundamentally, what would be needed to avoid that is to update the L1 D$ refill slots to incorporate some cycles of locking once the refill is done. Around here:

val slots = for (refillId <- 0 until refillCount) yield new Area {
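As a hedged sketch of the locking suggested above, a refill slot could stay attached to its way for `graceCycles` cycles after the refill completes, so that the way cannot be picked as a victim in the meantime. This is a plain-Scala behavioral model, not SpinalHDL; `RefillSlotModel`, `RefillSlot`, `tick`, `protects`, and `graceCycles` are hypothetical names, not signals of the actual `slots` area.

```scala
// Hypothetical behavioral model of one refill slot extended with a post-refill lock.
object RefillSlotModel {
  sealed trait State
  case object Idle extends State
  final case class Refilling(way: Int, cyclesLeft: Int) extends State
  final case class Locked(way: Int, cyclesLeft: Int) extends State

  final class RefillSlot(refillLatency: Int, graceCycles: Int) {
    require(refillLatency >= 1 && graceCycles >= 0)
    var state: State = Idle

    /** Reserve `way` for a refill that completes after `refillLatency` ticks. */
    def start(way: Int): Unit = {
      require(state == Idle, "slot is busy or still locked")
      state = Refilling(way, refillLatency)
    }

    /** Advance the slot by one clock cycle. */
    def tick(): Unit = {
      state = state match {
        case Refilling(w, 1) => if (graceCycles > 0) Locked(w, graceCycles) else Idle
        case Refilling(w, n) => Refilling(w, n - 1)
        case Locked(_, 1)    => Idle
        case Locked(w, n)    => Locked(w, n - 1)
        case Idle            => Idle
      }
    }

    /** The reserved way must not be picked as a victim while refilling or locked. */
    def protects(way: Int): Boolean = state match {
      case Refilling(w, _) => w == way
      case Locked(w, _)    => w == way
      case Idle            => false
    }
  }
}
```

In this sketch the slot stays unavailable for new refills while locked, so a longer grace period trades refill throughput for forward progress.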

I would say, use at least 2 ways, as 1 is just a performance killer.

@7FM
Author

7FM commented Feb 4, 2025

Thanks for the fast reply!

I would say, use at least 2 ways, as 1 is just a performance killer.

Not really a conscious choice I made, but rather the current default:

case p: FetchCachePlugin => p.wayCount = 1; p.cacheSize = 256; p.memDataWidth = 64
case p: DataCachePlugin => p.wayCount = 1; p.cacheSize = 256; p.memDataWidth = 64

@Dolu1990
Member

Dolu1990 commented Feb 6, 2025

Hoo right. It was to keep things as small as possible for P&R trials.
Hmm, overall, the number of ways will not really increase the area of the design.
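As a rough check of this area claim, the storage of a cache with a fixed `cacheSize` can be estimated per `wayCount`. The 64-byte line size and 32-bit physical address below are assumptions, not values read from the NaxAsicGen configuration, and the estimate ignores valid bits, tag comparators, way multiplexers, and replacement state, which do grow with the way count. `CacheStorageEstimate` and `estimate` are hypothetical names.

```scala
// Back-of-the-envelope storage estimate for a cache of fixed total size.
// Assumed: `lineSize`-byte lines, `addressWidth`-bit physical addresses, one tag per line.
object CacheStorageEstimate {
  private def log2(x: Int): Int = {
    require(x > 0 && (x & (x - 1)) == 0, s"$x is not a power of two")
    Integer.numberOfTrailingZeros(x)
  }

  final case class Estimate(sets: Int, tagBitsPerLine: Int, tagBits: Int, dataBits: Int)

  def estimate(cacheSize: Int, wayCount: Int, lineSize: Int = 64, addressWidth: Int = 32): Estimate = {
    require(cacheSize % lineSize == 0, "cacheSize must be a multiple of lineSize")
    val lines = cacheSize / lineSize
    val sets = lines / wayCount
    require(sets >= 1, "more ways than lines")
    // Tag = address bits not covered by the set index and the byte offset.
    val tagBitsPerLine = addressWidth - log2(sets) - log2(lineSize)
    Estimate(sets, tagBitsPerLine, tagBitsPerLine * lines, cacheSize * 8)
  }
}

for (w <- Seq(1, 2, 4)) println(s"wayCount=$w -> ${CacheStorageEstimate.estimate(256, w)}")
// wayCount=1 -> Estimate(4,24,96,2048)
// wayCount=2 -> Estimate(2,25,100,2048)
// wayCount=4 -> Estimate(1,26,104,2048)
```

Under these assumptions, going from 1 to 2 ways with `cacheSize = 256` leaves the 2048 data bits unchanged and adds only 4 tag bits in total.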

I can just change the default to 2, ok for you?

@7FM
Author

7FM commented Feb 10, 2025

Wouldn't it still be possible to hit the same problem, regardless of the chosen value, if more than wayCount memory accesses target the same set?

@Dolu1990
Member

I think there is a limit to it, which is the latency between a refill completion and the retry.
If you send me your waveform, I can check it.

@7FM
Author

7FM commented Feb 12, 2025

Sure, I attached the complete simulation project and a GTKWave save file positioned roughly at the time when the redo bit stops being cleared.

nax_debug_1.gtkw.txt
run_6.tar.gz

I am using this patch Nax5.patch.txt and the following command: sbt "runMain naxriscv.platform.asic.NaxAsicGen --memory-region=0x40000000,0x010000,io,p --memory-region=0x80200000,0x100000,io,p --memory-region=0x80000000,0x100000,xc,m --memory-region=0x80100000,0x100000,rwc,m --reset-vector=0x80000000" to generate the nax.v.
To run the simulation: cd run_6/sim && make sim

Let me know if you need more info to reproduce the issue.

@Dolu1990
Member

Thanks ^^

Hmm, running make gives:
%Error: Invalid option: --no-timing

@7FM
Author

7FM commented Feb 17, 2025

I guess you are using a Verilator version that is too old. You can probably just remove line 22 from the Makefile, which adds this argument.

@Dolu1990
Member

Right ^^
Got upstream Verilator.

Then running make gives:

     -.--ns INFO     gpi                                ..mbed/gpi_embed.cpp:76   in set_program_name_in_venv        Did not detect Python virtual environment. Using system-wide Python interpreter
     -.--ns INFO     gpi                                ../gpi/GpiCommon.cpp:101  in gpi_print_registered_impl       VPI registered
     0.00ns INFO     cocotb                             Running on Verilator version 5.033 devel
     0.00ns INFO     cocotb                             Running tests with cocotb v1.8.0 from /home/rawrr/.local/lib/python3.10/site-packages/cocotb
     0.00ns INFO     cocotb                             Seeding Python random module with 1739804701
     0.00ns INFO     cocotb.regression                  Found test test_default.run_test
     0.00ns INFO     cocotb.regression                  running run_test (1/1)
     0.00ns INFO     cocotb.nax_wrapper                 Setting rst
Vtop: /usr/local/share/verilator/include/verilated_fst_c.cpp:184: void VerilatedFst::declare(uint32_t, const char*, int, VerilatedTraceSigDirection, VerilatedTraceSigKind, VerilatedTraceSigType, bool, int, bool, int, int): Assertion `hierarchicalName.rfind(' ') != std::string::npos' failed.
make[1]: *** [/home/rawrr/.local/lib/python3.10/site-packages/cocotb/share/makefiles/simulators/Makefile.verilator:65: results.xml] Aborted (core dumped)
make[1]: Leaving directory '/media/data2/download/run_6/sim'
make: *** [/home/rawrr/.local/lib/python3.10/site-packages/cocotb/share/makefiles/Makefile.inc:40: sim] Error 2

Hmm, maybe it is too upstream XD
Which version do you have exactly?
Or else, can you send an FST file?

@7FM
Author

7FM commented Feb 17, 2025

Ah yes, that's a known upstream issue. As a hacky workaround, I am applying this patch to remove the assertion XD
The exact commit I am using is f4a01eb4525f23ee6ad8b6a4f17535a45adcea61, with the aforementioned patch applied.

Another workaround is to let Verilator generate a VCD trace instead, though this is going to be a very, very large file.
Simply remove --trace-fst from the Makefile.

IIRC, the attached archive should already contain a dump.fst in the simulation folder.

@Dolu1990
Member

IIRC, the attached archive should already contain a dump.fst in the simulation folder.

Hoooo right, sorry, I missed it XD

@Dolu1990
Member

So, looking at the wave:
This happens because the wakeup of the LSU's load queue (LQ) is pipelined and arrives 2 cycles after the refill is done, which allows up to 2 instructions to trigger a refill before the woken-up load can make forward progress.
So, if you have 4 ways of cache, you will be safe from it.
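The bound described above can be sketched with a simplified model: after the waiting load's line is refilled, up to `otherRefills` conflicting refills may start before the woken-up load retries (up to 2 here, per the pipelined wakeup). Assuming all of them map to the same set and that the victim is always the way filled longest ago (FIFO, a stand-in for whatever replacement policy the DataCache actually uses), the line survives only if `ways > otherRefills`. `WakeupWindowCheck` and `lineSurvives` are hypothetical names.

```scala
// Hypothetical check: does a freshly refilled line survive until its load retries?
// Assumed: all refills hit the same set, FIFO victim (oldest fill evicted first),
// and `otherRefills` refills happen between the refill completing and the retry.
object WakeupWindowCheck {
  def lineSurvives(ways: Int, otherRefills: Int): Boolean = {
    val set = scala.collection.mutable.Queue.empty[Int] // oldest fill at the head
    def fill(tag: Int): Unit = {
      if (set.size == ways) set.dequeue()                // evict the oldest way
      set.enqueue(tag)
    }
    fill(0)                               // refill of the waiting load's line (tag 0)
    (1 to otherRefills).foreach(fill)     // conflicting refills inside the window
    set.contains(0)                       // still present when the load retries?
  }
}
```

With a window of 2 conflicting refills, 1 and 2 ways lose the line in this model while 4 ways keep it, consistent with the suggestion above.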

An ideal fix would be to prevent a recently loaded cache line from being evicted so soon. That could be done by extending the refill slots in the DataCache design.
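One way to keep a recently loaded line from being evicted is to make victim selection skip any way whose refill completed at most `graceCycles` cycles ago, and to let the miss wait when every way is protected. The sketch below assumes round-robin replacement and per-way refill-completion timestamps; `GraceVictimSelect`, `pickVictim`, `filledAt`, and `rrPointer` are hypothetical, and this is not how the NaxRiscv DataCache is implemented today.

```scala
// Hypothetical victim selection that refuses to evict a recently refilled way.
// filledAt(w): cycle at which way w last completed a refill (None if never filled).
// graceCycles would be at least the wakeup latency (2 cycles in the discussion above).
object GraceVictimSelect {
  /** First evictable way in round-robin order, or None if every way is protected (the miss waits). */
  def pickVictim(filledAt: IndexedSeq[Option[Long]], now: Long,
                 graceCycles: Int, rrPointer: Int): Option[Int] = {
    val ways = filledAt.length
    // Evictable: never filled, or refill completed more than graceCycles cycles before now.
    (0 until ways).iterator
      .map(i => (rrPointer + i) % ways)
      .find(w => filledAt(w).forall(t => now - t > graceCycles))
  }
}

// Way 0 was filled at cycle 10 and way 1 at cycle 3; at cycle 12 with a 2-cycle grace,
// only way 1 may be evicted.
println(GraceVictimSelect.pickVictim(Vector(Some(10L), Some(3L)), now = 12L, graceCycles = 2, rrPointer = 0)) // Some(1)
```

With a single way, this turns the eviction race into a short stall: the second miss waits until the grace period expires instead of evicting the line the first load is about to retry on.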
