Commit 3ac6303

authored
Merge pull request #264 from stnolting/zxcfu_isa_extension
✨[Zxcfu ISA ext.] add option to implement custom RISC-V instructions
2 parents 729203b + 4382a29 commit 3ac6303

35 files changed

+993
-64
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ defined by the `hw_version_c` constant in the main VHDL package file [`rtl/core/
2626

2727
| Date (*dd.mm.yyyy*) | Version | Comment |
2828
|:----------:|:-------:|:--------|
29+
| 30.01.2022 | 1.6.7.1 | :sparkles: added **`Zxcfu` ISA extension for user-defined custom RISC-V instructions**; see [PR #264](https://github.com/stnolting/neorv32/pull/264) |
2930
| 28.01.2022 |[**:rocket:1.6.7**](https://github.com/stnolting/neorv32/releases/tag/v1.6.7) | **New release** |
3031
| 28.01.2022 | 1.6.6.10 | :bug: fixed bug in **bit-manipulation co-processor**: decoding collision between `cpop` and `rol` instructions; :bug: fixed bug in co-processor arbitration when an illegal instruction is detected; added four additional (yet unused) **CPU** co-processor slots; [PR #262](https://github.com/stnolting/neorv32/pull/262) |
3132
| 27.01.2022 | 1.6.6.9 | reworked **CFS** "user" logic; added CFS demo program; see [PR #261](https://github.com/stnolting/neorv32/pull/261) |

README.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -127,7 +127,9 @@ the "Minimal RISC-V Debug Specification Version 0.13.2" and compatible with **Op
127127
* _true random_ number generator ([TRNG](https://stnolting.github.io/neorv32/#_true_random_number_generator_trng))
128128
* execute in place module ([XIP](https://stnolting.github.io/neorv32/#_execute_in_place_module_xip)) to directly execute code from SPI flash
129129
* custom functions subsystem ([CFS](https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs))
130-
for tightly-coupled custom co-processor extensions and interfaces
130+
for tightly-coupled custom accelerators and interfaces
131+
* custom functions unit ([CFU](https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu)) for up to 1024
132+
_custom RISC-V instructions_
131133

132134
[[back to top](#The-NEORV32-RISC-V-Processor)]
133135

@@ -187,6 +189,7 @@ documentation section).
187189
[[`Zihpm`](https://stnolting.github.io/neorv32/#_zihpm_hardware_performance_monitors)]
188190
[[`Zifencei`](https://stnolting.github.io/neorv32/#_zifencei_instruction_stream_synchronization)]
189191
[[`Zmmul`](https://stnolting.github.io/neorv32/#_zmmul_integer_multiplication)]
192+
[[`Zxcfu`](https://stnolting.github.io/neorv32/#_zxcfu_custom_instructions_extension_cfu)]
190193
[[`PMP`](https://stnolting.github.io/neorv32/#_pmp_physical_memory_protection)]
191194
[[`DEBUG`](https://stnolting.github.io/neorv32/#_cpu_debug_mode)]**
192195

docs/datasheet/cpu.adoc

Lines changed: 33 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
:sectnums:
22
== NEORV32 Central Processing Unit (CPU)
33

4-
image::riscv_logo.png[width=350,align=center]
4+
image::neorv32_cpu_block.png[width=600,align=center]
55

66
**Key Features**
77

@@ -20,6 +20,7 @@ image::riscv_logo.png[width=350,align=center]
2020
** `Zihpm` - hardware performance monitors
2121
** `Zifencei` - instruction stream synchronization
2222
** `Zmmul` - integer multiplication hardware
23+
** `Zxcfu` - custom instructions extension
2324
** `PMP` - physical memory protection
2425
** `Debug` - debug mode
2526
* Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
@@ -684,6 +685,30 @@ high for one cycle to inform the memory system (like the i-cache to perform a fl
684685
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
685686

686687

688+
==== **`Zxcfu`** Custom Instructions Extension (CFU)
689+
690+
The `Zxcfu` extension is a NEORV32-specific _custom RISC-V_ ISA extension (`Z` = sub-extension, `x` = platform-specific
691+
custom extension, `cfu` = name of the custom extension). When enabled via the `CPU_EXTENSION_RISCV_Zxcfu` configuration
692+
generic, this ISA extension adds the <<_custom_functions_unit_cfu>> to the CPU core. The CFU is a module that
693+
allows adding **custom RISC-V instructions** to the processor core.
694+
695+
The CFU is implemented as an ALU co-processor and is integrated right into the CPU's pipeline, providing minimal data
696+
transfer latency as it has direct access to the core's register file. Up to 1024 custom instructions can be
697+
implemented within the CFU. These instructions are mapped to an OPCODE space that has been explicitly reserved by
698+
the RISC-V spec for custom extensions.
699+
700+
Software can utilize the custom instructions by using _intrinsic functions_, which are inline assembly functions that
701+
behave like "regular" C functions.
702+
703+
[TIP]
704+
For more information regarding the CFU see section <<_custom_functions_unit_cfu>>.
705+
706+
[TIP]
707+
The CFU / `Zxcfu` ISA extension is intended for application-specific _instructions_.
708+
If you would like to add more complex accelerators or interfaces that can also operate independently of
709+
the CPU, take a look at the memory-mapped <<_custom_functions_subsystem_cfs>>.
710+
711+
687712
==== **`PMP`** Physical Memory Protection
688713

689714
The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used
@@ -796,6 +821,7 @@ configurations are presented in <<_cpu_performance>>.
796821
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
797822
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
798823
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
824+
| CFU: custom instructions | `Zxcfu` | - | min. 4
799825
|=======================
800826

801827
[NOTE]
@@ -1146,3 +1172,9 @@ be enabled ba enabling a constant in the main VHDL package file (`rtl/core/neorv
11461172
-- "critical" number of PMP regions --
11471173
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value)
11481174
----
1175+
1176+
1177+
<<<
1178+
// ####################################################################################################################
1179+
1180+
include::cpu_cfu.adoc[]

docs/datasheet/cpu_cfu.adoc

Lines changed: 154 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
<<<
2+
:sectnums:
3+
=== Custom Functions Unit (CFU)
4+
5+
The Custom Functions Unit is the central part of the <<_zxcfu_custom_instructions_extension_cfu>> and represents
6+
the actual hardware module, which is used to implement _custom RISC-V instructions_. The concept of the NEORV32
7+
CFU has been highly inspired by https://github.com/google/CFU-Playground[Google's CFU-Playground].
8+
9+
The CFU is intended for operations that are inefficient in terms of performance, latency, energy consumption or
10+
program memory requirements when implemented in pure software. Some potential application fields and exemplary
11+
use-cases might include:
12+
13+
* **AI:** sub-word / vector / SIMD operations like adding all four bytes of a 32-bit data word
14+
* **Cryptographic:** bit substitution and permutation
15+
* **Communication:** conversions like binary to gray-code
16+
* **Image processing:** look-up-tables for color space transformations
17+
* implementing instructions from other RISC-V ISA extensions that are not yet supported by the NEORV32
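To make two of these use-cases concrete, here is a pure-software reference sketch in C of the kind of operation that could be moved into a CFU instruction: summing the four bytes of a 32-bit word, and converting a binary value to Gray code. The function names are hypothetical and nothing here is taken from the NEORV32 sources; the sketch only shows the work a single custom instruction might replace.

```c
#include <stdint.h>

/* Hypothetical software reference: add all four bytes of a 32-bit word. */
uint32_t ref_sum_bytes(uint32_t x)
{
    return (x & 0xFFu) + ((x >> 8) & 0xFFu) + ((x >> 16) & 0xFFu) + (x >> 24);
}

/* Hypothetical software reference: binary to Gray-code conversion. */
uint32_t ref_bin_to_gray(uint32_t x)
{
    return x ^ (x >> 1);
}
```

A reference model like this can also serve as a golden model when verifying the result of the corresponding custom instruction.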
18+
19+
[NOTE]
20+
The CFU is not intended for complex and autonomous functional units that implement complete accelerators
21+
(like block-based AES de-/encoding). Such accelerators can be implemented within the <<_custom_functions_subsystem_cfs>>.
22+
A comparison of all chip-internal hardware extension options is provided in the user guide section
23+
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules].
24+
25+
26+
:sectnums:
27+
==== Custom CFU Instructions - General
28+
29+
The custom instructions utilize a specific instruction space that has been explicitly reserved for user-defined
30+
extensions by the RISC-V specifications ("_Guaranteed Non-Standard Encoding Space_"). The NEORV32 CFU uses the
31+
_CUSTOM0_ opcode to identify custom instructions. The binary encoding of this opcode is `0001011`.
32+
33+
The custom instructions processed by the CFU use the 32-bit **R2-type** RISC-V instruction format, which consists
34+
of six bit-fields:
35+
36+
* `funct7`: 7-bit immediate
37+
* `rs2`: address of second source register
38+
* `rs1`: address of first source register
39+
* `funct3`: 3-bit immediate
40+
* `rd`: address of destination register
41+
* `opcode`: always `0001011` to identify custom instructions
42+
43+
.CFU instruction format (RISC-V R2-type)
44+
image::cfu_r2type_instruction.png[align=center]
45+
46+
[NOTE]
47+
Obviously, all bit-fields including the immediates have to be static at compile time.
48+
49+
.Custom Instructions - Exceptions
50+
[NOTE]
51+
The CPU control logic can only check the _CUSTOM0_ opcode of the custom instructions to check if the
52+
instruction word is valid. It cannot check the `funct3` and `funct7` bit-fields since they are
53+
implementation-defined. Hence, a custom CFU instruction can never raise an illegal instruction exception.
54+
However, custom instructions will raise an illegal instruction exception if the CFU is not enabled/implemented
55+
(i.e. `Zxcfu` ISA extension is not enabled).
56+
57+
The CFU operates on the two source operands and returns the processing result to the destination register.
58+
The actual instruction to be performed can be defined by using the `funct7` and `funct3` bit fields.
59+
These immediate bit-fields can also be used to pass additional data to the CFU like offsets, look-up table
60+
addresses or shift-amounts. However, the actual functionality is completely user-defined.
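As a rough illustration of the R2-type layout described above, the following host-side C sketch packs the six bit-fields into a 32-bit instruction word. It assumes the standard RISC-V R-type field positions (`funct7` in bits 31:25, `rs2` in 24:20, `rs1` in 19:15, `funct3` in 14:12, `rd` in 11:7, `opcode` in 6:0). The helper names `cfu_encode`, `cfu_funct7_of` and `cfu_funct3_of` are hypothetical and not part of the NEORV32 software framework.

```c
#include <stdint.h>

/* CUSTOM0 opcode (binary 0001011) that identifies CFU instructions. */
#define CFU_OPCODE_CUSTOM0 0x0Bu

/* Hypothetical helper: pack the R2-type bit-fields into an instruction word.
 * Field positions follow the standard RISC-V R-type layout (assumed here). */
uint32_t cfu_encode(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                    uint32_t funct3, uint32_t rd)
{
    return ((funct7 & 0x7Fu) << 25) |
           ((rs2    & 0x1Fu) << 20) |
           ((rs1    & 0x1Fu) << 15) |
           ((funct3 & 0x07u) << 12) |
           ((rd     & 0x1Fu) <<  7) |
           CFU_OPCODE_CUSTOM0;
}

/* Hypothetical helpers: extract the two immediate bit-fields again. */
uint32_t cfu_funct7_of(uint32_t insn) { return (insn >> 25) & 0x7Fu; }
uint32_t cfu_funct3_of(uint32_t insn) { return (insn >> 12) & 0x07u; }
```

Since `funct7` and `funct3` together provide 7 + 3 = 10 selector bits, this layout accounts for the 2^10 = 1024 distinct custom instructions.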
61+
62+
63+
:sectnums:
64+
==== Using Custom Instructions in Software
65+
66+
The custom instructions provided by the CFU are included into plain C code by using **intrinsics**. Intrinsics
67+
behave like "normal" functions but under the hood they are a set of macros that hide the complexity of inline assembly.
68+
Using such intrinsics removes the need to modify the compiler, built-in libraries and the assembler when including custom
69+
instructions.
70+
71+
The NEORV32 software framework provides eight pre-defined custom instruction macros, which are defined in
72+
`sw/lib/include/neorv32_cpu_cfu.h`. Each intrinsic provides an implicit definition of the instruction word's
73+
`funct3` bit-field:
74+
75+
.CFU instruction prototypes
76+
[source,c]
77+
----
78+
neorv32_cfu_cmd0(funct7, rs1, rs2) // funct3 = 000
79+
neorv32_cfu_cmd1(funct7, rs1, rs2) // funct3 = 001
80+
neorv32_cfu_cmd2(funct7, rs1, rs2) // funct3 = 010
81+
neorv32_cfu_cmd3(funct7, rs1, rs2) // funct3 = 011
82+
neorv32_cfu_cmd4(funct7, rs1, rs2) // funct3 = 100
83+
neorv32_cfu_cmd5(funct7, rs1, rs2) // funct3 = 101
84+
neorv32_cfu_cmd6(funct7, rs1, rs2) // funct3 = 110
85+
neorv32_cfu_cmd7(funct7, rs1, rs2) // funct3 = 111
86+
----
87+
88+
Each intrinsic function always returns a 32-bit value (the processing result). Furthermore,
89+
each intrinsic function requires three arguments:
90+
91+
* `funct7` - 7-bit immediate
92+
* `rs1` - source operand 1, 32-bit
93+
* `rs2` - source operand 2, 32-bit
94+
95+
The `funct7` bit-field is used to pass a 7-bit literal to the CFU. The `rs1` and `rs2` arguments pass the
96+
actual data to the CFU. These arguments can be populated with variables or literals. The following example
97+
shows how to pass arguments when executing `neorv32_cfu_cmd6`: `funct7` is set to all-zero, `rs1` is given
98+
the literal _2751_ and `rs2` is given a variable that contains the return value from `some_function()`.
99+
100+
.CFU instruction usage example
101+
[source,c]
102+
----
103+
uint32_t opb = some_function();
104+
uint32_t res = neorv32_cfu_cmd6(0b0000000, 2751, opb);
105+
----
106+
107+
.CFU Example Program
108+
[TIP]
109+
There is a simple example program for the CFU, which shows how to use the _default_ CFU hardware module.
110+
The example program is located in `sw/example/demo_cfu`.
111+
112+
113+
:sectnums:
114+
==== Custom Instructions Hardware
115+
116+
The actual functionality of the CFU's custom instruction is defined by the logic in the CFU itself.
117+
It is the responsibility of the designer to implement this logic within the CFU hardware module
118+
`rtl/core/neorv32_cpu_cp_cfu.vhd`.
119+
120+
The CFU hardware module receives the data from the instruction word's immediate bit-fields and also
121+
the operation data, which is fetched from the CPU's register file.
122+
123+
.CFU instruction data passing example
124+
[source,c]
125+
----
126+
uint32_t opb = 0x12345678;
127+
uint32_t res = neorv32_cfu_cmd6(0b0100111, 0x00cafe00, opb);
128+
----
129+
130+
In this example the CFU hardware module receives the two source operands as 32-bit signals
131+
and the immediate values as 7-bit and 3-bit signals:
132+
133+
* `rs1_i` (32-bit) contains the data from the `rs1` register (here = `0x00cafe00`)
134+
* `rs2_i` (32-bit) contains the data from the `rs2` register (here = `0x12345678`)
135+
* `control.funct3` (3-bit) contains the immediate value from the `funct3` bit-field (here = `0b110`; "cmd6")
136+
* `control.funct7` (7-bit) contains the immediate value from the `funct7` bit-field (here = `0b0100111`)
137+
138+
The CFU executes the corresponding instruction (selected, for example, by the `control.funct3` signal)
139+
and provides the operation result in the 32-bit `control.result` signal. The processing can be entirely
140+
combinatorial, so the result is available at the end of the current clock cycle. Processing can also
141+
take several clock cycles and may also include internal states and memories. As soon as the CFU has
142+
completed its operation, it sets the `control.done` signal high.
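The data flow described above can be mimicked on a host machine by a small behavioral model in C. This is only a sketch: the function name `cfu_model` is hypothetical, the operations assigned to each `funct3` value are invented for illustration and are not those of the default CFU module, and the multi-cycle `control.done` handshake is not modeled.

```c
#include <stdint.h>

/* Hypothetical behavioral model of a CFU: funct3 selects the operation,
 * funct7 carries extra immediate data, rs1/rs2 are the register operands.
 * The operations below are made up for illustration only. */
uint32_t cfu_model(uint32_t funct3, uint32_t funct7, uint32_t rs1, uint32_t rs2)
{
    switch (funct3 & 0x7u) {
    case 0u: /* "cmd0": bit-wise XOR of both operands */
        return rs1 ^ rs2;
    case 1u: /* "cmd1": shift rs1 left, funct7 used as shift amount */
        return rs1 << (funct7 & 31u);
    case 6u: /* "cmd6": sum of both operands plus the funct7 immediate */
        return rs1 + rs2 + (funct7 & 0x7Fu);
    default: /* unimplemented selectors return zero in this model */
        return 0u;
    }
}
```

In this model the operand values from the data passing example (`funct3` = 6, `funct7` = `0x27`, `rs1` = `0x00cafe00`, `rs2` = `0x12345678`) select the third branch.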
143+
144+
.CFU Hardware Example & More Details
145+
[TIP]
146+
The default CFU module already implements some exemplary instructions that are used for illustration
147+
by the CFU example program. See the CFU's VHDL source file (`rtl/core/neorv32_cpu_cp_cfu.vhd`), which
148+
is highly commented to explain the available signals and the handshake with the CPU pipeline.
149+
150+
.CFU Execution Time
151+
[NOTE]
152+
The CFU is not required to finish processing within a bounded time.
153+
However, the designer should keep in mind that the CPU is **stalled** until the CFU has finished processing.
154+
This also means the CPU cannot react to pending interrupts. Nevertheless, interrupt requests will still be queued.

docs/datasheet/overview.adoc

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -54,6 +54,7 @@ include::rationale.adoc[]
5454
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
5555
** RISC-V compatibility: passes the official architecture tests
5656
** base architecture + privileged architecture (optional) + ISA extensions (optional)
57+
** option to add custom RISC-V instructions (as custom ISA extension)
5758
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
5859
** aims to support <<_full_virtualization>> capabilities (CPU _and_ SoC) to increase execution safety
5960
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
@@ -78,6 +79,21 @@ include::rationale.adoc[]
7879
For more in-depth details regarding the features provided by the hardware see the corresponding sections:
7980
<<_neorv32_central_processing_unit_cpu>> and <<_neorv32_processor_soc>>.
8081

82+
**Extensibility and Customization**
83+
84+
The NEORV32 processor was designed to ease customization and extensibility and provides several options for adding
85+
application-specific custom hardware modules and accelerators. The three most common options for adding custom
86+
on-chip modules are listed below.
87+
88+
* <<_processor_external_memory_interface_wishbone_axi4_lite>> for processor-external modules
89+
* <<_custom_functions_subsystem_cfs>> for tightly-coupled processor-internal co-processors
90+
* <<_custom_functions_unit_cfu>> for custom RISC-V instructions
91+
92+
[TIP]
93+
A more detailed comparison of the extension/customization options can be found in section
94+
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules]
95+
of the user guide.
96+
8197
8298
<<<
8399
// ####################################################################################################################
@@ -143,6 +159,7 @@ neorv32_top.vhd - NEORV32 Processor top entity
143159
├neorv32_cpu.vhd - NEORV32 CPU top entity
144160
│├neorv32_cpu_alu.vhd - Arithmetic/logic unit
145161
││├neorv32_cpu_cp_bitmanip.vhd - Bit-manipulation co-processor (B ext.)
162+
││├neorv32_cpu_cp_cfu.vhd - Custom functions (instruction) co-processor (Zxcfu ext.)
146163
││├neorv32_cpu_cp_fpu.vhd - Floating-point co-processor (Zfinx ext.)
147164
││├neorv32_cpu_cp_muldiv.vhd - Mul/Div co-processor (M extension)
148165
││└neorv32_cpu_cp_shifter.vhd - Bit-shift co-processor

docs/datasheet/rationale.adoc

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,8 @@ co-processors and even user-defined instructions.
3131

3232
**Why RISC-V?**
3333

34+
image::riscv_logo.png[width=250,align=left]
35+
3436
[quote, RISC-V International, https://riscv.org/about/]
3537
____
3638
RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.
@@ -60,7 +62,7 @@ https://github.com/olofk/serv[SERV] in terms of size. It was build having a diff
6062

6163
The project aims to provide _another option_ in the RISC-V / soft-core design space with a different performance
6264
vs. size trade-off and a different focus: _embrace_ concepts like documentation, platform-independence / portability,
63-
RISC-V compatibility, _customization_ and _ease of use_ (see the <<_project_key_features>> below).
65+
RISC-V compatibility, _extensibility & customization_ and _ease of use_ (see the <<_project_key_features>> below).
6466

6567
Furthermore, the NEORV32 pays special focus on _execution safety_ using <<_full_virtualization>>. The CPU aims to
6668
provide fall-backs for _everything that could go wrong_. This includes malformed instruction words, privilege escalations

docs/datasheet/soc.adoc

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -399,6 +399,18 @@ cannot be used together with the `M` extension. See section <<_zmmul_integer_mul
399399
|======
400400

401401

402+
:sectnums!:
403+
===== _CPU_EXTENSION_RISCV_Zxcfu_
404+
405+
[cols="4,4,2"]
406+
[frame="all",grid="none"]
407+
|======
408+
| **CPU_EXTENSION_RISCV_Zxcfu** | _boolean_ | false
409+
3+| NEORV32-specific "custom RISC-V" ISA extension: Implement the <<_custom_functions_unit_cfu>> for user-defined
410+
custom instructions when _true_. See section <<_zxcfu_custom_instructions_extension_cfu>> for more information.
411+
|======
412+
413+
402414
// ####################################################################################################################
403415
:sectnums:
404416
==== Extension Options

docs/datasheet/soc_cfs.adoc

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,11 @@ dedicated hardware accelerators for en-/decryption (AES), signal processing (FFT
3535
(CNNs) as well as custom IO systems like fast memory interfaces (DDR) and mass storage (SDIO), networking (CAN)
3636
or real-time data transport (I2S).
3737

38-
[INFO]
38+
[TIP]
39+
If you would like to implement _custom instructions_ that are executed right within the CPU's ALU,
40+
see the <<_zxcfu_custom_instructions_extension_cfu>> and the according <<_custom_functions_unit_cfu>>.
41+
42+
[TIP]
3943
Take a look at the template CFS VHDL source file (`rtl/core/neorv32_cfs.vhd`). The file is highly
4044
commented to illustrate all aspects that are relevant for implementing custom CFS-based co-processor designs.
4145

docs/datasheet/soc_sysinfo.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,7 @@ will signal a "DEVICE ERROR" in this case.
5252
| `0` | _SYSINFO_CPU_ZICSR_ | `Zicsr` extension (`I` sub-extension) available when set (via top's <<_cpu_extension_riscv_zicsr>> generic)
5353
| `1` | _SYSINFO_CPU_ZIFENCEI_ | `Zifencei` extension (`I` sub-extension) available when set (via top's <<_cpu_extension_riscv_zifencei>> generic)
5454
| `2` | _SYSINFO_CPU_ZMMUL_ | `Zmmul` extension (`M` sub-extension) available when set (via top's <<_cpu_extension_riscv_zmmul>> generic)
55+
| `3` | _SYSINFO_CPU_ZXCFU_ | `Zxcfu` extension (custom functions unit for custom instructions) available when set (via top's <<_cpu_extension_riscv_zxcfu>> generic)
5556
| `5` | _SYSINFO_CPU_ZFINX_ | `Zfinx` extension (`F` sub-/alternative-extension) available when set (via top's <<_cpu_extension_riscv_zfinx>> generic)
5657
| `6` | _SYSINFO_CPU_ZXSCNT_ | Custom extension - _Small_ CPU counters: `[m]cycle` & `[m]instret` CSRs have less than 64-bit when set (via top's <<_cpu_cnt_width>> generic)
5758
| `7` | _SYSINFO_CPU_ZXNOCNT_ | Custom extension - _NO_ CPU counters: `[m]cycle` & `[m]instret` CSRs are NOT available at all when set (via top's <<_cpu_cnt_width>> generic)
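Software could test these flags before issuing custom instructions. The following is a minimal sketch, assuming the flag word has already been read from the SYSINFO module into a `uint32_t` (the read itself is not shown). The enum and function names are hypothetical; only the bit positions are taken from the table rows above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the table rows above (names are hypothetical). */
enum cpu_ext_bit {
    CPU_EXT_ZICSR    = 0,
    CPU_EXT_ZIFENCEI = 1,
    CPU_EXT_ZMMUL    = 2,
    CPU_EXT_ZXCFU    = 3,
    CPU_EXT_ZFINX    = 5
};

/* Hypothetical helper: true if the given extension flag is set in the word. */
bool cpu_has_ext(uint32_t flags, enum cpu_ext_bit bit)
{
    return ((flags >> bit) & 1u) != 0u;
}
```

A program would then guard its CFU intrinsics with a check such as `cpu_has_ext(flags, CPU_EXT_ZXCFU)`, since custom instructions raise an illegal instruction exception when the extension is not implemented.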
