✨[Zxcfu ISA ext.] add option to implement custom RISC-V instructions #264

Merged
merged 34 commits into from
Jan 31, 2022
Commits
9bcb115
[docs/figures] added 'Zxcfu' extension
stnolting Jan 28, 2022
3aba96f
[rtl/system_integration] added Zxcfu ISA extensions
stnolting Jan 28, 2022
28ba540
[rtl/core] TOP: added generic to enabale Zxcfu ISA extension
stnolting Jan 28, 2022
bd73053
[sim] added Zxcfu ISA ext.
stnolting Jan 28, 2022
e7c5ce3
[sw/lib] added "custom OPCODES" to intrinsic library
stnolting Jan 28, 2022
ccd158e
[sw/lib] CFU: comment typo fix
stnolting Jan 28, 2022
cd50dbf
[rtl/core] CPU control: added custom OPCODE logic
stnolting Jan 28, 2022
41ea603
[rtl/core] SYSINFO: added Zxcfu flag
stnolting Jan 28, 2022
5f5f4d6
[rtl/core] ALU: added CFU to CP4 slot
stnolting Jan 28, 2022
e383953
[rtl/core] CPU: added Zxcfu ISA extension
stnolting Jan 28, 2022
27be939
[rtl/core] PACKAGE: added Zxcfu ext. and CFU module
stnolting Jan 28, 2022
75607ae
:sparkles: [rtl/core] added CFU hardware module
stnolting Jan 28, 2022
6c228c4
[sw/lib] add CFU drivers and SYSCONFIG bit definition
stnolting Jan 28, 2022
9ae94d3
[sw/lib] RTE: added Zxcfu ISA extension
stnolting Jan 28, 2022
e205391
[docs/datasheet] SYSINFO: added Zxcfu flag
stnolting Jan 28, 2022
790c9c1
[docs/datasheet] SOC: added Zxcfu generic
stnolting Jan 28, 2022
8124196
[docs/datasheet] OVERVIEW: added CFU HW file
stnolting Jan 28, 2022
b7d515d
[docs/datasheet] CFS: added note/link to CFU
stnolting Jan 28, 2022
2db059b
[docs/datasheet] CPU: added Zxcfu ISA extension
stnolting Jan 28, 2022
62ba20e
[README] add links to Zxcfu extension and CFU module
stnolting Jan 28, 2022
ec025d5
[docs/datasheet] added CFU section (still empty)
stnolting Jan 29, 2022
17219d2
minor edits
stnolting Jan 29, 2022
c79f5c3
:sparkles: [sw/example] added CFU example program
stnolting Jan 29, 2022
ad515d6
[docs/figures] added R2-type CFU instruction diagram
stnolting Jan 29, 2022
dbf8a1f
[sw/lib/include] fixed typo in intrinsics library
stnolting Jan 29, 2022
1d97ffa
[docs/datasheet] minor edits -> "extensibility"
stnolting Jan 29, 2022
b7d5f35
[docs] added CPU block diagram
stnolting Jan 29, 2022
eda0278
minor edits
stnolting Jan 29, 2022
ea4d57c
[docs/datasheet] added CFU to SW file list
stnolting Jan 29, 2022
67449f4
:books: [docs/datasheet] added new section "CFU"
stnolting Jan 29, 2022
0e3a715
[docs/userguide] reworked section "Adding Custom Hardware Modules"
stnolting Jan 29, 2022
1470de8
[docs/userguide] custom hardware extension options: minor edits
stnolting Jan 30, 2022
63ecf74
[CHANGELOG] added v1.6.7.1
stnolting Jan 30, 2022
4382a29
typo fixes in comments
stnolting Jan 31, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -26,6 +26,7 @@ defined by the `hw_version_c` constant in the main VHDL package file [`rtl/core/

| Date (*dd.mm.yyyy*) | Version | Comment |
|:----------:|:-------:|:--------|
| 30.01.2022 | 1.6.7.1 | :sparkles: added **`Zxcfu` ISA extension for user-defined custom RISC-V instructions**; see [PR #264](https://github.com/stnolting/neorv32/pull/264) |
| 28.01.2022 |[**:rocket:1.6.7**](https://github.com/stnolting/neorv32/releases/tag/v1.6.7) | **New release** |
| 28.01.2022 | 1.6.6.10 | :bug: fixed bug in **bit-manipulation co-processor**: decoding collision between `cpop` and `rol` instructions; :bug: fixed bug in co-processor arbitration when an illegal instruction is detected; added four additional (yet unused) **CPU** co-processor slots; [PR #262](https://github.com/stnolting/neorv32/pull/262) |
| 27.01.2022 | 1.6.6.9 | reworked **CFS** "user" logic; added CFS demo program; see [PR #261](https://github.com/stnolting/neorv32/pull/261) |
5 changes: 4 additions & 1 deletion README.md
@@ -127,7 +127,9 @@ the "Minimal RISC-V Debug Specification Version 0.13.2" and compatible with **Op
* _true random_ number generator ([TRNG](https://stnolting.github.io/neorv32/#_true_random_number_generator_trng))
* execute in place module ([XIP](https://stnolting.github.io/neorv32/#_execute_in_place_module_xip)) to directly execute code from SPI flash
* custom functions subsystem ([CFS](https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs))
for tightly-coupled custom co-processor extensions and interfaces
for tightly-coupled custom accelerators and interfaces
* custom functions unit ([CFU](https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu)) for up to 1024
_custom RISC-V instructions_

[[back to top](#The-NEORV32-RISC-V-Processor)]

@@ -187,6 +189,7 @@ documentation section).
[[`Zihpm`](https://stnolting.github.io/neorv32/#_zihpm_hardware_performance_monitors)]
[[`Zifencei`](https://stnolting.github.io/neorv32/#_zifencei_instruction_stream_synchronization)]
[[`Zmmul`](https://stnolting.github.io/neorv32/#_zmmul_integer_multiplication)]
[[`Zxcfu`](https://stnolting.github.io/neorv32/#_zxcfu_custom_instructions_extension_cfu)]
[[`PMP`](https://stnolting.github.io/neorv32/#_pmp_physical_memory_protection)]
[[`DEBUG`](https://stnolting.github.io/neorv32/#_cpu_debug_mode)]**

34 changes: 33 additions & 1 deletion docs/datasheet/cpu.adoc
@@ -1,7 +1,7 @@
:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]
image::neorv32_cpu_block.png[width=600,align=center]

**Key Features**

@@ -20,6 +20,7 @@ image::riscv_logo.png[width=350,align=center]
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `Zxcfu` - custom instructions extension
** `PMP` - physical memory protection
** `Debug` - debug mode
* Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
@@ -684,6 +685,30 @@ high for one cycle to inform the memory system (like the i-cache to perform a fl
Any additional flags within the `fence.i` instruction word are ignored by the hardware.


==== **`Zxcfu`** Custom Instructions Extension (CFU)

The `Zxcfu` extension is a NEORV32-specific _custom RISC-V_ ISA extension (`Z` = sub-extension, `x` = platform-specific
custom extension, `cfu` = name of the custom extension). When enabled via the `CPU_EXTENSION_RISCV_Zxcfu` configuration
generic, this ISA extension adds the <<_custom_functions_unit_cfu>> to the CPU core. The CFU is a module that
allows adding **custom RISC-V instructions** to the processor core.

The CFU is implemented as an ALU co-processor and is integrated right into the CPU's pipeline, providing minimal data
transfer latency as it has direct access to the core's register file. Up to 1024 custom instructions can be
implemented within the CFU. These instructions are mapped to an OPCODE space that has been explicitly reserved by
the RISC-V spec for custom extensions.

Software can utilize the custom instructions by using _intrinsic functions_, which are inline assembly functions that
behave like "regular" C functions.

[TIP]
For more information regarding the CFU see section <<_custom_functions_unit_cfu>>.

[TIP]
The CFU / `Zxcfu` ISA extension is intended for application-specific _instructions_.
If you would like to add more complex accelerators or interfaces that can also operate independently of
the CPU take a look at the memory-mapped <<_custom_functions_subsystem_cfs>>.


==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used
@@ -796,6 +821,7 @@ configurations are presented in <<_cpu_performance>>.
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
| CFU: custom instructions | `Zxcfu` | - | min. 4
|=======================

[NOTE]
@@ -1146,3 +1172,9 @@ be enabled ba enabling a constant in the main VHDL package file (`rtl/core/neorv
-- "critical" number of PMP regions --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value)
----


<<<
// ####################################################################################################################

include::cpu_cfu.adoc[]
154 changes: 154 additions & 0 deletions docs/datasheet/cpu_cfu.adoc
@@ -0,0 +1,154 @@
<<<
:sectnums:
=== Custom Functions Unit (CFU)

The Custom Functions Unit is the central part of the <<_zxcfu_custom_instructions_extension_cfu>> and represents
the actual hardware module, which is used to implement _custom RISC-V instructions_. The concept of the NEORV32
CFU has been strongly inspired by https://github.com/google/CFU-Playground[Google's CFU-Playground].

The CFU is intended for operations that are inefficient in terms of performance, latency, energy consumption or
program memory requirements when implemented in pure software. Some potential application fields and exemplary
use-cases might include:

* **AI:** sub-word / vector / SIMD operations like adding all four bytes of a 32-bit data word
* **Cryptographic:** bit substitution and permutation
* **Communication:** conversions like binary to gray-code
* **Image processing:** look-up-tables for color space transformations
* implementing instructions from other RISC-V ISA extensions that are not yet supported by the NEORV32
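
Two of the use-cases listed above can be written down as plain C reference models. This is only a sketch: the function names `ref_byte_sum` and `ref_bin_to_gray` are hypothetical and not part of the NEORV32 software framework. Such models can serve as a software baseline or as a golden reference when verifying a custom CFU instruction.

```c
#include <stdint.h>

/* Reference model (hypothetical name): add all four bytes of a 32-bit word. */
static inline uint32_t ref_byte_sum(uint32_t x) {
    return (x & 0xffu) + ((x >> 8) & 0xffu) + ((x >> 16) & 0xffu) + ((x >> 24) & 0xffu);
}

/* Reference model (hypothetical name): binary to Gray code conversion.
 * Each output bit is the XOR of the input bit and its next-higher neighbor. */
static inline uint32_t ref_bin_to_gray(uint32_t x) {
    return x ^ (x >> 1);
}
```

In pure software each of these needs several shift, mask and add/XOR instructions, whereas a CFU implementation could provide the same result with a single custom instruction.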

[NOTE]
The CFU is not intended for complex and autonomous functional units that implement complete accelerators
(like block-based AES de-/encoding). Such accelerators can be implemented within the <<_custom_functions_subsystem_cfs>>.
A comparison of all chip-internal hardware extension options is provided in the user guide section
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules].


:sectnums:
==== Custom CFU Instructions - General

The custom instructions utilize a specific instruction space that has been explicitly reserved for user-defined
extensions by the RISC-V specifications ("_Guaranteed Non-Standard Encoding Space_"). The NEORV32 CFU uses the
_CUSTOM0_ opcode to identify custom instructions. The binary encoding of this opcode is `0001011`.

The custom instructions processed by the CFU use the 32-bit **R2-type** RISC-V instruction format, which consists
of six bit-fields:

* `funct7`: 7-bit immediate
* `rs2`: address of second source register
* `rs1`: address of first source register
* `funct3`: 3-bit immediate
* `rd`: address of destination register
* `opcode`: always `0001011` to identify custom instructions

.CFU instruction format (RISC-V R2-type)
image::cfu_r2type_instruction.png[align=center]
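
The field list above can be turned into a small encoder. This is a minimal sketch, assuming that the R2-type format uses the bit positions of the standard RISC-V R-type format (`funct7` in bits 31:25, `rs2` in 24:20, `rs1` in 19:15, `funct3` in 14:12, `rd` in 11:7, `opcode` in 6:0). The names `CFU_OPCODE_CUSTOM0` and `cfu_encode` are hypothetical.

```c
#include <stdint.h>

/* CUSTOM0 opcode, binary 0001011 (hypothetical macro name). */
#define CFU_OPCODE_CUSTOM0 0x0bu

/* Assemble a 32-bit CFU instruction word from its six bit-fields.
 * rs1, rs2 and rd are 5-bit register addresses (x0..x31).
 * Assumption: standard RISC-V R-type bit positions. */
static inline uint32_t cfu_encode(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                                  uint32_t funct3, uint32_t rd) {
    return ((funct7 & 0x7fu) << 25) |
           ((rs2    & 0x1fu) << 20) |
           ((rs1    & 0x1fu) << 15) |
           ((funct3 & 0x07u) << 12) |
           ((rd     & 0x1fu) <<  7) |
           CFU_OPCODE_CUSTOM0;
}
```

For example, `cfu_encode(0x27, 11, 10, 6, 10)` yields `0x4EB5650B`, i.e. `funct7 = 0b0100111`, `funct3 = 0b110`, source registers `x10`/`x11` and destination register `x10`.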

[NOTE]
Obviously, all bit-fields including the immediates have to be static at compile time.

.Custom Instructions - Exceptions
[NOTE]
The CPU control logic can only check the _CUSTOM0_ opcode of the custom instructions to check if the
instruction word is valid. It cannot check the `funct3` and `funct7` bit-fields since they are
implementation-defined. Hence, a custom CFU instruction can never raise an illegal instruction exception.
However, custom instructions will raise an illegal instruction exception if the CFU is not enabled/implemented
(i.e. `Zxcfu` ISA extension is not enabled).
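
The check described in this note can be modeled in a few lines of C. This is a behavioral sketch only, not the actual CPU control logic. The function names and the `zxcfu_enabled` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the instruction word carries the CUSTOM0 opcode (bits 6:0 = 0001011). */
static inline bool cfu_opcode_match(uint32_t instr) {
    return (instr & 0x7fu) == 0x0bu;
}

/* Model of the rule above for CUSTOM0 instruction words:
 * funct3/funct7 are never inspected, so the only way a custom instruction
 * can be illegal is when the Zxcfu extension is not implemented.
 * Words with a different opcode are not judged here (returns false). */
static inline bool cfu_instr_is_illegal(uint32_t instr, bool zxcfu_enabled) {
    return cfu_opcode_match(instr) && !zxcfu_enabled;
}
```

Note that the result does not depend on the `funct3` or `funct7` bits of the instruction word, which mirrors the statement that these fields are implementation-defined.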

The CFU operates on the two source operands and returns the processing result to the destination register.
The actual instruction to be performed can be defined by using the `funct7` and `funct3` bit-fields.
These immediate bit-fields can also be used to pass additional data to the CFU like offsets, look-up-table
addresses or shift-amounts. However, the actual functionality is completely user-defined.
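
Taken together, the 7-bit `funct7` and the 3-bit `funct3` fields provide 10 selector bits, i.e. 2^10 = 1024 distinct combinations. The following sketch packs both fields into a single index. The packing order and the names `CFU_NUM_INSTR` and `cfu_selector` are illustrative assumptions, not part of the NEORV32 sources.

```c
#include <stdint.h>

#define CFU_FUNCT3_BITS 3u
#define CFU_FUNCT7_BITS 7u

/* Number of distinct funct7/funct3 combinations: 2^(7+3) = 1024. */
#define CFU_NUM_INSTR (1u << (CFU_FUNCT3_BITS + CFU_FUNCT7_BITS))

/* Pack funct7 and funct3 into one 10-bit selector (0..1023).
 * The packing order (funct7 in the upper bits) is an arbitrary choice. */
static inline uint32_t cfu_selector(uint32_t funct7, uint32_t funct3) {
    return ((funct7 & 0x7fu) << CFU_FUNCT3_BITS) | (funct3 & 0x07u);
}
```

A design that uses `funct7` to carry data (for example a shift amount) instead of selecting an operation would accordingly have fewer distinct instructions available.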


:sectnums:
==== Using Custom Instructions in Software

The custom instructions provided by the CFU are included into plain C code by using **intrinsics**. Intrinsics
behave like "normal" functions but under the hood they are a set of macros that hide the complexity of inline assembly.
Using such intrinsics removes the need to modify the compiler, built-in libraries and the assembler when including custom
instructions.

The NEORV32 software framework provides 8 pre-defined custom instruction macros, which are defined in
`sw/lib/include/neorv32_cpu_cfu.h`. Each intrinsic provides an implicit definition of the instruction word's
`funct3` bit-field:

.CFU instruction prototypes
[source,c]
----
neorv32_cfu_cmd0(funct7, rs1, rs2) // funct3 = 000
neorv32_cfu_cmd1(funct7, rs1, rs2) // funct3 = 001
neorv32_cfu_cmd2(funct7, rs1, rs2) // funct3 = 010
neorv32_cfu_cmd3(funct7, rs1, rs2) // funct3 = 011
neorv32_cfu_cmd4(funct7, rs1, rs2) // funct3 = 100
neorv32_cfu_cmd5(funct7, rs1, rs2) // funct3 = 101
neorv32_cfu_cmd6(funct7, rs1, rs2) // funct3 = 110
neorv32_cfu_cmd7(funct7, rs1, rs2) // funct3 = 111
----

Each intrinsic function always returns a 32-bit value (the processing result). Furthermore,
each intrinsic function requires three arguments:

* `funct7` - 7-bit immediate
* `rs1` - source operand 1, 32-bit
* `rs2` - source operand 2, 32-bit

The `funct7` bit-field is used to pass a 7-bit literal to the CFU. The `rs1` and `rs2` arguments are used to pass the
actual data to the CFU. These arguments can be populated with variables or literals. The following example
shows how to pass arguments when executing `neorv32_cfu_cmd6`: `funct7` is set to all-zero, `rs1` is given
the literal _2751_ and `rs2` is given a variable that contains the return value from `some_function()`.

.CFU instruction usage example
[source,c]
----
uint32_t opb = some_function();
uint32_t res = neorv32_cfu_cmd6(0b0000000, 2751, opb);
----

.CFU Example Program
[TIP]
There is a simple example program for the CFU, which shows how to use the _default_ CFU hardware module.
The example program is located in `sw/example/demo_cfu`.


:sectnums:
==== Custom Instructions Hardware

The actual functionality of the CFU's custom instructions is defined by the logic in the CFU itself.
It is the responsibility of the designer to implement this logic within the CFU hardware module
`rtl/core/neorv32_cpu_cp_cfu.vhd`.

The CFU hardware module receives the data from the instruction word's immediate bit-fields and also
the operation data, which is fetched from the CPU's register file.

.CFU instruction data passing example
[source,c]
----
uint32_t opb = 0x12345678;
uint32_t res = neorv32_cfu_cmd6(0b0100111, 0x00cafe00, opb);
----

In this example the CFU hardware module receives the two source operands as 32-bit signals
and the immediate values as 7-bit and 3-bit signals:

* `rs1_i` (32-bit) contains the data from the `rs1` register (here = `0x00cafe00`)
* `rs2_i` (32-bit) contains the data from the `rs2` register (here = `0x12345678`)
* `control.funct3` (3-bit) contains the immediate value from the `funct3` bit-field (here = `0b110`; "cmd6")
* `control.funct7` (7-bit) contains the immediate value from the `funct7` bit-field (here = `0b0100111`)

The CFU executes the corresponding instruction (selected, for example, by the `control.funct3` signal)
and provides the operation result in the 32-bit `control.result` signal. The processing can be entirely
combinatorial, so the result is available at the end of the current clock cycle. Processing can also
take several clock cycles and may also include internal states and memories. As soon as the CFU has
completed its operation, it sets the `control.done` signal high.
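
The data flow described above can be mimicked by a small behavioral model in C. This is a sketch under stated assumptions: the two operations are invented for illustration and are not the instructions implemented by the default CFU module, and the names `cfu_out_t` and `cfu_model` are hypothetical. The model is purely combinatorial, so `done` is set immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the CFU outputs named in the text. */
typedef struct {
    uint32_t result; /* corresponds to control.result */
    bool     done;   /* corresponds to control.done   */
} cfu_out_t;

/* Behavioral model: funct3 selects the operation, funct7 is passed as extra data.
 * Both operations are hypothetical examples. */
static inline cfu_out_t cfu_model(uint32_t funct7, uint32_t funct3,
                                  uint32_t rs1, uint32_t rs2) {
    cfu_out_t out = { 0u, true }; /* combinatorial: done right away */
    switch (funct3 & 0x7u) {
        case 0: /* hypothetical "cmd0": 32-bit addition (wraps on overflow) */
            out.result = rs1 + rs2;
            break;
        case 6: /* hypothetical "cmd6": XOR of the operands plus funct7 as offset */
            out.result = (rs1 ^ rs2) + (funct7 & 0x7fu);
            break;
        default: /* operations not defined in this sketch return zero */
            out.result = 0u;
            break;
    }
    return out;
}
```

With the operand values from the data passing example (`funct7 = 0b0100111`, `rs1 = 0x00cafe00`, `rs2 = 0x12345678`, "cmd6"), this model returns `0x12FEA89F`. A multi-cycle CFU would instead keep `done` low until its internal processing has finished.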

.CFU Hardware Example & More Details
[TIP]
The default CFU module already implements some exemplary instructions that are used for illustration
by the CFU example program. See the CFU's VHDL source file (`rtl/core/neorv32_cpu_cp_cfu.vhd`), which
is highly commented to explain the available signals and the handshake with the CPU pipeline.

.CFU Execution Time
[NOTE]
The CFU is not required to finish processing within a bounded time.
However, the designer should keep in mind that the CPU is **stalled** until the CFU has finished processing.
This also means the CPU cannot react to pending interrupts. Nevertheless, interrupt requests will still be queued.
17 changes: 17 additions & 0 deletions docs/datasheet/overview.adoc
@@ -54,6 +54,7 @@ include::rationale.adoc[]
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
** RISC-V compatibility: passes the official architecture tests
** base architecture + privileged architecture (optional) + ISA extensions (optional)
** option to add custom RISC-V instructions (as custom ISA extension)
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
** aims to support <<_full_virtualization>> capabilities (CPU _and_ SoC) to increase execution safety
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
@@ -78,6 +79,21 @@ include::rationale.adoc[]
For more in-depth details regarding the features provided by the hardware, see the corresponding sections:
<<_neorv32_central_processing_unit_cpu>> and <<_neorv32_processor_soc>>.

**Extensibility and Customization**

The NEORV32 processor was designed to ease customization and extensibility and provides several options for adding
application-specific custom hardware modules and accelerators. The three most common options for adding custom
on-chip modules are listed below.

* <<_processor_external_memory_interface_wishbone_axi4_lite>> for processor-external modules
* <<_custom_functions_subsystem_cfs>> for tightly-coupled processor-internal co-processors
* <<_custom_functions_unit_cfu>> for custom RISC-V instructions

[TIP]
A more detailed comparison of the extension/customization options can be found in section
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules]
of the user guide.


<<<
// ####################################################################################################################
@@ -143,6 +159,7 @@ neorv32_top.vhd - NEORV32 Processor top entity
├neorv32_cpu.vhd - NEORV32 CPU top entity
│├neorv32_cpu_alu.vhd - Arithmetic/logic unit
││├neorv32_cpu_cp_bitmanip.vhd - Bit-manipulation co-processor (B ext.)
││├neorv32_cpu_cp_cfu.vhd - Custom functions (instruction) co-processor (Zxcfu ext.)
││├neorv32_cpu_cp_fpu.vhd - Floating-point co-processor (Zfinx ext.)
││├neorv32_cpu_cp_muldiv.vhd - Mul/Div co-processor (M extension)
││└neorv32_cpu_cp_shifter.vhd - Bit-shift co-processor
4 changes: 3 additions & 1 deletion docs/datasheet/rationale.adoc
@@ -31,6 +31,8 @@ co-processors and even user-defined instructions.

**Why RISC-V?**

image::riscv_logo.png[width=250,align=left]

[quote, RISC-V International, https://riscv.org/about/]
____
RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.
@@ -60,7 +62,7 @@ https://github.com/olofk/serv[SERV] in terms of size. It was build having a diff

The project aims to provide _another option_ in the RISC-V / soft-core design space with a different performance
vs. size trade-off and a different focus: _embrace_ concepts like documentation, platform-independence / portability,
RISC-V compatibility, _customization_ and _ease of use_ (see the <<_project_key_features>> below).
RISC-V compatibility, _extensibility & customization_ and _ease of use_ (see the <<_project_key_features>> below).

Furthermore, the NEORV32 pays special focus on _execution safety_ using <<_full_virtualization>>. The CPU aims to
provide fall-backs for _everything that could go wrong_. This includes malformed instruction words, privilege escalations
12 changes: 12 additions & 0 deletions docs/datasheet/soc.adoc
@@ -399,6 +399,18 @@ cannot be used together with the `M` extension. See section <<_zmmul_integer_mul
|======


:sectnums!:
===== _CPU_EXTENSION_RISCV_Zxcfu_

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_Zxcfu** | _boolean_ | false
3+| NEORV32-specific "custom RISC-V" ISA extension: Implement the <<_custom_functions_unit_cfu>> for user-defined
custom instructions when _true_. See section <<_zxcfu_custom_instructions_extension_cfu>> for more information.
|======


// ####################################################################################################################
:sectnums:
==== Extension Options
6 changes: 5 additions & 1 deletion docs/datasheet/soc_cfs.adoc
@@ -35,7 +35,11 @@ dedicated hardware accelerators for en-/decryption (AES), signal processing (FFT
(CNNs) as well as custom IO systems like fast memory interfaces (DDR) and mass storage (SDIO), networking (CAN)
or real-time data transport (I2S).

[INFO]
[TIP]
If you would like to implement _custom instructions_ that are executed right within the CPU's ALU,
see the <<_zxcfu_custom_instructions_extension_cfu>> and the corresponding <<_custom_functions_unit_cfu>>.

[TIP]
Take a look at the template CFS VHDL source file (`rtl/core/neorv32_cfs.vhd`). The file is highly
commented to illustrate all aspects that are relevant for implementing custom CFS-based co-processor designs.

1 change: 1 addition & 0 deletions docs/datasheet/soc_sysinfo.adoc
@@ -52,6 +52,7 @@ will signal a "DEVICE ERROR" in this case.
| `0` | _SYSINFO_CPU_ZICSR_ | `Zicsr` extension (`I` sub-extension) available when set (via top's <<_cpu_extension_riscv_zicsr>> generic)
| `1` | _SYSINFO_CPU_ZIFENCEI_ | `Zifencei` extension (`I` sub-extension) available when set (via top's <<_cpu_extension_riscv_zifencei>> generic)
| `2` | _SYSINFO_CPU_ZMMUL_ | `Zmmul` extension (`M` sub-extension) available when set (via top's <<_cpu_extension_riscv_zmmul>> generic)
| `3` | _SYSINFO_CPU_ZXCFU_ | `Zxcfu` extension (custom functions unit for custom instructions) available when set (via top's <<_cpu_extension_riscv_zxcfu>> generic)
| `5` | _SYSINFO_CPU_ZFINX_ | `Zfinx` extension (`F` sub-/alternative-extension) available when set (via top's <<_cpu_extension_riscv_zfinx>> generic)
| `6` | _SYSINFO_CPU_ZXSCNT_ | Custom extension - _Small_ CPU counters: `[m]cycle` & `[m]instret` CSRs have less than 64-bit when set (via top's <<_cpu_cnt_width>> generic)
| `7` | _SYSINFO_CPU_ZXNOCNT_ | Custom extension - _NO_ CPU counters: `[m]cycle` & `[m]instret` CSRs are NOT available at all when set (via top's <<_cpu_cnt_width>> generic)