
Commit 3af7f08

Merge pull request #243 from ashvardanian/main-dev
Navigation Docs
2 parents e0bc213 + e2b1068 commit 3af7f08


3 files changed, +40 -1 lines changed


CONTRIBUTING.md

Lines changed: 20 additions & 1 deletion
@@ -6,6 +6,26 @@ To keep the quality of the code high, we have a set of [guidelines](https://gith
 - [How to organize branches?](https://github.com/unum-cloud/awesome/blob/main/Workflow.md#branches)
 - [How to style commits?](https://github.com/unum-cloud/awesome/blob/main/Workflow.md#commits)
 
+## Navigating the Codebase
+
+Primary kernels are implemented in header files under `include/simsimd/`:
+
+- `dot.h` - dot products for real and complex vectors.
+- `spatial.h` - spatial distances: L2, cosine distance.
+- `binary.h` - binary distances: Hamming, Jaccard, etc.
+- `probability.h` - probability metrics: KL-divergence, Jensen-Shannon, etc.
+- `sparse.h` - sparse distances: weighted and normal set intersections.
+- `curved.h` - bilinear forms for real and complex vectors, and Mahalanobis distance.
+
+Bindings to other languages are in the respective directories:
+
+- `python/lib.c` - Python bindings.
+- `javascript/lib.c` - JavaScript bindings.
+- `rust/lib.rs` - Rust bindings.
+- `swift/SimSIMD.swift` - Swift bindings.
+
+All tests, benchmarks, and examples are placed in the `scripts/` directory, if compatible with the toolchain of the implementation language.
+
 ## C and C++
 
 To rerun experiments utilize the following command:
@@ -277,4 +297,3 @@ cd golang
 go test # To test
 go test -run=^$ -bench=. -benchmem # To benchmark
 ```
-

include/simsimd/dot.h

Lines changed: 10 additions & 0 deletions
@@ -130,6 +130,16 @@ SIMSIMD_PUBLIC void simsimd_dot_u8_haswell(simsimd_u8_t const* a, simsimd_u8_t c
 * Ice Lake added VNNI, VPOPCNTDQ, IFMA, VBMI, VAES, GFNI, VBMI2, BITALG, VPCLMULQDQ, and other extensions for integral operations.
 * Genoa added only BF16.
 * Sapphire Rapids added tiled matrix operations, but we are most interested in the new mixed-precision FMA instructions.
+*
+* Sadly, we can't effectively interleave different kinds of arithmetic instructions to utilize more ports:
+*
+* > Like Intel server architectures since Skylake-X, SPR cores feature two 512-bit FMA units, and organize them in a similar fashion.
+* > One 512-bit FMA unit is created by fusing two 256-bit ones on port 0 and port 1. The other is added to port 5, as a server-specific
+* > core extension. The FMA units on port 0 and 1 are configured into 2×256-bit or 1×512-bit mode depending on whether 512-bit FMA
+* > instructions are present in the scheduler. That means a mix of 256-bit and 512-bit FMA instructions will not achieve higher IPC
+* > than executing 512-bit instructions alone.
+*
+* Source: https://chipsandcheese.com/p/a-peek-at-sapphire-rapids
 */
 SIMSIMD_PUBLIC void simsimd_dot_f64_skylake(simsimd_f64_t const* a, simsimd_f64_t const* b, simsimd_size_t n, simsimd_distance_t* result);
 SIMSIMD_PUBLIC void simsimd_dot_f64c_skylake(simsimd_f64c_t const* a, simsimd_f64c_t const* b, simsimd_size_t n, simsimd_distance_t* results);
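The quoted port description in the `dot.h` comment can be turned into a back-of-the-envelope throughput model. The sketch below is a deliberate simplification that assumes only what the quote states: once any 512-bit FMA is in the scheduler, ports 0 and 1 fuse into one 512-bit unit, so there are two FMA issue slots per cycle (fused port 0+1, and port 5), and every FMA, 256-bit or 512-bit, occupies one slot. The function and constant names are hypothetical and the numbers are modeled, not measured.

```c
#include <assert.h>

/* Simplified model of the quoted Skylake-X / Sapphire Rapids behavior (illustrative, not measured):
 * with 512-bit FMAs in the scheduler, ports 0 and 1 act as one fused 512-bit unit,
 * leaving two FMA issue slots per cycle (fused p0+p1, and p5). */
enum { fma_slots_per_cycle = 2, f64_lanes_512 = 8, f64_lanes_256 = 4 };

/* Steady-state f64 multiply-adds per cycle for an instruction stream in which
 * `share_256` out of every `group` FMA instructions are 256-bit wide. */
static double f64_lanes_per_cycle(unsigned share_256, unsigned group) {
    assert(group > 0 && share_256 <= group);
    double avg_lanes = (share_256 * (double)f64_lanes_256 +
                        (group - share_256) * (double)f64_lanes_512) / group;
    return fma_slots_per_cycle * avg_lanes; /* slots per cycle x lanes per slot */
}
```

Under this model, a pure 512-bit stream, `f64_lanes_per_cycle(0, 1)`, retires 16 `f64` multiply-adds per cycle, while a 50/50 mix, `f64_lanes_per_cycle(1, 2)`, retires 12: the narrower FMAs do not open a third slot, they only do less work per slot.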

include/simsimd/spatial.h

Lines changed: 10 additions & 0 deletions
@@ -139,6 +139,16 @@ SIMSIMD_PUBLIC void simsimd_cos_f64_haswell(simsimd_f64_t const* a, simsimd_f64_
 /* SIMD-powered backends for AVX512 CPUs of Skylake generation and newer, using 32-bit arithmetic over 512-bit words.
 * Skylake was launched in 2015, and discontinued in 2019. Skylake had support for F, CD, VL, DQ, and BW extensions,
 * as well as masked operations. This is enough to supersede auto-vectorization on `f32` and `f64` types.
+*
+* Sadly, we can't effectively interleave different kinds of arithmetic instructions to utilize more ports:
+*
+* > Like Intel server architectures since Skylake-X, SPR cores feature two 512-bit FMA units, and organize them in a similar fashion.
+* > One 512-bit FMA unit is created by fusing two 256-bit ones on port 0 and port 1. The other is added to port 5, as a server-specific
+* > core extension. The FMA units on port 0 and 1 are configured into 2×256-bit or 1×512-bit mode depending on whether 512-bit FMA
+* > instructions are present in the scheduler. That means a mix of 256-bit and 512-bit FMA instructions will not achieve higher IPC
+* > than executing 512-bit instructions alone.
+*
+* Source: https://chipsandcheese.com/p/a-peek-at-sapphire-rapids
 */
 SIMSIMD_PUBLIC void simsimd_l2_f32_skylake(simsimd_f32_t const* a, simsimd_f32_t const* b, simsimd_size_t n, simsimd_distance_t* d);
 SIMSIMD_PUBLIC void simsimd_l2sq_f32_skylake(simsimd_f32_t const* a, simsimd_f32_t const* b, simsimd_size_t n, simsimd_distance_t* d);
