@@ -9,7 +9,7 @@ The rare few that support minimal mixed precision, run only on one platform, and
 SimSIMD provides an alternative.
 1️⃣ SimSIMD functions are practically as fast as `memcpy`.
 2️⃣ Unlike BLAS, most kernels are designed for mixed-precision and bit-level operations.
-3️⃣ SimSIMD [compiles to more platforms than NumPy (105 vs 35)][compatibility] and has more backends than most BLAS implementations, and more high-level interfaces than most libraries.
+3️⃣ SimSIMD often [ships more binaries than NumPy][compatibility] and has more backends than most BLAS implementations, and more high-level interfaces than most libraries.
@@ -42,7 +42,7 @@ SimSIMD provides an alternative.

 ## Features

-__SimSIMD__ (Arabic: "سيمسيم دي") is a mixed-precision math library of __over 200 SIMD-optimized kernels__ extensively used in AI, Search, and DBMS workloads.
+__SimSIMD__ (Arabic: "سيمسيم دي") is a mixed-precision math library of __over 350 SIMD-optimized kernels__ extensively used in AI, Search, and DBMS workloads.
 Named after the iconic ["Open Sesame"](https://en.wikipedia.org/wiki/Open_sesame) command that opened doors to treasure in _Ali Baba and the Forty Thieves_, SimSIMD can help you 10x the cost-efficiency of your computational pipelines.
 Implemented distance functions include:

@@ -92,17 +92,136 @@ You can learn more about the technical implementation details in the following b

 ## Benchmarks

-For reference, we use 1536-dimensional vectors, like the embeddings produced by the OpenAI Ada API.
-Comparing the serial code throughput produced by GCC 12 to hand-optimized kernels in SimSIMD, we see the following single-core improvements for the two most common vector-vector similarity metrics - the Cosine similarity and the Euclidean distance:
> The code was compiled with GCC 12, using glibc v2.35.
+> The benchmarks were performed on Arm-based AWS Graviton3 `c7g` instances and Intel Sapphire Rapids `r7iz` instances.
+> Most modern Arm-based 64-bit CPUs will have similar relative speedups.
+> Variance within x86 CPUs will be larger.

 Similar speedups are often observed even when compared to BLAS and LAPACK libraries underlying most numerical computing libraries, including NumPy and SciPy in Python.
 Broader benchmarking results:
@@ -689,23 +808,29 @@ To override compilation settings and switch between runtime and compile-time dis
 #include <simsimd/simsimd.h>

 int main() {
+    simsimd_i8_t i8s[1536];
+    simsimd_u8_t u8s[1536];
     simsimd_f64_t f64s[1536];
     simsimd_f32_t f32s[1536];
     simsimd_f16_t f16s[1536];
-    simsimd_i8_t i8[1536];
+    simsimd_bf16_t bf16s[1536];
     simsimd_distance_t distance;

     // Cosine distance between two vectors
     simsimd_cos_i8(i8s, i8s, 1536, &distance);
+    simsimd_cos_u8(u8s, u8s, 1536, &distance);
     simsimd_cos_f16(f16s, f16s, 1536, &distance);
     simsimd_cos_f32(f32s, f32s, 1536, &distance);
     simsimd_cos_f64(f64s, f64s, 1536, &distance);
+    simsimd_cos_bf16(bf16s, bf16s, 1536, &distance);

     // Squared Euclidean distance between two vectors
     simsimd_l2sq_i8(i8s, i8s, 1536, &distance);
+    simsimd_l2sq_u8(u8s, u8s, 1536, &distance);
     simsimd_l2sq_f16(f16s, f16s, 1536, &distance);
     simsimd_l2sq_f32(f32s, f32s, 1536, &distance);
     simsimd_l2sq_f64(f64s, f64s, 1536, &distance);
+    simsimd_l2sq_bf16(bf16s, bf16s, 1536, &distance);

     return 0;
 }
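As a point of reference for the `simsimd_l2sq_*` calls in the hunk above, the following is a minimal scalar sketch of what a squared Euclidean distance computes, written in plain C without SimSIMD. The names `ref_l2sq_i8` and `ref_l2sq_f32` are hypothetical and are not part of the library. The choice of accumulator widths is an assumption made here for safety, and the diff does not show how SimSIMD accumulates internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar references; these names are not part of SimSIMD. */

/* Squared Euclidean distance over signed 8-bit inputs.
   Each difference is widened to 32 bits so that (a - b)^2 cannot overflow,
   and the running sum is kept in 64 bits. */
static int64_t ref_l2sq_i8(const int8_t *a, const int8_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        sum += (int64_t)d * d;
    }
    return sum;
}

/* Same metric for 32-bit floats, accumulated in double precision.
   No square root is taken: the result is the squared distance. */
static double ref_l2sq_f32(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = (double)a[i] - (double)b[i];
        sum += d * d;
    }
    return sum;
}
```

A reference like this can be handy for spot-checking an optimized kernel on small inputs, keeping in mind that low-precision kernels may differ from it by rounding.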
@@ -717,25 +842,41 @@ int main() {
 #include <simsimd/simsimd.h>

 int main() {
-    simsimd_f64_t f64s[1536];
-    simsimd_f32_t f32s[1536];
+    // SimSIMD provides "sized" type-aliases without relying on `stdint.h`
+    simsimd_i8_t i8s[1536];
+    simsimd_u8_t u8s[1536];
     simsimd_f16_t f16s[1536];
-    simsimd_distance_t distance;
-
-    // Inner product between two vectors
-    simsimd_dot_f16(f16s, f16s, 1536, &distance);
-    simsimd_dot_f32(f32s, f32s, 1536, &distance);
-    simsimd_dot_f64(f64s, f64s, 1536, &distance);
+    simsimd_f32_t f32s[1536];
+    simsimd_f64_t f64s[1536];
+    simsimd_bf16_t bf16s[1536];
+    simsimd_distance_t product;
+
+    // Inner product between two real vectors
+    simsimd_dot_i8(i8s, i8s, 1536, &product);
+    simsimd_dot_u8(u8s, u8s, 1536, &product);
+    simsimd_dot_f16(f16s, f16s, 1536, &product);
+    simsimd_dot_f32(f32s, f32s, 1536, &product);
+    simsimd_dot_f64(f64s, f64s, 1536, &product);
+    simsimd_dot_bf16(bf16s, bf16s, 1536, &product);
+
+    // SimSIMD provides complex types with `real` and `imag` fields
+    simsimd_f64c_t f64cs[768];
+    simsimd_f32c_t f32cs[768];
+    simsimd_f16c_t f16cs[768];
+    simsimd_bf16c_t bf16cs[768];
+    simsimd_distance_t products[2]; // real and imaginary parts
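The hunk above is cut off before any complex kernel is called, so the following is only a hedged sketch of what a two-element `products` buffer would hold for a complex inner product. It is plain C and does not use SimSIMD: `ref_f32c_t` and `ref_dot_f32c` are hypothetical names, the struct merely mirrors the `real` and `imag` fields mentioned in the comment, and the unconjugated form of the product is an assumption. The 768-element arrays in the diff presumably correspond to the same 1536 scalars stored as real/imaginary pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a complex element with `real` and `imag` fields;
   not a SimSIMD type. */
typedef struct {
    float real;
    float imag;
} ref_f32c_t;

/* Unconjugated complex dot product: the sum of a[i] * b[i].
   results[0] receives the real part, results[1] the imaginary part. */
static void ref_dot_f32c(const ref_f32c_t *a, const ref_f32c_t *b, size_t n,
                         double results[2]) {
    assert(a != NULL && b != NULL && results != NULL);
    double re = 0.0, im = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br) */
        re += (double)a[i].real * b[i].real - (double)a[i].imag * b[i].imag;
        im += (double)a[i].real * b[i].imag + (double)a[i].imag * b[i].real;
    }
    results[0] = re;
    results[1] = im;
}
```

A conjugated variant would flip the sign of one operand's imaginary part before multiplying; which of the two is wanted depends on the application.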