Consider removing x86_64 field assembly #726

Closed
jonasnick opened this issue Mar 24, 2020 · 22 comments

@jonasnick
Contributor

I noticed that turning off x86_64 assembly on my laptop actually speeds up ecdsa_verify. The internal benchmarks show that with --without-asm the scalar operations are slower, but the field operations are faster. To investigate this, I created a branch that includes the configurable benchmark iterations from #722, adds a test_matrix.sh script, and allows turning on the field assembly individually (https://github.com/jonasnick/secp256k1/tree/eval-asm).
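
(For context, the per-field toggle roughly amounts to a preprocessor switch like the sketch below; USE_ASM_X86_64_FIELD is the define passed via CFLAGS in the configurations below, and the gating shown is an illustration based on the existing 5x52 field headers, not necessarily the branch's exact code.)

    /* Sketch: choose the field mul/sqr implementation at build time.
       USE_ASM_X86_64_FIELD is the define passed via CFLAGS; the header
       names follow the existing 5x52 field code. */
    #if defined(USE_ASM_X86_64_FIELD)
    #include "field_5x52_asm_impl.h"    /* hand-written x86_64 assembly */
    #else
    #include "field_5x52_int128_impl.h" /* portable C using __int128 */
    #endif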

Here are the results with gcc 9.3.0 (got similar results with clang 9.0.1):

SECP256K1_BENCH_ITERS=200000

bench config CFLAGS=-DUSE_ASM_X86_64_FIELD ./configure --disable-openssl-tests --with-asm=x86_64
scalar_sqr: min 0.0331us / avg 0.0332us / max 0.0337us
scalar_mul: min 0.0342us / avg 0.0343us / max 0.0345us
field_sqr: min 0.0165us / avg 0.0165us / max 0.0167us
field_mul: min 0.0204us / avg 0.0205us / max 0.0209us
ecdsa_sign: min 40.3us / avg 40.3us / max 40.4us
ecdsa_verify: min 56.9us / avg 56.9us / max 56.9us

bench config CFLAGS= ./configure --disable-openssl-tests --without-asm
scalar_sqr: min 0.0375us / avg 0.0376us / max 0.0383us
scalar_mul: min 0.0362us / avg 0.0366us / max 0.0396us
field_sqr: min 0.0152us / avg 0.0152us / max 0.0152us
field_mul: min 0.0177us / avg 0.0178us / max 0.0178us
ecdsa_sign: min 41.8us / avg 41.8us / max 41.9us
ecdsa_verify: min 54.6us / avg 54.7us / max 54.7us

bench config CFLAGS= ./configure --disable-openssl-tests --with-asm=x86_64
scalar_sqr: min 0.0331us / avg 0.0331us / max 0.0333us
scalar_mul: min 0.0342us / avg 0.0343us / max 0.0347us
field_sqr: min 0.0152us / avg 0.0153us / max 0.0154us
field_mul: min 0.0178us / avg 0.0178us / max 0.0180us
ecdsa_sign: min 40.3us / avg 40.3us / max 40.4us
ecdsa_verify: min 53.2us / avg 53.2us / max 53.2us

Note the 6.5% ecdsa_verify speedup. However, I don't fully understand this:

  1. There's assembly for field_sqr and field_mul. If we remove it, both functions are faster. But, some other internal functions are slower. For example:
    SECP256K1_BENCH_ITERS=200000
    group_add_affine: min 0.257us / avg 0.257us / max 0.259us
    vs.
    group_add_affine: min 0.263us / avg 0.263us / max 0.264us
    
    This could just be an artifact of micro-benchmarking, and I have not tested this with #667 (Is the compiler optimizing out some of the benchmarks?).
  2. Removing the field assembly also makes ecdsa verification slower if endomorphism is enabled.
    SECP256K1_BENCH_ITERS=200000
    ecdsa_verify: min 41.1us / avg 41.1us / max 41.1us
    vs.
    ecdsa_verify: min 41.5us / avg 41.6us / max 41.6us
    

It should be noted that without the field assembly, using 64-bit field arithmetic requires __int128 support (otherwise you fall back to field=32bit, with a 40% verification slowdown). I did not check where __int128 is supported (MSVC?). We should also try this with older compilers.
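
(For context on why __int128 matters here: the 5x52 field representation multiplies 52-bit limbs and accumulates the up-to-104-bit products in a 128-bit accumulator. A minimal sketch of that pattern, not the actual field_5x52_int128_impl.h code:)

    #include <stdint.h>

    typedef unsigned __int128 uint128_t;

    /* Sketch only: accumulate products of 52-bit limbs into a 128-bit
       accumulator and extract one 52-bit limb plus a carry, roughly as
       the C field code does. Without __int128 (or an equivalent), only
       the 32-bit field representation remains available. */
    static void mul_acc_sketch(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
                               uint64_t *limb_out, uint128_t *carry_out) {
        uint128_t acc = (uint128_t)a0 * b1 + (uint128_t)a1 * b0; /* up to ~105 bits */
        *limb_out  = (uint64_t)acc & 0xFFFFFFFFFFFFFULL;         /* low 52 bits */
        *carry_out = acc >> 52;                                  /* carried to next limb */
    }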

@elichai
Contributor

elichai commented Mar 24, 2020

Yeah, I saw similarly weird things when I looked into enabling asm in rust-secp256k1, but gave up trying to investigate it at some point.

@real-or-random
Contributor

It would be really interesting to see this with old compilers, and figure out if people need to rely on old compilers. An alternative could be to disable ASM by default but still support it.

@peterdettman
Copy link
Contributor

I'm not sure the field asm ever got updated to use the improved mul/sqr approach that the C code uses (especially field_5x52_int128_impl.h).

@sipa
Contributor

sipa commented Mar 24, 2020

MSVC supports neither __int128 nor inline GCC-style assembly. I believe all MSVC builds of this code just use 32-bit arithmetic. They have a builtin for 128-bit wide multiplication though, so we could write an analogue of the int128 code using that instead.
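
(A rough sketch of what such an analogue could look like, using MSVC's _umul128 plus _addcarry_u64 to emulate the 64x64->128 multiply-and-accumulate that the __int128 code relies on; this is an illustration, not code from the library.)

    #include <stdint.h>
    #include <intrin.h> /* _umul128, _addcarry_u64 (MSVC, x64) */

    /* Sketch: emulate "acc += (unsigned __int128)a * b" with a (lo, hi) pair. */
    static void muladd128_sketch(uint64_t a, uint64_t b, uint64_t *acc_lo, uint64_t *acc_hi) {
        uint64_t hi, lo;
        unsigned char c;
        lo = _umul128(a, b, &hi);                    /* 64x64 -> 128-bit product */
        c = _addcarry_u64(0, *acc_lo, lo, acc_lo);   /* add low half, capture carry */
        (void)_addcarry_u64(c, *acc_hi, hi, acc_hi); /* propagate carry into high half */
    }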

@real-or-random
Contributor

Another question is (assuming for a moment that performance is identical) whether we prefer the asm code or the C code. There are indeed some reasons to prefer asm, e.g., the compiler can't miscompile it.

@elichai
Contributor

elichai commented Mar 29, 2020

I'm not sure the field asm ever got updated to use the improved mul/sqr approach that the C code uses (especially field_5x52_int128_impl.h).

I read through the asm now, and it looks like it matches the C code. I think this was done in f048615.

FWIW, looking at godbolt, the generated asm of the C functions looks very similar to the hand-written asm, although llvm-mca says the C code takes about twice as many cycles, which sounds weird.

ASM: https://godbolt.org/z/YxGUgE
C: https://godbolt.org/z/JPzYbo

@sipa
Contributor

sipa commented May 28, 2020

I've done benchmarks on gcc and clang, with various versions, at -O2 and -O3, with and without assembly enabled. All benchmarks use --disable-endomorphism --with-bignum=no. The results are kind of interesting, and show that they're clearly version-dependent, unfortunately.

All tests were done on an Intel(R) Core(TM) i7-7820HQ CPU, with the clock fixed at 2.3 GHz.

[Chart: secp256k1-ecdsa-speed-gcc (ecdsa_verify time by GCC version)]

[Chart: secp256k1-ecdsa-speed-clang (ecdsa_verify time by clang version)]

Raw data:

$ (for C in gcc-4.7 gcc-4.9 gcc-5 gcc-6 gcc-7 gcc-8 gcc-9 gcc-10 clang-3.7 clang-3.8 clang-4.0 clang-6.0 clang-7 clang-8 clang-9 clang-10; do for OPT in "-O2" "-O3"; do for ASM in no x86_64; do (git clean -dfx && ./autogen.sh 2>/dev/null && ./configure CC=$C CFLAGS="$OPT" --disable-endomorphism --disable-openssl-tests --enable-benchmark --with-bignum=no --with-asm=$ASM && make -j9 bench_verify tests && ./tests 1) >/dev/null && echo -n "$C $OPT asm=$ASM: " && ./bench_verify; done; done; done) | tee -a ~/secp.log
gcc-4.7 -O2 asm=no: ecdsa_verify: min 131us / avg 131us / max 132us
gcc-4.7 -O2 asm=x86_64: ecdsa_verify: min 107us / avg 107us / max 107us
gcc-4.7 -O3 asm=no: ecdsa_verify: min 133us / avg 134us / max 134us
gcc-4.7 -O3 asm=x86_64: ecdsa_verify: min 106us / avg 106us / max 106us
gcc-4.9 -O2 asm=no: ecdsa_verify: min 125us / avg 125us / max 126us
gcc-4.9 -O2 asm=x86_64: ecdsa_verify: min 109us / avg 109us / max 109us
gcc-4.9 -O3 asm=no: ecdsa_verify: min 127us / avg 127us / max 127us
gcc-4.9 -O3 asm=x86_64: ecdsa_verify: min 108us / avg 108us / max 108us
gcc-5 -O2 asm=no: ecdsa_verify: min 115us / avg 115us / max 115us
gcc-5 -O2 asm=x86_64: ecdsa_verify: min 108us / avg 108us / max 109us
gcc-5 -O3 asm=no: ecdsa_verify: min 115us / avg 115us / max 116us
gcc-5 -O3 asm=x86_64: ecdsa_verify: min 108us / avg 108us / max 108us
gcc-6 -O2 asm=no: ecdsa_verify: min 113us / avg 113us / max 113us
gcc-6 -O2 asm=x86_64: ecdsa_verify: min 109us / avg 109us / max 109us
gcc-6 -O3 asm=no: ecdsa_verify: min 114us / avg 114us / max 115us
gcc-6 -O3 asm=x86_64: ecdsa_verify: min 110us / avg 110us / max 110us
gcc-7 -O2 asm=no: ecdsa_verify: min 118us / avg 118us / max 118us
gcc-7 -O2 asm=x86_64: ecdsa_verify: min 109us / avg 109us / max 109us
gcc-7 -O3 asm=no: ecdsa_verify: min 123us / avg 123us / max 123us
gcc-7 -O3 asm=x86_64: ecdsa_verify: min 115us / avg 115us / max 115us
gcc-8 -O2 asm=no: ecdsa_verify: min 108us / avg 108us / max 109us
gcc-8 -O2 asm=x86_64: ecdsa_verify: min 108us / avg 108us / max 109us
gcc-8 -O3 asm=no: ecdsa_verify: min 117us / avg 117us / max 118us
gcc-8 -O3 asm=x86_64: ecdsa_verify: min 111us / avg 111us / max 111us
gcc-9 -O2 asm=no: ecdsa_verify: min 106us / avg 106us / max 106us
gcc-9 -O2 asm=x86_64: ecdsa_verify: min 110us / avg 110us / max 110us
gcc-9 -O3 asm=no: ecdsa_verify: min 105us / avg 105us / max 106us
gcc-9 -O3 asm=x86_64: ecdsa_verify: min 111us / avg 111us / max 111us
gcc-10 -O2 asm=no: ecdsa_verify: min 115us / avg 115us / max 115us
gcc-10 -O2 asm=x86_64: ecdsa_verify: min 110us / avg 110us / max 110us
gcc-10 -O3 asm=no: ecdsa_verify: min 105us / avg 105us / max 106us
gcc-10 -O3 asm=x86_64: ecdsa_verify: min 110us / avg 111us / max 111us
clang-3.7 -O2 asm=no: ecdsa_verify: min 122us / avg 122us / max 122us
clang-3.7 -O2 asm=x86_64: ecdsa_verify: min 112us / avg 112us / max 113us
clang-3.7 -O3 asm=no: ecdsa_verify: min 119us / avg 119us / max 120us
clang-3.7 -O3 asm=x86_64: ecdsa_verify: min 112us / avg 112us / max 112us
clang-3.8 -O2 asm=no: ecdsa_verify: min 124us / avg 124us / max 124us
clang-3.8 -O2 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 114us
clang-3.8 -O3 asm=no: ecdsa_verify: min 119us / avg 119us / max 120us
clang-3.8 -O3 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 114us
clang-4.0 -O2 asm=no: ecdsa_verify: min 119us / avg 119us / max 120us
clang-4.0 -O2 asm=x86_64: ecdsa_verify: min 114us / avg 114us / max 114us
clang-4.0 -O3 asm=no: ecdsa_verify: min 118us / avg 118us / max 118us
clang-4.0 -O3 asm=x86_64: ecdsa_verify: min 114us / avg 114us / max 114us
clang-6.0 -O2 asm=no: ecdsa_verify: min 115us / avg 115us / max 116us
clang-6.0 -O2 asm=x86_64: ecdsa_verify: min 114us / avg 114us / max 114us
clang-6.0 -O3 asm=no: ecdsa_verify: min 112us / avg 112us / max 113us
clang-6.0 -O3 asm=x86_64: ecdsa_verify: min 114us / avg 114us / max 114us
clang-7 -O2 asm=no: ecdsa_verify: min 115us / avg 115us / max 116us
clang-7 -O2 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 113us
clang-7 -O3 asm=no: ecdsa_verify: min 112us / avg 112us / max 113us
clang-7 -O3 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 113us
clang-8 -O2 asm=no: ecdsa_verify: min 115us / avg 115us / max 116us
clang-8 -O2 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 114us
clang-8 -O3 asm=no: ecdsa_verify: min 112us / avg 112us / max 112us
clang-8 -O3 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 114us
clang-9 -O2 asm=no: ecdsa_verify: min 114us / avg 114us / max 115us
clang-9 -O2 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 113us
clang-9 -O3 asm=no: ecdsa_verify: min 111us / avg 111us / max 112us
clang-9 -O3 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 113us
clang-10 -O2 asm=no: ecdsa_verify: min 114us / avg 115us / max 115us
clang-10 -O2 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 114us
clang-10 -O3 asm=no: ecdsa_verify: min 111us / avg 111us / max 112us
clang-10 -O3 asm=x86_64: ecdsa_verify: min 113us / avg 113us / max 114us

@sipa
Contributor

sipa commented May 28, 2020

So eh... should we look at the GCC 9.3.0+ -O3 output, and try to convert that to inline assembly?

@real-or-random
Contributor

Oof. Okay, so the clang performance is not really dependent on the version, so it's more interesting to look at GCC.

I think then that we should keep the asm for now, given that many people/distros don't have GCC >= 9.

It would be interesting to see what causes the difference in -O2 for GCC 9 vs 10. We switched to -O2 because there were no large differences, and we figured out that -O2 is in fact quicker on GCC (see #700 (comment) for GCC 8). For example, if they moved some flags from -O2 to -O3 in GCC 10, maybe we want to enable them again?

So eh... should we look at the GCC 9.3.0+ -O3 output, and try to convert that to inline assembly?

PR welcome. 😛

By the way, the green line is missing from the clang graph.

@jonasnick
Contributor Author

jonasnick commented May 29, 2020

Nice data @sipa. So our current default options are the best choices for <= gcc-8.

Furthermore, now that we've switched to -O2 by default: with gcc-10 at -O2, scalar assembly alone is not faster than scalar and field assembly together. As in the OP (gcc-9), with gcc-10 at -O3, scalar assembly alone is the fastest configuration.

gcc10 -O3:
ecdsa_verify: avg 58.8us # with assembly
ecdsa_verify: avg 53.2us # without assembly
ecdsa_verify: avg 52.8us # with scalar assembly only

gcc10 -O2:
ecdsa_verify: avg 57.6us # with assembly
ecdsa_verify: avg 59.3us # without assembly
ecdsa_verify: avg 58.5us # with scalar assembly only

For example, if they moved some flags from -O2 to -O3 in GCC 10, maybe we want to enable them again?

No idea what changed in gcc-10, but looking at the slopes above, there is some reason to believe that -O2 will get faster again with gcc-11 😬

@sipa
Contributor

sipa commented May 29, 2020

By the way, the green line is missing from the clang graph.

No, it's actually just hidden by the orange line. For all versions of clang, -O2/asm seems indistinguishable from -O3/asm.

@elichai
Contributor

elichai commented Jun 4, 2020

FWIW, when compiling to asm, field_5x52_int128_impl.h produces the same ASM for both -O2 and -O3:

$ sed -i "11s/^/#include \"..\/include\/secp256k1.h\" \n#include \"util.h\"\n /" src/field_5x52_int128_impl.h
$ sed -i 's/static//g' src/field_5x52_int128_impl.h
$ gcc -xc -std=c89 -S -O2 -DHAVE___INT128 src/field_5x52_int128_impl.h -o o2.s
$ gcc -xc -std=c89 -S -O3 -DHAVE___INT128 src/field_5x52_int128_impl.h -o o3.s
$ diff o2.s o3.s

If we do actually believe that gcc-10 produces better asm, we can easily use the commands above and then just link against these functions regardless of the gcc version.

@real-or-random
Contributor

real-or-random commented Sep 24, 2023

I think by now, we should just turn off asm by default. Compiler-generated code has been faster for a few years, and so it should be better for most users. (related: bitcoin/bitcoin#27897)

Turning it off by default is real progress, and then we can keep procrastinating on whether we want to remove the existing ASM, replace it by CryptOpt, or whatever.

Recent (unscientific) benchmark on my laptop:

Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

clang 16.0.6:
schnorrsig_sign               ,    30.0       ,    30.0       ,    30.2    
schnorrsig_verify             ,    57.3       ,    57.4       ,    57.4

clang, no asm:
schnorrsig_sign               ,    28.7       ,    28.7       ,    28.8    
schnorrsig_verify             ,    52.4       ,    52.5       ,    52.6

gcc version 13.2.1, asm:
schnorrsig_sign               ,    30.9       ,    31.2       ,    33.3    
schnorrsig_verify             ,    59.0       ,    59.1       ,    59.1    

gcc, no asm:
schnorrsig_sign               ,    29.1       ,    29.1       ,    29.1    
schnorrsig_verify             ,    54.2       ,    54.2       ,    54.3

@real-or-random
Contributor

@theStack Are you willing to run some detailed benchmarks here with recent compilers?

@fanquake
Member

fanquake commented Oct 2, 2023

cc @jamesob (might also be interested in benching)

@real-or-random
Contributor

We've just discussed this on IRC (log). A short summary is:

  • A good way to come to a decision is to run a benchmark on the GCC used for the official builds of Bitcoin Core. Currently, this is GCC 10.5.0. So if GCC 10.5.0 is faster than our ASM, we should do something now. Otherwise, we should wait for bitcoin/bitcoin#27897 (guix: use GCC 12.3.0 to build releases), and then recheck with GCC 12.3.

  • We can't just change the default of the ASM build setting to "off", because this setting also affects the scalar ASM. The scalar ASM is probably still better because its performance relies on ADC instructions, and compilers are still bad at generating such carry chains (a rough sketch of the pattern follows below). (But the truth is, clang got better here, GCC 14 will also become better, and we should do benchmarks.) We could, however, replace the current field ASM with compiler-generated ASM, or just remove it. Or we could even introduce separate settings for field and scalar ASM.
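
(For illustration, the kind of carry chain meant above: the scalar asm expresses multi-word additions directly as ADD/ADC, while a compiler has to recognize the pattern in portable C. The sketch below is generic and not the library's scalar code.)

    #include <stdint.h>

    /* Sketch: 4x64-bit addition with carry propagation. Hand-written
       scalar asm writes this as one ADD followed by three ADCs; how well
       a compiler maps the portable C below onto that chain varies by
       compiler and version. */
    static uint64_t add_4x64_sketch(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
        uint64_t carry = 0;
        int i;
        for (i = 0; i < 4; i++) {
            uint64_t t = a[i] + carry;
            uint64_t c1 = (t < carry);   /* overflow from adding the incoming carry */
            r[i] = t + b[i];
            carry = c1 + (r[i] < b[i]);  /* overflow from adding b[i] */
        }
        return carry;
    }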

In any case, the next step here is to benchmark GCC 10.5.0 vs. our field ASM, both on the macro level (ECDSA/Schnorr verify) and the micro level (field benchmarks).

@theStack
Contributor

theStack commented Nov 23, 2023

In any case, the next step here is to benchmark GCC 10.5.0 vs. our field ASM, both on the macro level (ECDSA/Schnorr verify) and the micro level (field benchmarks).

I've run some benchmarks with GCC 10.5.0 on master (commit e721039), and the verification routines for both ECDSA and Schnorr signatures are significantly faster without the ASM optimizations (>10%). As assumed in the recent IRC discussion, though, the scalar routines still perform better with ASM:

$ gcc-10 --version | head -1
gcc-10 (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0

$ ./configure --with-asm=x86_64 CC=gcc-10 && make clean && make && ./bench verify sign && ./bench_internal mul split sqrt | tail -n +2
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    33.0       ,    33.5       ,    34.2    
ecdsa_sign                    ,    25.4       ,    25.9       ,    27.1    
schnorrsig_sign               ,    19.1       ,    19.3       ,    19.5    
schnorrsig_verify             ,    33.6       ,    34.0       ,    34.5    

scalar_mul                    ,     0.0289    ,     0.0305    ,     0.0332 
scalar_split                  ,     0.120     ,     0.125     ,     0.133  
field_mul                     ,     0.0173    ,     0.0179    ,     0.0189 
field_sqrt                    ,     4.27      ,     4.37      ,     4.62   

$ ./configure --with-asm=no CC=gcc-10 && make clean && make && ./bench verify sign && ./bench_internal mul split sqrt | tail -n +2
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    29.7       ,    30.5       ,    31.1    
ecdsa_sign                    ,    23.8       ,    24.3       ,    24.9    
schnorrsig_sign               ,    17.5       ,    17.9       ,    18.4    
schnorrsig_verify             ,    30.1       ,    30.4       ,    31.3    

scalar_mul                    ,     0.0339    ,     0.0393    ,     0.0616 
scalar_split                  ,     0.141     ,     0.155     ,     0.192  
field_mul                     ,     0.0133    ,     0.0138    ,     0.0151 
field_sqrt                    ,     3.84      ,     3.95      ,     4.22   

So it seems, provided that others can confirm these results, that at least the field ASM could/should be removed (or replaced by compiler-generated ASM)? I'm also happy to run more benchmarks with newer versions of gcc and clang.

@real-or-random
Contributor

Thanks, just a quick remark: Can you also check the sign functions to see if the difference in scalar performance actually makes a measurable difference for signing?

@theStack
Contributor

theStack commented Nov 23, 2023

Thanks, just a quick remark: Can you also check the sign functions to see if the difference in scalar performance actually makes a measurable difference for signing?

Sure, good point. I've re-run my benchmarks (this time with the "sign" parameter added to the bench call) and updated the results above accordingly. It seems that the sign operations for both ECDSA and Schnorr are also consistently faster on my machine without ASM on GCC 10.5.0.

real-or-random added a commit to real-or-random/secp256k1 that referenced this issue Nov 24, 2023
Widely available versions of GCC and Clang beat our field asm on -O2.
In particular, GCC 10.5.0, which is Bitcoin Core's current compiler
for official x86_64 builds, produces code that is > 20% faster for
fe_mul and > 10% faster for signature verification (see bitcoin-core#726).

These are the alternatives to this PR:

We could replace our current asm with the fastest compiler output
that we can find. This is potentially faster, but it has multiple
drawbacks:
 - It's more coding work because it needs detailed benchmarks (e.g.,
   with many compilers/options).
 - It's more review work because we need to deal with inline asm
   (including clobbers etc.) and there's a lack of expert reviewers
   in this area.
 - It's not unlikely that we'll fall behind again in a few compiler
   versions, and then we have to deal with this again, i.e., redo the
   benchmarks. Given our history here, I doubt that we'll resolve
   this in a timely manner.

We could change the default of the asm build option to off. But this
will also disable the scalar asm, which is still faster.

We could split the build option into two separate options for field
and scalar asm and only disable the field asm by default. But this
adds complexity to the build and to the test matrix.

My conclusion is that this PR gets the low-hanging fruit in terms of
performance. It simplifies our code significantly. It's clearly an
improvement, and it's very easy to review. Whether re-introducing
better asm (whether from a compiler or from CryptOpt) is worth the
hassle can be evaluated separately, and should not hold up this
improvement.

Solves bitcoin-core#726.
@real-or-random
Contributor

nit: I think you wanted to run the benchmark for field_sqr (instead of field_sqrt) because this is what we have assembly for. But no need to re-run from my side. sqr is almost like mul, and anyway, what counts in the end is verification...

real-or-random added a commit that referenced this issue Nov 24, 2023
1ddd76a bench: add --help option to bench_internal (Sebastian Falbesoner)

Pull request description:

  While coming up with commands for running the benchmarks for issue #726 (comment), I noticed that in contrast to `bench{_ecmult}`, `bench_internal` doesn't have a help option yet and figured it would be nice to have one. A comparable past PR is #1008. Benchmark categories appear in the same order as they are executed; the concrete benchmark names in parentheses per category are listed in alphabetical order.

ACKs for top commit:
  real-or-random:
    utACK 1ddd76a
  siv2r:
    ACK 1ddd76a, tested the `--help` option locally, and it works as expected.

Tree-SHA512: d117641a5f25a7cbf83881f3acceae99624528a0cbb2405efdbe1a3a2762b4d6b251392e954aaa32f6771069d31143743770fccafe198084c12258dedb0856fc
@theStack
Contributor

nit: I think you wanted to run the benchmark for field_sqr (instead of field_sqrt) because this is what we have assembly for. But no need to re-run from my side. sqr is almost like mul, and anyway, what counts in the end is verification...

Ah, that was indeed unintended. Fixed that for the two other benchmarks I ran in the course of reviewing #1446 (#1446 (review)).

@real-or-random
Contributor

We have removed the field assembly in #1446.
