Skip to content

Commit 998a124

Browse files
mergify[bot]aaroncaljo242
authored
docs: add transaction malleability docs (backport #23958) (#24050)
Co-authored-by: Aaron Craelius <[email protected]> Co-authored-by: aljo242 <[email protected]>
1 parent 5a71f92 commit 998a124

File tree

2 files changed

+169
-0
lines changed

2 files changed

+169
-0
lines changed

docs/architecture/README.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -86,6 +86,10 @@ When writing ADRs, follow the same best practices for writing RFCs. When writing
8686
* [ADR 062: Collections State Layer](./adr-062-collections-state-layer.md)
8787
* [ADR 063: Core Module API](./adr-063-core-module-api.md)
8888
* [ADR 065: Store V2](./adr-065-store-v2.md)
89+
* [ADR 067: Simulator v2](./adr-067-simulator-v2.md)
90+
* [ADR 069: `x/gov` modularity, multiple choice and optimisic proposals](./adr-069-gov-improvements.md)
91+
* [ADR 074: Messages with implicit signers](./adr-074-implicit-msg-signers.md)
92+
* [ADR 076: Transaction Malleability Risk Review and Recommendations](./adr-076-tx-malleability.md)
8993

9094
### Draft
9195

Lines changed: 165 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,165 @@
1+
# Cosmos SDK Transaction Malleability Risk Review and Recommendations
2+
3+
## Changelog
4+
5+
* 2025-03-10: Initial draft (@aaronc)
6+
7+
## Status
8+
9+
PROPOSED: Not Implemented
10+
11+
## Abstract
12+
13+
Several encoding and sign mode related issues have historically resulted in the possibility
14+
that Cosmos SDK transactions may be re-encoded in such a way as to change their hash
15+
(and in rare cases, their meaning) without invalidating the signature.
16+
This document details these cases, their potential risks, the extent to which they have been
17+
addressed, and provides recommendations for future improvements.
18+
19+
## Review
20+
21+
One naive assumption about Cosmos SDK transactions is that hashing the raw bytes of a submitted transaction creates a safe unique identifier for the transaction. In reality, there are multiple ways in which transactions could be manipulated to create different transaction bytes (and as a result different hashes) that still pass signature verification.
22+
23+
This document attempts to enumerate the various potential transaction "malleability" risks that we have identified and the extent to which they have or have not been addressed in various sign modes. We also identify vulnerabilities that could be introduced if developers make changes in the future without careful consideration of the complexities involved with transaction encoding, sign modes and signatures.
24+
25+
### Risks Associated with Malleability
26+
27+
The malleability of transactions poses the following potential risks to end users:
28+
* unsigned data could get added to transactions and be processed by state machines
29+
* clients often rely on transaction hashes for checking transaction status, but whether or not submitted transaction hashes match processed transaction hashes depends primarily on good network actors rather than fundamental protocol guarantees
30+
* transactions could potentially get executed more than once (faulty replay protection)
31+
32+
If a client generates a transaction, keeps a record of its hash and then attempts to query nodes to check the transaction's status, this process may falsely conclude that the transaction had not been processed if an intermediary
33+
processor decoded and re-encoded the transaction with different encoding rules (either maliciously or unintentionally).
34+
As long as no malleability is present in the signature bytes themselves, clients _should_ query transactions by signature instead of hash.
35+
36+
Not being cognizant of this risk may lead clients to submit the same transaction multiple times if they believe that
37+
earlier transactions had failed or gotten lost in processing.
38+
This could be an attack vector against users if wallets primarily query transactions by hash.
39+
40+
If the state machine were to rely on transaction hashes as a replay mechanism itself, this would be faulty and not
41+
provide the intended replay protection. Instead, the state machine should rely on deterministic representations of
42+
transactions rather than the raw encoding, or other nonces,
43+
if they want to provide some replay protection that doesn't rely on a monotonically
44+
increasing account sequence number.
45+
46+
47+
### Sources of Malleability
48+
49+
#### Non-deterministic Protobuf Encoding
50+
51+
Cosmos SDK transactions are encoded using protobuf binary encoding when they are submitted to the network. Protobuf binary is not inherently a deterministic encoding meaning that the same logical payload could have several valid bytes representations. In a basic sense, this means that protobuf in general can be decoded and re-encoded to produce a different byte stream (and thus different hash) without changing the logical meaning of the bytes. [ADR 027: Deterministic Protobuf Serialization](https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-027-deterministic-protobuf-serialization.md) describes in detail what needs to be done to produce what we consider to be a "canonical", deterministic protobuf serialization. Briefly, the following sources of malleability at the encoding level have been identified and are addressed by this specification:
52+
* fields can be emitted in any order
53+
* default field values can be included or omitted, and this doesn't change meaning unless `optional` is used
54+
* `repeated` fields of scalars may use packed or "regular" encoding
55+
* `varint`s can include extra ignored bits
56+
* extra fields may be added and are usually simply ignored by decoders. [ADR 020](https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-020-protobuf-transaction-encoding.md#unknown-field-filtering) specifies that in general such extra fields should cause messages and transactions to be rejected)
57+
58+
When using `SIGN_MODE_DIRECT` none of the above malleabilities will be tolerated because:
59+
* signatures of messages and extensions must be done over the raw encoded bytes of those fields
60+
* the outer tx envelope (`TxRaw`) must follow ADR 027 rules or be rejected
61+
62+
Transactions signed with `SIGN_MODE_LEGACY_AMINO_JSON`, however, have no way of protecting against the above malleabilities because what is signed is a JSON representation of the logical contents of the transaction. These logical contents could have any number of valid protobuf binary encodings, so in general there are no guarantees regarding transaction hash with Amino JSON signing.
63+
64+
In addition to being aware of the general non-determinism of protobuf binary, developers need to pay special attention to make sure that unknown protobuf fields get rejected when developing new capabilities related to protobuf transactions. The protobuf serialization format was designed with the assumption that unknown data known to encoders could safely be ignored by decoders. This assumption may have been fairly safe within the walled garden of Google's centralized infrastructure. However, in distributed blockchain systems, this assumption is generally unsafe. If a newer client encodes a protobuf message with data intended for a newer server, it is not safe for an older server to simply ignore and discard instructions that it does not understand. These instructions could include critical information that the transaction signer is relying upon and just assuming that it is unimportant is not safe.
65+
66+
[ADR 020](https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-020-protobuf-transaction-encoding.md#unknown-field-filtering) specifies some provisions for "non-critical" fields which can safely be ignored by older servers. In practice, I have not seen any valid usages of this. It is something in the design that maintainers should be aware of, but it may not be necessary or even 100% safe.
67+
68+
#### Non-deterministic Value Encoding
69+
70+
In addition to the non-determinism present in protobuf binary itself, some protobuf field data is encoded using a micro-format which itself may not be deterministic. Consider for instance integer or decimal encoding. Some decoders may allow for the presence of leading or trailing zeros without changing the logical meaning, ex. `00100` vs `100` or `100.00` vs `100`. So if a sign mode encodes numbers deterministically, but decoders accept multiple representations,
71+
a user may sign over the value `100` while `0100` gets encoded. This would be possible with Amino JSON to the extent that the integer decoder accepts leading zeros. I believe the current `Int` implementation will reject this, however, it is
72+
probably possible to encode a octal or hexadecimal representation in the transaction whereas the user signs over a decimal integer.
73+
74+
#### Signature Encoding
75+
76+
Signatures themselves are encoded using a micro-format specific to the signature algorithm being used and sometimes these
77+
micro-formats can allow for non-determinism (multiple valid bytes for the same signature).
78+
Most of the signature algorithms supported by the SDK should reject non-canonical bytes in their current implementation.
79+
However, the `Multisignature` protobuf type uses normal protobuf encoding and there is no check as to whether the
80+
decoded bytes followed canonical ADR 027 rules or not. Therefore, multisig transactions can have malleability in
81+
their signatures.
82+
Any new or custom signature algorithms must make sure that they reject any non-canonical bytes, otherwise even
83+
with `SIGN_MODE_DIRECT` there can be transaction hash malleability by re-encoding signatures with a non-canonical
84+
representation.
85+
86+
#### Fields not covered by Amino JSON
87+
88+
Another area that needs to be addressed carefully is the discrepancy between `AminoSignDoc`(see [`aminojson.proto`](../../x/tx/signing/aminojson/internal/aminojsonpb/aminojson.proto)) used for `SIGN_MODE_LEGACY_AMINO_JSON` and the actual contents of `TxBody` and `AuthInfo` (see [`tx.proto`](../../proto/cosmos/tx/v1beta1/tx.proto)).
89+
If fields get added to `TxBody` or `AuthInfo`, they must either have a corresponding representing in `AminoSignDoc` or Amino JSON signatures must be rejected when those new fields are set. Making sure that this is done is a
90+
highly manual process, and developers could easily make the mistake of updating `TxBody` or `AuthInfo`
91+
without paying any attention to the implementation of `GetSignBytes` for Amino JSON. This is a critical
92+
vulnerability in which unsigned content can now get into the transaction and signature verification will
93+
pass.
94+
95+
## Sign Mode Summary and Recommendations
96+
97+
The sign modes officially supported by the SDK are `SIGN_MODE_DIRECT`, `SIGN_MODE_TEXTUAL`, `SIGN_MODE_DIRECT_AUX`,
98+
and `SIGN_MODE_LEGACY_AMINO_JSON`.
99+
`SIGN_MODE_LEGACY_AMINO_JSON` is used commonly by wallets and is currently the only sign mode supported on Nano Ledger hardware devices
100+
(although `SIGN_MODE_TEXTUAL` was designed to also support hardware devices).
101+
`SIGN_MODE_DIRECT` is the simplest sign mode and its usage is also fairly common.
102+
`SIGN_MODE_DIRECT_AUX` is a variant of `SIGN_MODE_DIRECT` that can be used by auxiliary signers in a multi-signer
103+
transaction by those signers who are not paying gas.
104+
`SIGN_MODE_TEXTUAL` was intended as a replacement for `SIGN_MODE_LEGACY_AMINO_JSON`, but as far as we know it
105+
has not been adopted by any clients yet and thus is not in active use.
106+
107+
All known malleability concerns have been addressed in the current implementation of `SIGN_MODE_DIRECT`.
108+
The only known malleability that could occur with a transaction signed with `SIGN_MODE_DIRECT` would
109+
need to be in the signature bytes themselves.
110+
Since signatures are not signed over, it is impossible for any sign mode to address this directly
111+
and instead signature algorithms need to take care to reject any non-canonically encoded signature bytes
112+
to prevent malleability.
113+
For the known malleability of the `Multisignature` type, we should make sure that any valid signatures
114+
were encoded following canonical ADR 027 rules when doing signature verification.
115+
116+
`SIGN_MODE_DIRECT_AUX` provides the same level of safety as `SIGN_MODE_DIRECT` because
117+
* the raw encoded `TxBody` bytes are signed over in `SignDocDirectAux`, and
118+
* a transaction using `SIGN_MODE_DIRECT_AUX` still requires the primary signer to sign the transaction with `SIGN_MODE_DIRECT`
119+
120+
`SIGN_MODE_TEXTUAL` also provides the same level of safety as `SIGN_MODE_DIRECT` because the hash of the raw encoded
121+
`TxBody` and `AuthInfo` bytes are signed over.
122+
123+
Unfortunately, the vast majority of unaddressed malleability risks affect `SIGN_MODE_LEGACY_AMINO_JSON` and this
124+
sign mode is still commonly used.
125+
It is recommended that the following improvements be made to Amino JSON signing:
126+
* hashes of `TxBody` and `AuthInfo` should be added to `AminoSignDoc` so that encoding-level malleablity is addressed
127+
* when constructing `AminoSignDoc`, [protoreflect](https://pkg.go.dev/google.golang.org/protobuf/reflect/protoreflect) API should be used to ensure that there no fields in `TxBody` or `AuthInfo` which do not have a mapping in `AminoSignDoc` have been set
128+
* fields present in `TxBody` or `AuthInfo` that are not present in `AminoSignDoc` (such as extension options) should
129+
be added to `AminoSignDoc` if possible
130+
131+
## Testing
132+
133+
To test that transactions are resistant to malleability,
134+
we can develop a test suite to run against all sign modes that
135+
attempts to manipulate transaction bytes in the following ways:
136+
- changing protobuf encoding by
137+
- reordering fields
138+
- setting default values
139+
- adding extra bits to varints, or
140+
- setting new unknown fields
141+
- modifying integer and decimal values encoded as strings with leading or trailing zeros
142+
143+
Whenever any of these manipulations is done, we should observe that the sign doc bytes for the sign mode being
144+
tested also change, meaning that the corresponding signatures will also have to change.
145+
146+
In the case of Amino JSON, we should also develop tests which ensure that if any `TxBody` or `AuthInfo`
147+
field not supported by Amino's `AminoSignDoc` is set that signing fails.
148+
149+
In the general case of transaction decoding, we should have unit tests to ensure that
150+
- any `TxRaw` bytes which do not follow ADR 027 canonical encoding cause decoding to fail, and
151+
- any top-level transaction elements including `TxBody`, `AuthInfo`, public keys, and messages which
152+
have unknown fields set cause the transaction to be rejected
153+
(this ensures that ADR 020 unknown field filtering is properly applied)
154+
155+
For each supported signature algorithm,
156+
there should also be unit tests to ensure that signatures must be encoded canonically
157+
or get rejected.
158+
159+
## References
160+
161+
* [ADR 027: Deterministic Protobuf Serialization](https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-027-deterministic-protobuf-serialization.md)
162+
* [ADR 020](https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-020-protobuf-transaction-encoding.md#unknown-field-filtering)
163+
* [`aminojson.proto`](../../x/tx/signing/aminojson/internal/aminojsonpb/aminojson.proto)
164+
* [`tx.proto`](../../proto/cosmos/tx/v1beta1/tx.proto)
165+

0 commit comments

Comments
 (0)