Significant revision of cask. Remove expiry. Reorder data. Eliminate CASK kind. #43

michaelcfanning · 2025-03-06T23:50:37Z

This significant revision of CASK contains the following changes:

Sensitive data key moved from its current location. As a result of this move, we restore reserved padding bits for sensitive component sizes of 16, 32 and 64 bytes. A 48-byte sensitive component aligns cleanly along a 3-byte boundary and has no padding.
The timestamp is moved to follow the CASK signature. Minutes data is retained.
The expiry is dropped from the core format. This data would be helpful to responders to encode, as by reading the key it would be possible to understand whether the key, as originally allocated was valid in some exposure timeframe. A key CASK principle is for the core format to publish data that is useful for static consumption and not tempting to evaluate at runtime. And so we drop the expiry as too tempting for runtime evaluation.
The CASK key kind is eliminated. Various classes of key, primary key, derived, HMAC can be specified in the provider-specific key kind component.
We now encode the optional data size, as a count of 3-byte optional segments. By encoding this length as well as the sensitive component length in characters at a fixed location relative to the CASK signature QJJQ we enable high-performance detections that start with locating the CASK signature in test/data streams.
The correlating id is dropped to 15 bytes in length. In general, a 16-byte value (128 bits) is preferred in scenarios for a strong guarantee of uniqueness. In practice, 120 bits is sufficient.

The BNF is renamed to CaskSecrets.md to reflect that there's no longer a specific CASK key kind in play. The churn in the BNF was sufficient that rather than renaming first so that we could diff, I simply created a clean copy of the new BNF. But in case anyone finds it helpful to see the explicit diffs from previous, I've left an edited version of the old BNF behind to examine. This will be deleted, obviously, once everything is reviewed/refined/accepted.

This change resolved #35, #36 and #37.

…CASK kind.

michaelcfanning · 2025-03-06T23:57:31Z

docs/PrimaryKey.md

@@ -1,67 +1,83 @@
-# CASK 256-bit Primary Keys
+# CASK Secrets


Secrets

I left these diffs here in case it's helpful to anyone to try to diff from the previous! This file will get deleted next. #Closed

rwoll

A key CASK principle is for the core format to publish data that is useful for static consumption and not tempting to evaluate at runtime. And so we drop the expiry as too tempting for runtime evaluation.

Noice! I appreciate this principle and think it's a good change for the spec. I also appreciate the new disclaimer/caveat on restricting use case of the correlating-id.

Overall looking good! 😊 🚀

Ping me once you publish the PR and I'll review/stamp again if @nguerrera doesn't beat me to it!

rwoll · 2025-03-07T20:31:46Z

docs/CaskSecret.md

+# CASK Secrets
+## Standard Backus-Naur Form (BNF)
+```
+<key> ::= <sensitive-data> <cask-signature> <timestamp> <sensitive-data-size> <optional-data-size> <provider-kind> [<optional-fields>] <provider-signature> <correlating-id>


nit: Thinking out loud… Should the spec switch the ordering so <sensitive-data-size> comes before timestamp? Locating the size as close as possible to the cask-signature and the sensitive-data, for some unarticuable reason, feels more natural?

I don't have a strong technical argument to make the change though, so it's def a nit. #Pending

Yes, we could do this. The unfortunate rough edge of the format is that there is a single element of the timestamp that leaks across the next 4-char encoded boundary.

Today, YMDHM precedes the next three encoded chars representing sensitive data size, optional data size and provider key kind. We made this choice because it effectively puts the provider key kind as a prefix to the provider signature, and that's very nice for regexes, other evaluation. So BAZST, CAZST, etc.

If we just said, well, let's do both, then we could have sensitive data size, optional data size, year and month in the first 3-byte segment, then day, hour, minute, provider key kind in the second. This PR already shows that these 6 bytes are best processed as a group, and the pattern here (I think) would be persistent even if we took this change, you'd just adjust the ordering of the various encoded/decoded char values.

Here's a bunch of blah blah and a proposal on this change, see what you think.

#46

docs/CaskSecret.md

…ary code coverage review completed. Detailed first pass review not compete.

nguerrera · 2025-03-10T17:02:27Z

A small note on the PR description about something that came up during our chat that is very subtle and worth recording now before we forget about it.

By encoding this length as well as the sensitive component length in characters at a fixed location relative to the CASK signature QJJQ we enable high-performance detections that start with locating the CASK signature in test/data streams.

I now believe scanning performance is only a secondary benefit. We could probably get good enough performance using only some well-known max lengths.

I think the more fundamental benefit is being able to determine where a secret starts and ends without any dependency on the data before or after the secret. For scenarios like binary and memory dump scanning, this could be critical.

#Resolved

michaelcfanning · 2025-03-10T22:16:07Z

.editorconfig

@@ -1,7 +1,7 @@
 root = true
+charset = utf-8


charset

We appear to have a file with a UTF8 BOM that is triggering the issue below. We can work around this for now by setting UTF8 as a default encoding (which means our mixture of UTF8 BOM + ASCII works). Alternately we could chase down the BOM and zap it (it may be in a project file, the issue below references BOMs being inserted in some project scenarios).

dotnet/sdk#46780 #Pending

I will look into this.

That baton is yours.

src/Cask/CaskKey.cs

michaelcfanning · 2025-03-10T23:35:02Z

src/Cask/Helpers.cs


-    public static byte ProviderKindToByte(char providerKind)


ProviderKindToByte

In general, this proposed update to the format looks like a nice simplification once rendered in code. Many helpers went away. The test pattern for decoding without any gee whiz construction logic looks acceptably clean/simple. #Pending

I think tiny helpers that do things like x - 'A' , * 3 and casts are still applicable and would still be nice to use. I find comments like // 'A' == index 0 of all printable base64-encoded characters. in a larger method like IsCaskBytes to be a hint to extract a helper, same as seeing 15 as a hint to extract a constant.

Agree! Pending future PR.
#47

michaelcfanning · 2025-03-10T23:37:18Z

src/Cask/InternalConstants.cs

-    /// <summary>
-    /// A bit mask to obtain the reserved bits from the key kind.
-    /// </summary>
-    public const int CaskKindReservedMask = (1 << CaskKindReservedBits) - 1;


CaskKindReservedMask

I've deleted a ton of constants and some exception helpers, based on code coverage and the general thought that anything that exists should have an explicit utilization in code.

I am not sure on the best way to replace our index/range constants. After the design update, all of this stuff needs to be computed dynamically. Basically, you first need to understand the size of the sensitive data component. This gives you the offset of the fixed cask signature, from which you can compute all remaining offsets.

#Resolved

We can still have constants for lengths. For example, I saw a wild 15 that could be CorrelationIdLengthInBytes. Please check for others.

Noted in #47

src/Cask/Polyfill.cs

…ion to some polyfill classes).

michaelcfanning · 2025-03-10T23:50:06Z

src/Tests/Cask.Tests/ICask.cs

@@ -25,8 +25,8 @@ public interface ICask

    string GenerateKey(string providerSignature,


GenerateKey

It's nice to no longer repeated parameter types (string followed by string in particular) to lead to unexpected ordering issues...

Here's a new FxCop rule: on observing any two sequential argument types which are the same, require that all call sites use named arguments to distinguish them (and raise visibility on calling mistakes). #Resolved

michaelcfanning · 2025-03-10T23:51:43Z

Yes, I copied this note into my bundle of comments that will support the spec draft, coming soon.

In reply to: 2711260568

… security sensitive purposes.

nguerrera

I have not a completed a full review pass yet. Saving my comments so far. There are some high-level things that I think are worth discussing and I don't think that discussion needs to wait for me to review all the details.

nguerrera · 2025-03-11T14:05:11Z

docs/CaskSecret.md

@@ -0,0 +1,92 @@
+# CASK Secrets


Please update the pseudo-code as well. #Resolved

Thank you, yes, done.

nguerrera · 2025-03-11T14:12:59Z

src/Cask/Cask.cs

+         *  128-bit : 45 bytes (18 bytes sensitive + 27 reserved) : 12 bytes of optional data permissible < (60 - 45)
+         *  256-bit : 60 bytes (33 bytes sensitive + 27 reserved) : 12 bytes of optional data permissible < (75 - 60)
+         *  384-bit : 75 bytes (48 bytes sensitive + 27 reserved) : 15 bytes of optional data permissible < (93 - 75)
+         *  512-bit : 93 bytes (66 bytes sensitive + 27 reserved) : 15 bytes (value chosen to align with 384 bit keys)


Is 15 bytes actually allowed at all in practice? There is still a limit constant and it's set fixed to 12?

OK, so my reply is that we enforce a 12 byte constraint today but that, if true, that means there is a specific 384-bit key in code that's forbidden where we have space for it. Let me go add a test and look for that bug,.

I've gone back and forth: should we permit the extra 3 bytes in 384 just because we can? if we do that, we should we allow 15 bytes in the longer key because it's weird to drop back the permissible optional data length for this one? or should we just say, 12 bytes only because that's simplest to explain?

I will add explicit tests that ensure we cover our intent thoroughly. Please let me know what you think that intent should be.

12 bytes max, period. That's my preference. 😊

ok! I will make it so. I can't resist noting here, that if we want the simplification of limiting all optional data to 12 bytes, that opens the possibility of us injecting 3 bytes of padding into the 384 bit key. the result of that is that we would always have at least 8 bits of reserved space in the format for some future use. it comes at the cost of artificially creating padding for the 384 bit key unnecessarily. that key is already long, the space wouldn't kill us. but adding this padding would be an unnecessary additional wrinkle/requirement we could avoid. thoughts?

nguerrera · 2025-03-11T14:57:58Z

src/Cask/Cask.cs

+        return InferSensitiveDataSizeFromByteLength(lengthInBytes);
+    }
+
+    internal static SensitiveDataSize InferSensitiveDataSizeFromByteLength(int lengthInBytes)


Hmm. I was expecting explicit lengths would prevent the need for this. But that doesn't actually work with the lengths placed in the middle of variable length things to either side. :(

If I were implementing this spec, when I got to this part, I would say, "What? Why are the lengths in the middle?"

Was putting the CASK signature at the front rather than the middle considered, and if so, why was it rejected? (This choice is carried over from prior formats and before my time so I genuinely don't know.)

If we had QJJQ at the front, followed by sizes, it would look just like a typical binary format where there's a header that starts with a signature and gives you info that you need to efficiently jump around the payload.
#Pending

More thoughts than you wanted on this idea, let's discuss after this review.
#48

nguerrera · 2025-03-11T15:06:12Z

docs/CaskSecret.md

+                                                                            ; specify a particular cryptographic algorithm or method for
+                                                                            ; generating it. The size of this component must conform to the
+                                                                            ; encoded <sensitive-data-size> value.
+<128-bits-padded>  ::= 21 * <base64url> <base64-four-zeros-suffix> 'AA'     ; The total random data comprises 128 bits encoded as 21


Suggested change

<128-bits-padded> ::= 21 * <base64url> <base64-four-zeros-suffix> 'AA' ; The total random data comprises 128 bits encoded as 21

<128-bits-padded> ::= 21 * <base64url> <base64-four-zeros-suffix> 'AA' ; The total sensitive data comprises 128 bits encoded as 21

``` #Resolved

I'm not going to comment on all of them, but there are several more occurrences of random that should be sensitive.

nguerrera · 2025-03-11T15:25:08Z

src/Cask/Cask.cs

-        // Sensitive component size.
-        key[SecretEntropyInBytes] = (byte)SensitiveDataSize.Bits256;
+        Index endOfSensitiveComponent = sensitiveDataSizeInBytes - intPaddingBytes;
+        FillRandom(key[..endOfSensitiveComponent]);


Apparently, the explicit writing of padding was removed in the previous change. I was surprised to not see it here.

We must not assume that stackalloc'ed memory is zero-ed out.

https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/stackalloc

The content of the newly allocated memory is undefined.

I think in practice with today's compilers, it will be zeroed out unless SkipLocalsInit is used, but we should not rely on this undocumented behavior.
#Resolved

Oh geez, ok, what could go wrong with undefined initialization behavior, it's worked out great for C++. :)

Haha. The previous explicit padding was removed when I adjusted this code for optional data sizes. I added this in the test decoding but not here, will do it.

nguerrera · 2025-03-11T15:35:42Z

src/Cask/Cask.cs

-    {
-        int bytesWritten;
+        Range caskSignatureByteRange = ComputeSignatureByteRange(sensitiveDataSize);
+        CaskSignatureBytes.CopyTo(key[caskSignatureByteRange]);


I am thinking that it would be easier to follow if generate did not compute ranges, but instead bumped a destination span like a pointer to write each field.

I had done it this way originally but removed it when everything was constant, but now I regret that.

Something like this:

Span<byte> destination = key; // Sensitive data. FillRandom(destination[..SecretEntropyInBytes]); destination = destination[SecretEntropyInBytes..]; // Padding. destination[..paddingBytes].Clear(); destination = destination[paddingBytes..]; // Cask signature. CaskSignatureBytes.CopyTo(destination[..3]); destination = destination[3..]; // etc.

Now I don't need to compute the signature range, just know that I will be positioned at the signature after writing the fields that come before it. This follows the pseudo-code (although that isn't updated yet in this change) that just says "write X to the next N bytes".

#Pending

Useful, let's do this when we fix #47.
#49

nguerrera

Still reviewing, but saving progress.

nguerrera · 2025-03-11T17:38:44Z

src/Cask/Cask.cs

+        SensitiveDataSize sensitiveDataSize = InferSensitiveDataSizeFromCharLength(keyUtf8.Length);
+        Range caskSignatureByteRange = ComputeSignatureCharRange(sensitiveDataSize);
+
+        // Check for CASK signature. "QJJQ" base64-decoded.


This comment change is incorrect. We are examing base64-encoded data in UTF-8. #Resolved

nguerrera · 2025-03-11T17:40:31Z

src/Cask/Cask.cs

-        // Check for CASK signature, "QJJQ" UTF-8 encoded.
-        if (!keyUtf8[CaskSignatureCharRange].SequenceEqual(CaskSignatureUtf8))
+        SensitiveDataSize sensitiveDataSize = InferSensitiveDataSizeFromCharLength(keyUtf8.Length);
+        Range caskSignatureByteRange = ComputeSignatureCharRange(sensitiveDataSize);


Maybe just name this caskSignatureRange. It is a byte range, but usually wen we are talking about byte ranges we are talking about the base64-decoded form, not UTF-8. That may have cause the confusion that led to the comment change below, or maybe it was copy/paste. #Resolved

caskSignatureUtf8Range

nguerrera · 2025-03-11T17:43:57Z

src/Cask/Cask.cs

-        // Check that kind is valid.
-        if (!TryByteToKind(keyBytes[CaskKindByteIndex], out CaskKeyKind kind) ||
-            kind is not CaskKeyKind.PrimaryKey and not CaskKeyKind.DerivedKey and not CaskKeyKind.HMAC)
+        int sensitiveDataSizeInBytes = RoundUpTo3ByteAlignment((int)encodedSensitiveDataSize * 16);


Maybe name this paddedSensitiveDataSizeInBytes or alignedSensitiveDataSizeInBytes #Resolved

in other code, i created a variable for entropySizeInBytes then used sensitiveDataSizeInBytes (because our definition for sensitive data size includes the padding.

i am nit-picking, but sensitive data size = padded entropy by definition. i will add the additional local and then you tell me what you think about this not very important topic. :)

nguerrera · 2025-03-11T17:44:57Z

src/Cask/Cask.cs

+        }
+
+        int providerDataLengthInBytes = (minutesSizesAndKeyKindChars[2] - 'A') * 3;
+        if (providerDataLengthInBytes > MaxProviderDataLengthInBytes || providerDataLengthInBytes % 3 != 0)


Use Is3ByteAligned helper #Resolved

src/Cask/Cask.cs

nguerrera · 2025-03-11T17:49:47Z

src/Cask/Cask.cs

+        int keyLengthInBytes = GetKeyLengthInBytes(providerDataLengthInBytes, sensitiveDataSize);
+
+        int entropyInBytes = (int)sensitiveDataSize * 16;
+        int sensitiveDataSizeInBytes = RoundUpTo3ByteAlignment(entropyInBytes);


Here too I think it would be good to put padded or aligned in the name. I'm quickly losing track of the difference between these variables as I read on. #Resolved

nguerrera · 2025-03-11T17:50:22Z

src/Cask/Cask.cs

-
-        // Sensitive component size.
-        key[SecretEntropyInBytes] = (byte)SensitiveDataSize.Bits256;
+        Index endOfSensitiveComponent = sensitiveDataSizeInBytes - intPaddingBytes;


I think this is the same as entropyInBytes, which could be used directly below. #Resolved

nguerrera · 2025-03-11T17:55:37Z

src/Cask/InternalConstants.cs

-    /// <summary>
-    /// A bit mask to obtain the reserved bits from the key kind.
-    /// </summary>
-    public const int CaskKindReservedMask = (1 << CaskKindReservedBits) - 1;


We can still have constants for lengths. For example, I saw a wild 15 that could be CorrelationIdLengthInBytes. Please check for others.

src/Cask/CaskKey.cs

nguerrera · 2025-03-11T18:03:24Z

src/Cask/Helpers.cs


-    public static byte ProviderKindToByte(char providerKind)


I think tiny helpers that do things like x - 'A' , * 3 and casts are still applicable and would still be nice to use. I find comments like // 'A' == index 0 of all printable base64-encoded characters. in a larger method like IsCaskBytes to be a hint to extract a helper, same as seeing 15 as a hint to extract a constant.

nguerrera

OK, I've made a full review pass now.

src/Cask/Cask.cs

nguerrera · 2025-03-11T18:39:02Z

src/Cask/Cask.cs

+    {
+        /* 
+         *  Required CASK encoded data, 30 bytes.
+         *  QJJQ YMDH MLOP TEST 1234 1234 1234 1234 1234 1234


I think this has 3 bytes/4 chars too many in C3ID and the total should be 27 bytes, not 30. 27 reserved in comment below doesn't match 30 above.

It would be nice to put these in constants, like Min(256|384|512Bit)KeyLengthInBytes.

#Resolved

Woof, yes, this area moved around a little. Fixed, thanks.

nguerrera · 2025-03-11T18:45:23Z

src/Cask/Cask.cs

+            return SensitiveDataSize.Bits256;
+        }
+
+        return SensitiveDataSize.Bits128;


Consider switch expression:

return lengthInBytes switch { >= Min512BitKeyLengthInBytes => SensitiveDataSize.Bits512, >= Min384BitKeyLengthInBytes => SensitiveDataSize.Bits384, >= Min256BitKeyLengthInBytes => SensitiveDataSize.Bits256, _ => SensitiveDataSize.Bits128, }; #Resolved

nguerrera · 2025-03-11T18:51:14Z

src/Cask/Helpers.cs

-        const int uppercaseZIndex = 'Z' - 'A';
-        const int lowercaseZIndex = 'z' - 'a';
+        int entropyInBytes = (int)sensitiveDataSize * 16;
+        int sensitiveDataSizeInBytes = RoundUpTo3ByteAlignment(entropyInBytes);


I think we should no longer refer to entropy unless we are generating a random key. Again, the real distinction between these two is padded vs. unpadded and the names are making that hard to see. #Resolved

Fixed everywhere. Also changed many names from 'sensitiveDataSize' to 'secretSize'. Arg. Why did I include that rename here? :P

nguerrera · 2025-03-11T18:56:35Z

src/Cask/Helpers.cs

-            kind = default;
-            return false;
-        }
+        keyLengthInBytes += sensitiveDataSizeInBytes;


I also find the mutation of keyLengthInBytes confusing. I guess this is to save some horizontal space, but there's no real reason to have an intermediate partial keyLengthInBytes. #Resolved

simplified.

src/Cask/SensitiveDataSize.cs

nguerrera · 2025-03-11T19:09:00Z

src/Cask/SensitiveDataSize.cs

+
+    /// <summary>
+    /// Specifies a computed value with one 16-byte
+    /// (128 bit) segment of entropy.


s/entropy/sensitive data/

Many instances of this in this file, but I will only comment once. #Resolved

nguerrera · 2025-03-11T19:12:12Z

src/Tests/Cask.Tests/CaskKeyTests.cs

-    }
+    internal static SensitiveDataSize[] AllSensitiveDataSizes => [
+        SensitiveDataSize.Bits128, SensitiveDataSize.Bits256,
+        SensitiveDataSize.Bits384, SensitiveDataSize.Bits512];


nit, I'd put each of these on their own line. #Resolved

src/Tests/Cask.Tests/CaskSecretsTests.cs

nguerrera · 2025-03-11T19:15:54Z

.editorconfig

@@ -1,7 +1,7 @@
 root = true
+charset = utf-8


I will look into this.

Co-authored-by: Nick Guerrera <[email protected]>

…cask-update

…py' as a term.

nguerrera

Review of pseudo-code only. Now reviewing the rest since last review.

nguerrera · 2025-03-12T18:18:16Z

docs/GenerateKeyPseudoCode.md

    - Provider data (if non-empty) consists entirely of base64url printable characters.
+    - Secret data size is between 1 (indicating a 128-bit component) and 4 (indicating 512-bits).


Not listed as an input. #Resolved

Nuts and expiry not deleted. Let's fix this before checking in.

nguerrera · 2025-03-12T18:25:12Z

docs/GenerateKeyPseudoCode.md

    - Provider data (if non-empty) consists entirely of base64url printable characters.
+    - Secret data size is between 1 (indicating a 128-bit component) and 4 (indicating 512-bits).


We can discuss separately from this PR, but I think we should consider making this a size in bytes or bits. And then specify the required alignment and min/max to validate. If someone decided to implement this without an enum, it would be really weird and hard to use IMHO. #Resolved

More accurately, this value is a count of 16-byte segments in the secret data. Just as our optional data value is a count of 3-byte segments. I updated the language. Glad to continue discussing this.

nguerrera · 2025-03-12T18:26:02Z

docs/GenerateKeyPseudoCode.md

 1. Let N = the length of the base64url-decoded provider data.
    - Number of characters in provider data divided by 4, times 3.
+1. Compute the sensitive data size in bytes:
+    - Multiply the secret data size by 16.
+    - Round this value up to the next multiple of 3, if necessary.


You can leave this for now, but I'm finding it hard to understand that this does not mean to discard the unrounded value, that you actually do need later. Maybe it would be better to specify how to compute how many bytes to pad separately. #Pending

docs/GenerateKeyPseudoCode.md

nguerrera · 2025-03-12T18:36:07Z

src/Cask/Cask.cs

+        return InferSensitiveDataSizeFromByteLength(lengthInBytes);
+    }
+
+    private static SecretSize InferSensitiveDataSizeFromByteLength(int lengthInBytes)


nit: rename here too?

Suggested change

private static SecretSize InferSensitiveDataSizeFromByteLength(int lengthInBytes)

private static SecretSize InferSecretSizeFromByteLength(int lengthInBytes)

michaelcfanning added 4 commits March 6, 2025 15:37

Significant revision of cask. Remove expiry. Reorder data. Eliminate …

68edd4d

…CASK kind.

Add Cask secret BNF

ff1dc15

Generalize BNF name to CASK secrets.

df164cc

Add old PrimaryKey.md to facilitate diff reviews

cb32387

michaelcfanning commented Mar 6, 2025

View reviewed changes

rwoll reviewed Mar 7, 2025

View reviewed changes

michaelcfanning added 3 commits March 7, 2025 17:41

Significant update with test changes. Much tuning remains.

9c4ed63

All tests pass. Detailed encoding/decoding example authored. Prelimin…

180c7a0

…ary code coverage review completed. Detailed first pass review not compete.

Update comments.

79d290e

michaelcfanning added 3 commits March 10, 2025 11:49

Merge from main.

2c511d4

All tests pass. 100% code coverage. On to final pre-review sanity check.

837af3a

Formatting changes.

465c500