Ledger Metadata Storage #1678

tamirms · 2025-03-11T23:55:58Z

tamirms
Mar 11, 2025
Maintainer

Discussion for #1677

Simple Summary

A standard for how LedgerCloseMeta objects can be stored so that ledgers can be easily and efficiently ingested by
downstream systems.

Dependencies

None.

Motivation

galexie is a service which publishes LedgerCloseMeta XDR objects to a GCS (Google Cloud Storage) bucket. However, the
data format and layout of the XDR objects are not formally documented. This SEP aims to provide a comprehensive
specification for storing LedgerCloseMeta objects, enabling third-party developers to build compatible data stores and
clients for retrieving ledger metadata.

Specification

The data store is a key-value store where:

Keys are strings following a specific hierarchical format.
Values are binary blobs representing compressed LedgerCloseMetaBatch XDR values.

The key-value store must support:

Efficient random access lookups on arbitrary keys.
Listing keys in lexicographic order, optionally filtered by a prefix.

Examples of compatible key-value stores include Google Cloud Storage (GCS) and Amazon S3.

Value Format

Each value in the key-value store is the Zstandard-compressed binary encoding of the following XDR structure:

// Batch of ledgers along with their transaction metadata
struct LedgerCloseMetaBatch
{
    // starting ledger sequence number in the batch
    uint32 startSequence;

    // ending ledger sequence number in the batch
    uint32 endSequence;

    // Ledger close meta for each ledger within the batch
    LedgerCloseMeta ledgerCloseMetas<>;
};

A LedgerCloseMetaBatch represents a contiguous range of one or more consecutive ledgers.
All batches in a data store instance contain the same number of ledgers.
Currently only Zstandard compression is supported, but the SEP can be extended in the future to allow other
compression algorithms.
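
A client may want a cheap sanity check before handing a value to a decompressor. The Go sketch below tests whether a value begins with the Zstandard frame magic number from RFC 8878. The names zstdMagic and looksLikeZstd are hypothetical, and the check is only a heuristic: a valid Zstandard stream may also begin with a skippable frame, which this sketch would reject.

```go
package main

import (
	"bytes"
	"fmt"
)

// zstdMagic is the Zstandard frame magic number 0xFD2FB528, written in the
// little-endian byte order in which it appears at the start of a frame.
var zstdMagic = []byte{0x28, 0xB5, 0x2F, 0xFD}

// looksLikeZstd reports whether value starts with a regular Zstandard frame.
// It does not decompress or validate anything beyond the first four bytes.
func looksLikeZstd(value []byte) bool {
	return bytes.HasPrefix(value, zstdMagic)
}

func main() {
	compressed := []byte{0x28, 0xB5, 0x2F, 0xFD, 0x00}
	fmt.Println(looksLikeZstd(compressed))           // true
	fmt.Println(looksLikeZstd([]byte("plain text"))) // false
}
```

Actual decompression and XDR decoding require libraries outside the Go standard library and are not shown here.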

Key Format

Keys follow a hierarchical directory structure. The root directory is /ledgers, and subdirectories represent
partitions. Each partition contains a fixed number of batches:

/ledgers/<partition>/<batch>.xdr.zst

If the partition size is 1 (each partition holds exactly one batch), the partition is omitted, resulting in:

/ledgers/<batch>.xdr.zst

Partition Format:

fmt.Sprintf("%08X--%d-%d/", math.MaxUint32-partitionStartLedgerSequence, partitionStartLedgerSequence, partitionEndLedgerSequence)

Batch Format:

fmt.Sprintf("%08X--%d-%d.xdr.zst", math.MaxUint32-batchStartLedgerSequence, batchStartLedgerSequence, batchEndLedgerSequence)

If the batch size is 1 (each batch holds exactly one ledger), the format simplifies to:

fmt.Sprintf("%08X--%d.xdr.zst", math.MaxUint32-batchStartLedgerSequence, batchStartLedgerSequence)

Note that the .zst suffix is the filename extension defined in the Zstandard RFC. If this SEP is extended to support
another compression algorithm, then the standard filename extension for that algorithm will be used as the suffix in
the batch name.
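
To show how the partition and batch formats combine, here is a minimal Go sketch that builds the full key of the batch containing a given ledger. The function objectKey and its parameter names are hypothetical. It assumes that batches and partitions start at multiples of their size, which the example keys in this SEP suggest but the text does not state outright, and that both sizes are at least 1.

```go
package main

import (
	"fmt"
	"math"
)

// objectKey returns the key of the batch that would contain ledger seq.
// Assumption: batch and partition boundaries are multiples of their sizes,
// and ledgersPerBatch and batchesPerPartition are both >= 1.
func objectKey(seq, ledgersPerBatch, batchesPerPartition uint32) string {
	batchStart := seq / ledgersPerBatch * ledgersPerBatch
	batchEnd := batchStart + ledgersPerBatch - 1

	var batch string
	if ledgersPerBatch == 1 {
		// Single-ledger batches omit the end sequence.
		batch = fmt.Sprintf("%08X--%d.xdr.zst", math.MaxUint32-batchStart, batchStart)
	} else {
		batch = fmt.Sprintf("%08X--%d-%d.xdr.zst", math.MaxUint32-batchStart, batchStart, batchEnd)
	}

	if batchesPerPartition == 1 {
		// Partition size 1: the partition directory is omitted.
		return "/ledgers/" + batch
	}

	ledgersPerPartition := ledgersPerBatch * batchesPerPartition
	partStart := seq / ledgersPerPartition * ledgersPerPartition
	partEnd := partStart + ledgersPerPartition - 1
	partition := fmt.Sprintf("%08X--%d-%d", math.MaxUint32-partStart, partStart, partEnd)
	return "/ledgers/" + partition + "/" + batch
}

func main() {
	fmt.Println(objectKey(18, 2, 8)) // /ledgers/FFFFFFEF--16-31/FFFFFFED--18-19.xdr.zst
	fmt.Println(objectKey(5, 1, 1))  // /ledgers/FFFFFFFA--5.xdr.zst
}
```

Because the key is derived purely from the ledger sequence and the two sizes, a client can compute it and fetch the batch directly, without listing the store first.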

Configuration File

The data store includes a configuration JSON object stored under the key /config.json. This file contains the
following properties:

networkPassphrase - (string) the passphrase for the Stellar network associated with the ledgers.
compression - (string) the compression algorithm used to compress ledger objects (currently only zstd is supported).
ledgersPerBatch - (integer) the number of ledgers bundled into each LedgerCloseMetaBatch.
batchesPerPartition - (integer) the number of batches in a partition.

Example Configuration:

{
  "networkPassphrase": "Public Global Stellar Network ; September 2015",
  "compression": "zstd",
  "ledgersPerBatch": 2,
  "batchesPerPartition": 8
}
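
A client would typically read this object before computing any keys. The following Go sketch decodes it with encoding/json and applies a few basic checks. The Config type, the parseConfig function, and the choice of uint32 for the integer fields are assumptions made for illustration, not part of this SEP.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config mirrors the properties of /config.json described above.
// The Go field types are an assumption; the SEP only says "integer".
type Config struct {
	NetworkPassphrase   string `json:"networkPassphrase"`
	Compression         string `json:"compression"`
	LedgersPerBatch     uint32 `json:"ledgersPerBatch"`
	BatchesPerPartition uint32 `json:"batchesPerPartition"`
}

// parseConfig decodes a config document and rejects values a client
// could not work with.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("decoding config: %w", err)
	}
	if cfg.Compression != "zstd" {
		return Config{}, fmt.Errorf("unsupported compression %q", cfg.Compression)
	}
	if cfg.LedgersPerBatch == 0 || cfg.BatchesPerPartition == 0 {
		return Config{}, errors.New("ledgersPerBatch and batchesPerPartition must be at least 1")
	}
	return cfg, nil
}

func main() {
	example := []byte(`{
	  "networkPassphrase": "Public Global Stellar Network ; September 2015",
	  "compression": "zstd",
	  "ledgersPerBatch": 2,
	  "batchesPerPartition": 8
	}`)
	cfg, err := parseConfig(example)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d ledgers per batch, %d batches per partition\n",
		cfg.LedgersPerBatch, cfg.BatchesPerPartition)
}
```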

Example Key Structure

Below is an example list of keys for ledger batches based on the configuration above:

/ledgers/FFFFFFEF--16-31/FFFFFFED--18-19.xdr.zst
/ledgers/FFFFFFEF--16-31/FFFFFFEF--16-17.xdr.zst
/ledgers/FFFFFFFF--0-15/FFFFFFF1--14-15.xdr.zst
/ledgers/FFFFFFFF--0-15/FFFFFFF3--12-13.xdr.zst
/ledgers/FFFFFFFF--0-15/FFFFFFF5--10-11.xdr.zst
/ledgers/FFFFFFFF--0-15/FFFFFFF7--8-9.xdr.zst
/ledgers/FFFFFFFF--0-15/FFFFFFF9--6-7.xdr.zst
/ledgers/FFFFFFFF--0-15/FFFFFFFB--4-5.xdr.zst
/ledgers/FFFFFFFF--0-15/FFFFFFFD--2-3.xdr.zst

Note: The genesis ledger starts at sequence number 2, so the oldest batch must have a batchStartLedgerSequence
of 2.

Design Rationale

Key Encoding (Reversed Ledger Sequence)

Lexicographic Order: Many key-value stores (e.g., GCS, S3) optimize for listing keys in lexicographic order. By
encoding the most recent ledgers first, clients can efficiently retrieve the latest data without scanning the entire
dataset.
Reversed Sequence: Using math.MaxUint32 - startLedger ensures that newer ledgers (with higher sequence numbers)
appear before older ones when sorted lexicographically. This avoids the need for additional metadata or indexes to
determine the latest ledger.
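
The effect of the reversed encoding can be sketched in Go: sorting a set of keys lexicographically puts the newest batch first, and its ledger range can be read back from the key itself. The helpers latestKey and batchRange are hypothetical names. In a real store the listing would already arrive in sorted order, so the explicit sort only simulates that.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// latestKey returns the lexicographically smallest key, which under the
// reversed-sequence encoding is the batch with the highest start ledger.
func latestKey(keys []string) (string, bool) {
	if len(keys) == 0 {
		return "", false
	}
	sorted := append([]string(nil), keys...) // leave the caller's slice untouched
	sort.Strings(sorted)
	return sorted[0], true
}

// batchRange extracts the start and end ledger sequence from a batch key.
// Single-ledger batches (no "-end" part) report the same start and end.
func batchRange(key string) (start, end uint32, err error) {
	name := key[strings.LastIndex(key, "/")+1:]
	name = strings.TrimSuffix(name, ".xdr.zst")
	_, rest, ok := strings.Cut(name, "--")
	if !ok {
		return 0, 0, fmt.Errorf("malformed batch name %q", name)
	}
	startStr, endStr, hasEnd := strings.Cut(rest, "-")
	s, err := strconv.ParseUint(startStr, 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("parsing start sequence: %w", err)
	}
	e := s
	if hasEnd {
		if e, err = strconv.ParseUint(endStr, 10, 32); err != nil {
			return 0, 0, fmt.Errorf("parsing end sequence: %w", err)
		}
	}
	return uint32(s), uint32(e), nil
}

func main() {
	keys := []string{
		"/ledgers/FFFFFFFF--0-15/FFFFFFFD--2-3.xdr.zst",
		"/ledgers/FFFFFFEF--16-31/FFFFFFED--18-19.xdr.zst",
		"/ledgers/FFFFFFEF--16-31/FFFFFFEF--16-17.xdr.zst",
	}
	newest, _ := latestKey(keys)
	start, end, err := batchRange(newest)
	if err != nil {
		panic(err)
	}
	fmt.Println(newest, start, end)
}
```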

Compression Algorithm

zstd was chosen after evaluating zstd, lz4, and gzip. It provides the best balance between compression ratio
and decompression speed.

Security Concerns

Verifying the validity of the ledgers contained within the data store is outside the scope of this SEP. In other words,
this SEP does not provide any mechanism for validating that the ledgers obtained from a data store have not been
altered.

tamirms · 2025-03-12T09:05:28Z

tamirms
Mar 12, 2025
Maintainer Author

@urvisavla pointed out the following differences between the spec and what we have implemented in galexie:

galexie does not have a fixed prefix (/ledgers) for the root directory
galexie includes a .zstd suffix on the batch keys (e.g. /ledgers/<partition>/<batch>.xdr.zstd)

The reason I did not include the .zstd suffix is that it is derived from the compression algorithm. If we supported another compression algorithm then the suffix would need to be different. We could include a suffix property in the /config.json JSON object. However, I thought it would be better to have fewer configuration knobs if possible.

Regarding the root directory, I thought it would be useful to have a ledgers directory to separate the ledger keys from the config key.

But I am open to feedback on both these points.

5 replies

leighmcculloch Mar 12, 2025
Maintainer

Separating the config from the ledgers directory makes sense to me, and on some systems it may make it easier to list the root and see the config, which might otherwise be lost behind slow pagination.

Using a .zstd extension to clearly signal that the contents are not xdr, but compressed xdr, seems reasonable to me. The use of extensions to communicate encoding is commonplace and worth preserving. Files get downloaded, moved around, and put elsewhere. Whilst it may be fine to assume everyone knows the .xdr files are compressed when stored in this place, when they are moved to other places the .zstd will carry the communication about their encoding.

Whilst I think this is unlikely, and I wouldn't make a decision on this alone, the .zstd extension also makes it possible to support multiple compression strategies, or to change compression strategy over time with overlap, should that need to occur.

tamirms Mar 12, 2025
Maintainer Author

@leighmcculloch do you think we should introduce a new field in the config object to indicate the extension on the batch keys will be .zstd? Or should we communicate that in the SEP itself (e.g. if compression is set to zstd then the extension will be zstd)?

urvisavla Mar 12, 2025
Maintainer

Also, adding the extension makes it clear the files are compressed and reinforces the recommendation to store them that way as storing them as uncompressed xdr would be very inefficient.

leighmcculloch Mar 12, 2025
Maintainer

I think it'd be sufficient for the SEP to specify a mapping of compression algorithms to extensions. But, if you'd like to avoid needing to update the SEP with other compression algorithms and let folks use whatever they want, then yeah a mapping in the config could be a good idea. But defining it in the SEP is likely to result in more consistency?

tamirms Mar 12, 2025
Maintainer Author

I updated the SEP to include the Zstandard filename extension, but it turns out the filename extension defined for Zstandard is .zst, not .zstd.
