Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Aura alterations for Asynchronous Backing compatibility #2476

Open
@rphmeier

Description

@rphmeier

The bulk of the changes here should also apply to SASSAFRAS.

High Level

(feel free to skip ahead to the proposed modifications section)

As a recap (more details #2301), we need to prepare Cumulus collators for more advanced parachains protocol features such as Asynchronous Backing, Elastic Scaling, and Blockspace Regions. These changes impact the collator logic and protocols in similar ways: they enable more blockspace to be utilized, and they enable this blockspace to be consumed optimistically, based on predictions of what will happen on the relay chain.

Collator-selection algorithms are not responsible for consensus safety (i.e. security) but just consensus liveness (not halting). Liveness is a spectrum, though. From a formal standpoint, consensus algorithms usually focus on guaranteeing some minimal level of liveness, whereas in practice we care about maximizing throughput, minimizing latency, and reducing waste.

To be more specific, we can clarify two desirable properties for a collator-selection algorithm:

  1. Maximize utilization of available blockspace (Goal L1)
  2. Minimize effort in fulfiling L1 (Goal L2)

The phrase "available blockspace" here means something very specific. It refers to the number of cores and the proportion of each of those cores allocated to the parachain. This deliberately ignores the Pay-as-you-Go model, as Aura and similar collator-selection mechanisms are focused on dividing up authorship rights within already allocated blockspace regions, as opposed to Pay-as-you-Go, which is focused on purchasing and consuming blockspace in small increments.

The available blockspace can be roughly condensed into two variables: the maximum capacity C and the expected velocity V. Capacity refers to the amount of blocks that can be pending at any given time. e.g. for 1 core this might be 1 + 4 (the 4 comes from some lookahead performed by Polkadot's validators). The velocity of blockspace refers to the rate of block processing by the relay chain. Roughly speaking, with one core, V = 1, with two cores V = 2 and so on. Velocity calculations are more subtle when we take into account parachains sharing cores with strided blockspace regions.

The term "unincluded segment" (cc #1866) refers to the currently used capacity of the chain. Parachains control the maximum size of the unincluded segment and set C themselves, but the C variable is in practice capped by the relay chain's configuration. The remaining capacity of the unincluded segment is c = C - len(unincluded).

New relay-chain blocks increase c at velocity v <= V (the actual velocity is always less than or equal to the maximum), via the advancement of the backing and availability protocols. That is, the advancement of the relay-chain opens up new capacity to be fulfilled by parachain block authors. New parachain blocks extend the unincluded segment and utilize capacity.

This "capacity"/"velocity" framing may seem overwrought given the current state of blockspace in Polkadot: parachains have 1 core and they are scheduled at every moment on that core. However, with elastic scaling and divisible blockspace regions, it starts to become more reasonable to talk about capacity and velocity values that are much greater than 1 and much smaller than 1.

Proposed Aura modifications

Aura slots in Cumulus are the same as the relay-parent's slot.

  1. Allow block authors to build multiple blocks per slot.
  2. Only allow block authors to build a new block when the unincluded segment would not exceed its maximum
  3. Only allow block authors to build up to V + 1 blocks per slot

All of these checks can be guaranteed by runtime pallets. The velocity can be deduced from the relay-chain state and the maximum unincluded segment length is determined by the parachain's own runtime.

In practice, we need a few changes to support this.

Liveness Analysis

There are 2 states a (parachain parent, relay-parent) pair can be in, and we'll analyze the outcomes at each of them.

c = 0

By the definition of c, this means there is no remaining capacity for new blocks at this relay-parent.
The slot author for any relay-parent where the unincluded segment is totally full cannot author any blocks.

c > 0

The unincluded segment has some remaining capacity c. The slot owner as-of R can create min(V + 1, c) blocks with R as a relay-parent. This increases the utilization of the maximum capacity.

Implications

The goal is to build as long of a backlog of blocks as the parachain will allow (latency/throughput tradeoff). When the backlog is empty, each collator adds 1 extra parachain block per relay-chain block until the backlog is full. When the backlog is full, each collator simply replaces the blocks that have now been included into the relay chain.

This does imply that it takes N relay-chain blocks to fill an empty backlog, but backlogs are expected to be short, e.g. 2-4 blocks, so it will only take 12-24 seconds to fill the backlog completely. This fulfills Goal L1.

Goal L2 is mostly fulfilled by limiting the contention over which collators get to fill up which parts of the backlog. By limiting collators to at most v+1 blocks, the overlap between collators is limited. There are still issues where collators can decide to build on entirely competing forks, but these need to be addressed at the fork-choice and incentive levels.

Metadata

Metadata

Assignees

No one assigned

    Labels

    J1-metaA specific issue for grouping tasks or bugs of a specific category.S1-implementIssue is in the implementation stage.T5-parachainsThis PR/Issue is related to Parachains.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions