Skip to content
This repository was archived by the owner on Jul 9, 2024. It is now read-only.

Implemet beacon set updates #166

Closed
amarthadan opened this issue Jul 17, 2022 · 17 comments · Fixed by #182
Closed

Implemet beacon set updates #166

amarthadan opened this issue Jul 17, 2022 · 17 comments · Fixed by #182
Assignees

Comments

@amarthadan
Copy link
Contributor

I'd say that the implementation will be different enough to be done separately from update-beacons.ts. I would prefer starting with a separate implementation in update-beacon-sets.ts and then refactor both and reuse common parts.

The flow of beacon set update:

  • retrieve values for each beacon within the set from the cache (common memory)
  • IF there's no value for a given beacon, fetch it from the chain
  • IF the value is not available on the chain skip the update
  • fetch beacon set value & timestamp from the chain
  • calculate beacon set timestamp from beacon timestamps (https://github.com/api3dao/airnode-protocol-v1/blob/main/contracts/dapis/DapiServer.sol#L443)
  • IF new timestamp is older than the on-chain one skip the update
  • IF the last update is older than now + heartbeat interval force update
  • IF the deviation threshold is reached do the update, skip otherwise

Use the readDataFeedWithId function to rad the data (both beacon and beacon sets) and updateBeaconSetWithSignedData function to update beacon sets.

@bbenligiray
Copy link
Member

retrieve values for each beacon within the set from the cache (common memory)
IF there's no value for a given beacon, fetch it from the chain

I wonder if we should consider the case where the on-chain Beacon value is more up to date than the one in the cache. I suppose the cache gets cleared every 15 minutes so this is unlikely enough to not care.

IF the value is not available on the chain skip the update

This wouldn't hurt in practice, but there's nothing in the contract that prevents uninitialized on-chain Beacon values from being used to update a Beacon set. Just questioning the point of implementing additional logic.

No comments about the rest, looks good

@amarthadan
Copy link
Contributor Author

I wonder if we should consider the case where the on-chain Beacon value is more up to date than the one in the cache. I suppose the cache gets cleared every 15 minutes so this is unlikely enough to not care.

Hmm, we would have to fetch data from the chain for each beacon then. Not sure it's worth it if it's unlikely. If we would want to do this it would probably make sense to cache the on-chain data when we fetch them (during the beacon update process). I would leave this as a potential improvement for later.

This wouldn't hurt in practice, but there's nothing in the contract that prevents uninitialized on-chain Beacon values from being used to update a Beacon set. Just questioning the point of implementing additional logic.

But the logic is there to prevent beacon set update when we're missing data for one of the beacons in the set (not found either in memory - not fetched from the gateway yet, or on-chain).

@bbenligiray
Copy link
Member

But the logic is there to prevent beacon set update when we're missing data for one of the beacons in the set

What do you mean, if you omit the signature it will use the on-chain value even if it's value:0, timestamp:0

@amarthadan
Copy link
Contributor Author

That's what I want to avoid. If there's no value from the gateway (in cache) and there's no data on-chain (for one of the beacons), we should skip even attempting to update the beacon set, because it will fail, right? I know the contract will read the value from the chain but I'm talking about the case when there's no on-chain data for a given beacon. The contract call would in that case fail and I want to avoid even making that call.

@bbenligiray
Copy link
Member

It will only fail "IF new timestamp is older than the on-chain one", which you would already be checking for, so implementing a specific check here is redundant

@amarthadan
Copy link
Contributor Author

Ok, I understand what you mean. I still think it's worth it though. It's only one more if statement and it will prevent unnecessary on-chain calls (we don't need to fetch beacon set value if we know we won't be able to update it).

@bbenligiray
Copy link
Member

I couldn't come up with a case where how we handle this would be important. It's only a nit for me because it's opinionated behavior (with respect to the contract).

@Siegrift
Copy link
Contributor

Hmm, we would have to fetch data from the chain for each beacon then. Not sure it's worth it if it's unlikely. If we would want to do this it would probably make sense to cache the on-chain data when we fetch them (during the beacon update process). I would leave this as a potential improvement for later.

We could implement a TTL for the cached beacon value even for the in-memory cache (e.g. make it only valid for 5 mins).

I couldn't come up with a case where how we handle this would be important. It's only a nit for me because it's opinionated behavior (with respect to the contract).

It feels strange that the contract allows this, but this will likely not be an issue in practice right? (We will create beacon sets out of already existing beacons which will be initialized). Thus doing the check @amarthadan mentions feels unnecessary to me.

@bbenligiray
Copy link
Member

We could implement a TTL for the cached beacon value even for the in-memory cache (e.g. make it only valid for 5 mins).

Feels redundant with the natural 15 minute cache lifetime

It feels strange that the contract allows this, but this will likely not be an issue in practice right? (We will create beacon sets out of already existing beacons which will be initialized). Thus doing the check @amarthadan mentions feels unnecessary to me.

Running Beacon sets of uninitialized Beacons is a bad practice and we wouldn't do it on purpose. The problem for me is that it's not obvious how it being done accidentally should be handled. If you build a Beacon set of 3 and 1 in uninitialized with no signed data, it will update fine until you figure out the problem (so it's not clear that not updating in this case is the best course of action). If you build a Beacon set of 3 and 2 in uninitialized with no signed data you will report one third of the actual value, which is pretty bad.
I guess the best check is to count the Beacons for which we have no signed data for and are uninitialized, and not update if they are the majority.

@Siegrift
Copy link
Contributor

I guess the best check is to count the Beacons for which we have no signed data for and are uninitialized, and not update if they are the majority.

The "not update if they are the majority" feels also like an ad-hoc requirement. If I have a beacon set out of 5 beacons then according to this rule 2 uninitialized is OK, but 3 not. In practice as a consumer of beacon set value, you would be dissatisfied with both (and maybe you would be OK with it). Another thing is that the timestamp will also be out of date which might break things for the users (e.g. the timestamp could refer to 14th century or so). Can't we disallow updating beacon sets if the beacon is unitialized?

@bbenligiray
Copy link
Member

you would be dissatisfied with both

In general, data feed updates as intended (1) >> data feed doesn't update even thought it should (2) >> data feed updates with the wrong value (3). So it's a matter of choosing 1 & 3 or 2, which is why this is a bit subjective.

Another thing is that the timestamp will also be out of date which might break things for the users

With Beacon sets, the timestamp doesn't represent a timestamp of the data. It's only a value that prevents older values from overwriting newer values. The user shouldn't implement any logic that depends on it.

Can't we disallow updating beacon sets if the beacon is unitialized?

What do you mean, in the contract?

@Siegrift
Copy link
Contributor

In general, data feed updates as intended (1) >> data feed doesn't update even thought it should (2) >> data feed updates with the wrong value (3). So it's a matter of choosing 1 & 3 or 2, which is why this is a bit subjective.

Depends on how you define "intended". But yeah, I get that this is subjective.

With Beacon sets, the timestamp doesn't represent a timestamp of the data. It's only a value that prevents older values from overwriting newer values. The user shouldn't implement any logic that depends on it.

When I look at the implementation of updateBeaconSetWithSignedData the timestamp is an avarage value of the beacon timestamps (and beacon timestamps are the timestamps of the data). That's why I'd consider the beacon set timestamp as an aggregated timestamp of the data. I do not understand why not base contract logic on this timestamp.

What do you mean, in the contract?

Require that this beacon value is not 0.

@bbenligiray
Copy link
Member

I do not understand why not base contract logic on this timestamp.

See this

Require that this beacon value is not 0.

We can't because that's audited and deployed into production (and there are cases where not doing that check and updating anyway would be preferable so it's not necessarily a security issue)

@Siegrift
Copy link
Contributor

See this

Yeah, but seems intended to me. The timestamp is an average of the beacon timestamps. If the avg=100, you don't know if the timestamps are (50, 125, 125) or (99, 100, 101). But I still wouldn't like to use a data feed value when the AVG is too old.

We can't because that's audited and deployed into production

Yeah, I assumed so.

(and there are cases where not doing that check and updating anyway would be preferable so it's not necessarily a security issue)

Seems also like a reason why we should not forbid this in Airseeker.

@bbenligiray
Copy link
Member

But I still wouldn't like to use a data feed value when the AVG is too old.

Median aggregation implies that (50, 125, 125) is perfectly fine, so you should also use a stricter aggregation method then

@Siegrift
Copy link
Contributor

I am confused now. The median is there for value, but timestamp is averaged

@bbenligiray
Copy link
Member

Using median as the aggregation method means you want to be able to get an aggregate even if one of the three values is dishonest, super old, etc. If you have stricter requirements, you should probably not use median but an aggregation method that does better numerical aggregation (because median is notoriously bad at it, but it's used for being maximally robust against outliers).

@acenolaza acenolaza self-assigned this Jul 26, 2022
acenolaza added a commit that referenced this issue Aug 2, 2022
acenolaza added a commit that referenced this issue Aug 2, 2022
acenolaza added a commit that referenced this issue Aug 2, 2022
acenolaza added a commit that referenced this issue Aug 9, 2022
@acenolaza acenolaza self-assigned this Aug 10, 2022
acenolaza added a commit that referenced this issue Aug 15, 2022
acenolaza added a commit that referenced this issue Aug 16, 2022
acenolaza added a commit that referenced this issue Aug 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants