-
Notifications
You must be signed in to change notification settings - Fork 5
Add initial counters for the Dataplane #643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
sergeymatov
commented
Jun 26, 2025
- Add Prometheus exporter (with configurable port for metrics)
- Add batch of counters for Dataplane pipelines
27b0aec
to
5c3ab60
Compare
5c3ab60
to
feefefd
Compare
Default access is 9090/metrics. Signed-off-by: Sergey Matov <[email protected]>
feefefd
to
7f87a74
Compare
Signed-off-by: Sergey Matov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only major concern is all the naked strings. I think refactor will be hard and this is avoidable if we fix it now
})?; | ||
|
||
// Describe core gateway metrics | ||
describe_counter!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally we wouldn't have so many hard coded string running around.
I recommend adding a traits along the lines of
pub trait GatewayCounter {
const NAME: &'static str;
const DESCRIPTION: &'static str;
}
then implementing that trait for a series of zero sized types.
The problem with the naked strings is that it is tricky to refactor or get an exhaustive list of exactly where these counters are used or updated. The type system can help a lot here.
@@ -35,6 +39,92 @@ fn init_logging() { | |||
.init(); | |||
} | |||
|
|||
fn init_metrics(metrics_port: u16) -> Result<(), Box<dyn std::error::Error>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method should get annotated with
#[tracing::instrument(level = "info")]
@@ -65,13 +67,17 @@ impl Egress { | |||
let Some(our_mac) = interface.get_mac() else { | |||
error!("{nfi}: Failed to get mac address of interface {ifname}!"); | |||
packet.done(DoneReason::InternalFailure); | |||
counter!("gateway_errors_total", "component" => "egress", "error" => "missing_src_mac") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this strategy is ok for the moment, but in a future implementation we should treat the returned Counter objects as state and not regenerate them on every function call
} | ||
|
||
fn interface_ingress<Buf: PacketBufferMut>( | ||
&self, | ||
interface: &Interface, | ||
packet: &mut Packet<Buf>, | ||
) { | ||
let ifname = &interface.name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you clone interface.name
here then you can avoid allocating memory in the follow on invokes of counter!
return; | ||
}; | ||
debug!("{nfi}: processing packet to {dst} with vrf {vrfid}"); | ||
|
||
/* decrement packet TTL -- packet may be done if TTL is exceeded */ | ||
Self::decrement_ttl(packet, dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the logic, but
- I think this change is out of scope here
- we don't check that packet is_done because the hop limit is past. Just that the packet is done, so the log can be incorrect. Shouldn't decrement_ttl return an enum for this event?