Skip to content

Flowlogs: make calico-node to fetch flow logs from a node #10144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Apr 17, 2025

Conversation

mazdakn
Copy link
Member

@mazdakn mazdakn commented Apr 4, 2025

Description

Add a new command to calico-node binary to list or watch flow logs from a specific node. This is very useful for troubleshooting. The command line will be:

calico-node -flows [n]

if n is positive then it will watch for n logs and exits. If n is negative, then it watch for flowlogs for ever until interrupted.

In a cluster, it can be used as the following to watch for flowlogs for ever:

kubectl exec -ti [node-name] -n [node-namespace] -- calico-node -flows -1

Sample output:

- Time=2025-04-11 21:50:58 +0000 UTC Reporter=Src Action=Allow
  Src={wep(default/loadgenerator-78cb8f69c8-*)} Dst={wep(kube-system/coredns-6d4b75cb6d-*) Svc:kube-dns/kube-system} Proto={udp(53) Svc:dns/53}
  Counts={Ingress: 12Pkts/1818Bytes Egress:12Pkts/960Bytes} Connections={Started:6 Completed:0 Live:6}
  Enforced:
  - kind:CalicoNetworkPolicy namespace:"default" name:"allow-dns-egress" tier:"default" action:Allow
  Pending:
  - kind:CalicoNetworkPolicy namespace:"default" name:"allow-dns-egress" tier:"default" action:Allow
- Time=2025-04-11 21:50:58 +0000 UTC Reporter=Src Action=Deny
  Src={wep(default/loadgenerator-78cb8f69c8-*)} Dst={net(-/pvt) Svc:-/-} Proto={tcp(80) Svc:-/0}
  Counts={Ingress: 0Pkts/0Bytes Egress:27Pkts/1620Bytes} Connections={Started:6 Completed:0 Live:6}
  Enforced:
  - kind:EndOfTier tier:"default" action:Deny rule_index:-1 trigger:{kind:CalicoNetworkPolicy namespace:"default" name:"loadgenerator-frontend-policy" tier:"default"}
  Pending:
  - kind:EndOfTier tier:"default" action:Deny rule_index:-1 trigger:{kind:CalicoNetworkPolicy namespace:"default" name:"allow-dns-egress" tier:"default"}

To enable reporting flow logs to local node socket, a new Felix config, named flowLogsLocalReporter is introduced, which can be set to either Disabled (default) or Enabled.

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

TBD

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@mazdakn mazdakn added docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact labels Apr 4, 2025
@mazdakn mazdakn requested a review from a team as a code owner April 4, 2025 00:37
@marvin-tigera marvin-tigera added this to the Calico v3.31.0 milestone Apr 4, 2025
Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, want to take a closer look yet

@@ -48,6 +53,9 @@ func NewReporter(addr, cert, key, ca string) (*GoldmaneReporter, error) {
return &GoldmaneReporter{
address: addr,
client: cli,

// Do not send flowlogs to node socket, if goldmane address set via FelixConfig is equal to node socket
mayReportToNodeSocket: addr != NodeSocketAddress,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much work would it be to lift this out of the Goldmane reporter and into its own Reporter, and then adding the ability to run multiple reporters + dynamically add / remove them?

It seems to me that we are going to want the ability to have multiple Reporters running at once - it would be cool to be able to use this e.g., even if Goldmane wasn't configured.

Copy link
Member Author

@mazdakn mazdakn Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not much! Collector already supports multiple reporters.
But if we want to be able to run this without goldmane enabled, we should introduce a config option to enabled/disable it. Something like FlowLogsGoldmaneNodeServer with Enabled and Disabled values. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a preferrable route to me, if it's not too much work. A new LocalFlowSocket: Enabled or something would make sense as the config option name probably?

mazdakn and others added 2 commits April 14, 2025 09:47
@mazdakn mazdakn requested a review from caseydavenport April 14, 2025 18:20
@tomastigera
Copy link
Contributor

tomastigera commented Apr 15, 2025

calico-node --flows I though that calico-node used a single dash. All the other options have a single dash.

@mazdakn mazdakn requested a review from caseydavenport April 16, 2025 00:12
@mazdakn
Copy link
Member Author

mazdakn commented Apr 16, 2025

@tomastigera you are right. It's with one dash.

@@ -835,6 +835,10 @@ type FelixConfigurationSpec struct {
// FlowLogGoldmaneServer is the flow server endpoint to which flow data should be published.
FlowLogsGoldmaneServer *string `json:"flowLogsGoldmaneServer,omitempty"`

// FlowLogsLocalSocket configures local unix socket for reporting flow data from each node. [Default: Disabled]
// +kubebuilder:validation:Enum=Disabled;Enabled
FlowLogsLocalSocket *string `json:"flowLogsLocalSocket,omitempty"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should name this FlowLogsLocalReporter or FlowLogsNodeReporter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense to me - I like the Local terminology rather than Node myself.

Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mazdakn I think this looks good, but I'd like to clean up some terminology and structure a bit - namely there are references to both NodeReporter and LocalReporter and I think we should standardize on the latter.

Also, its referred to as goldmane in a few places but I don't think it is tied to Goldmane any more?

log.Info("Adding Flow Logs Aggregator (allowed) for node socket")
fr.AddAggregator(gaa, []string{FlowLogsLocalReporterName})
log.Info("Creating node socket Aggregator for denied")
gad := flowAggregatorForGoldmane(rules.RuleActionDeny, configParams.FlowLogsCollectorDebugTrace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find this a bit confusing - this reporter isn't really for goldmane, rather it's for the node socket, right? Is this just a case of an awkward function name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to defaultFlowAggregator

@mazdakn
Copy link
Member Author

mazdakn commented Apr 17, 2025

CI passed in the last run.

@mazdakn mazdakn merged commit 155dc4e into projectcalico:master Apr 17, 2025
3 of 4 checks passed
@mazdakn mazdakn deleted the tool-flowlog branch April 17, 2025 22:17
mazdakn added a commit to mazdakn/calico that referenced this pull request May 7, 2025
@mazdakn mazdakn mentioned this pull request May 7, 2025
3 tasks
@mazdakn mazdakn added cherry-pick-completed PR has been cherry-picked and removed cherry-pick-candidate labels May 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-completed PR has been cherry-picked docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants