feat: Add Windows support to retina-shell #1617

Closed (5 commits)
53 changes: 53 additions & 0 deletions .github/workflows/images.yaml
@@ -213,6 +213,58 @@ jobs:
fi
env:
IS_MERGE_GROUP: ${{ github.event_name == 'merge_group' }}

retina-shell-win-images:
name: Build Retina Shell Windows Images
runs-on: ubuntu-latest

strategy:
matrix:
platform: ["windows"]
arch: ["amd64"]
year: ["2019", "2022"]

steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

- uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0
with:
go-version-file: go.mod
- run: go version

- name: Set up QEMU
uses: docker/setup-qemu-action@v3

- name: Az CLI login
uses: azure/login@v2
if: ${{ github.event_name == 'merge_group' }}
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION }}

- name: Build Images
shell: bash
run: |
set -euo pipefail
echo "TAG=$(make version)" >> $GITHUB_ENV
if [ "$IS_MERGE_GROUP" == "true" ]; then
az acr login -n ${{ vars.ACR_NAME }}
make retina-shell-image-win \
IMAGE_NAMESPACE=${{ github.repository }} \
PLATFORM=${{ matrix.platform }}/${{ matrix.arch }} \
IMAGE_REGISTRY=${{ vars.ACR_NAME }} \
WINDOWS_YEARS=${{ matrix.year }} \
BUILDX_ACTION=--push
else
make retina-shell-image-win \
IMAGE_NAMESPACE=${{ github.repository }} \
PLATFORM=${{ matrix.platform }}/${{ matrix.arch }} \
WINDOWS_YEARS=${{ matrix.year }}
fi
env:
IS_MERGE_GROUP: ${{ github.event_name == 'merge_group' }}

kubectl-retina-images:
name: Build Kubectl Retina Images
@@ -273,6 +325,7 @@ jobs:
retina-win-images,
operator-images,
retina-shell-images,
retina-shell-win-images,
kubectl-retina-images,
]

16 changes: 16 additions & 0 deletions Makefile
@@ -296,6 +296,22 @@ retina-shell-image:
TAG=$(RETINA_PLATFORM_TAG) \
CONTEXT_DIR=$(REPO_ROOT)

retina-shell-image-win:
for year in $(WINDOWS_YEARS); do \
tag=$(TAG)-windows-ltsc$$year-amd64; \
echo "Building retina-shell Windows image with tag $$tag"; \
set -e ; \
$(MAKE) container-$(CONTAINER_BUILDER) \
PLATFORM=windows/amd64 \
DOCKERFILE=shell/Dockerfile.windows \
REGISTRY=$(IMAGE_REGISTRY) \
IMAGE=$(RETINA_SHELL_IMAGE) \
OS_VERSION=ltsc$$year \
VERSION=$(TAG) \
TAG=$$tag \
CONTEXT_DIR=$(REPO_ROOT); \
done

kubectl-retina-image:
echo "Building for $(PLATFORM)"
set -e ; \
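
The `retina-shell-image-win` target above builds one image per entry in `WINDOWS_YEARS`, tagged `<TAG>-windows-ltsc<year>-amd64`. The same naming can be sketched in Go to check that it lines up with the CLI's default suffix (`windows-ltsc2022-amd64`). The function name `windowsImageTags` and the version `v1.0.0` are illustrative assumptions, not part of the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// windowsImageTags mirrors the Makefile loop: it returns one tag per entry in
// a space-separated WINDOWS_YEARS value, shaped <tag>-windows-ltsc<year>-amd64.
// This helper is hypothetical and exists only to illustrate the naming scheme.
func windowsImageTags(tag, windowsYears string) []string {
	years := strings.Fields(windowsYears)
	tags := make([]string, 0, len(years))
	for _, year := range years {
		tags = append(tags, fmt.Sprintf("%s-windows-ltsc%s-amd64", tag, year))
	}
	return tags
}

func main() {
	// "v1.0.0" is a placeholder version, not a real release.
	for _, t := range windowsImageTags("v1.0.0", "2019 2022") {
		fmt.Println(t)
	}
}
```

In the workflow each matrix job passes a single year, so this sketch would yield exactly one tag per job.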
4 changes: 2 additions & 2 deletions cli/cmd/root.go
@@ -24,9 +24,9 @@ type Config struct {
}

var Retina = &cobra.Command{
Use: "kubectl-retina",
Short: "A kubectl plugin for Retina",
Long: "A kubectl plugin for Retina\nRetina is an eBPF distributed networking observability tool for Kubernetes.",
PersistentPreRun: func(*cobra.Command, []string) {
var config Config
file, _ := os.ReadFile(ClientConfigPath)
52 changes: 51 additions & 1 deletion cli/cmd/shell.go
@@ -1,6 +1,7 @@
package cmd

import (
"context"
"errors"
"fmt"
"os"
@@ -12,6 +13,7 @@ import (
v1 "k8s.io/api/core/v1"
"k8s.io/cli-runtime/pkg/genericclioptions"
"k8s.io/cli-runtime/pkg/resource"
"k8s.io/client-go/kubernetes"
cmdutil "k8s.io/kubectl/pkg/cmd/util"
"k8s.io/kubectl/pkg/scheme"
"k8s.io/kubectl/pkg/util/templates"
@@ -22,6 +24,7 @@ var (
matchVersionFlags *cmdutil.MatchVersionFlags
retinaShellImageRepo string
retinaShellImageVersion string
windowsImageTag string
mountHostFilesystem bool
allowHostFilesystemWrite bool
hostPID bool
@@ -38,6 +41,9 @@ var (

defaultTimeout = 30 * time.Second

// Default Windows image tag suffix
defaultWindowsImageTag = "windows-ltsc2022-amd64"

errMissingRequiredRetinaShellImageVersionArg = errors.New("missing required --retina-shell-image-version")
errUnsupportedResourceType = errors.New("unsupported resource type")
)
@@ -57,6 +63,10 @@ var shellCmd = &cobra.Command{
CLI flags (--retina-shell-image-repo and --retina-shell-image-version) or
environment variables (RETINA_SHELL_IMAGE_REPO and RETINA_SHELL_IMAGE_VERSION).
CLI flags take precedence over env vars.

For Windows nodes, the shell image will automatically use the Windows variant with the
specified Windows image tag suffix (--windows-image-tag). Windows support requires a
Windows node with HostProcess containers enabled.
`),

Example: templates.Examples(`
@@ -75,6 +85,9 @@ var shellCmd = &cobra.Command{
# start a shell in a node, with NET_RAW and NET_ADMIN capabilities
# (required for iptables and tcpdump)
kubectl retina shell node001 --capabilities NET_RAW,NET_ADMIN

# start a shell in a Windows node
kubectl retina shell win-node001
`),
Args: cobra.ExactArgs(1),
RunE: func(_ *cobra.Command, args []string) error {
@@ -106,6 +119,13 @@ var shellCmd = &cobra.Command{
return fmt.Errorf("error constructing REST config: %w", err)
}

// Create Kubernetes clientset to determine node OS
clientset, err := kubernetes.NewForConfig(restConfig)
if err != nil {
return fmt.Errorf("error creating clientset: %w", err)
}

// Start with the default image; it is replaced below if the target node runs Windows
config := shell.Config{
RestConfig: restConfig,
RetinaShellImage: fmt.Sprintf("%s:%s", retinaShellImageRepo, retinaShellImageVersion),
@@ -123,10 +143,34 @@ var shellCmd = &cobra.Command{

switch obj := info.Object.(type) {
case *v1.Node:
nodeName := obj.Name
podDebugNamespace := namespace

// Get the OS and update the image if it's Windows
nodeOS := obj.Labels["kubernetes.io/os"]
if nodeOS == "windows" {
// For Windows, use the Windows-specific image tag
windowsImageVersion := fmt.Sprintf("%s-%s", retinaShellImageVersion, windowsImageTag)
config.RetinaShellImage = fmt.Sprintf("%s:%s", retinaShellImageRepo, windowsImageVersion)
fmt.Printf("Using Windows shell image: %s\n", config.RetinaShellImage)
}

return shell.RunInNode(config, nodeName, podDebugNamespace)
case *v1.Pod:
// For pods, we need to get the node OS based on the pod's node
ctx := context.Background()
nodeOS, err := shell.GetNodeOS(ctx, clientset, obj.Spec.NodeName)
if err != nil {
return fmt.Errorf("error getting node OS: %w", err)
}

if nodeOS == "windows" {
// For Windows, use the Windows-specific image tag
windowsImageVersion := fmt.Sprintf("%s-%s", retinaShellImageVersion, windowsImageTag)
config.RetinaShellImage = fmt.Sprintf("%s:%s", retinaShellImageRepo, windowsImageVersion)
fmt.Printf("Using Windows shell image: %s\n", config.RetinaShellImage)
}

return shell.RunInPod(config, obj.Namespace, obj.Name)
default:
gvk := obj.GetObjectKind().GroupVersionKind()
@@ -154,9 +198,15 @@ func init() {
retinaShellImageVersion = envVersion
}
}
if !cmd.Flags().Changed("windows-image-tag") {
if envWindowsTag := os.Getenv("RETINA_SHELL_WINDOWS_IMAGE_TAG"); envWindowsTag != "" {
windowsImageTag = envWindowsTag
}
}
}
shellCmd.Flags().StringVar(&retinaShellImageRepo, "retina-shell-image-repo", defaultRetinaShellImageRepo, "The container registry repository for the image to use for the shell container")
shellCmd.Flags().StringVar(&retinaShellImageVersion, "retina-shell-image-version", defaultRetinaShellImageVersion, "The version (tag) of the image to use for the shell container")
shellCmd.Flags().StringVar(&windowsImageTag, "windows-image-tag", defaultWindowsImageTag, "The tag suffix to use for Windows shell images (e.g., 'windows-ltsc2022-amd64')")
shellCmd.Flags().BoolVarP(&mountHostFilesystem, "mount-host-filesystem", "m", false, "Mount the host filesystem to /host. Applies only to nodes, not pods.")
shellCmd.Flags().BoolVarP(&allowHostFilesystemWrite, "allow-host-filesystem-write", "w", false,
"Allow write access to the host filesystem. Implies --mount-host-filesystem. Applies only to nodes, not pods.")
@@ -11,7 +11,6 @@ import (
func TestOverwriteDashboards(t *testing.T) {
// get all json's in various generation deploy folders
files, err := filepath.Glob("../../../grafana-dashboards/*.json")

if err != nil {
t.Fatal(err)
}
@@ -12,7 +12,6 @@ import (
func TestDashboardsAreSimplified(t *testing.T) {
// get all json's in this folder
files, err := filepath.Glob("../../../grafana-dashboards/*.json")

if err != nil {
t.Fatal(err)
}
49 changes: 48 additions & 1 deletion docs/06-Troubleshooting/shell.md
@@ -14,6 +14,9 @@

# To start a shell inside a pod (pod network namespace):
kubectl retina shell -n kube-system pods/coredns-d459997b4-7cpzx

# To start a shell in a Windows node:
kubectl retina shell win-node-001
```

Check connectivity using `ping`:
@@ -146,6 +149,43 @@

**If `systemctl` shows an error "Failed to connect to bus: No data available", check that the `retina shell` command has `--host-pid` set and that you have chroot'd to /host.**

## Windows Support

Retina shell supports Windows nodes by automatically detecting the node OS and using a Windows container image with appropriate networking tools.
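
The detection described above amounts to reading the node's `kubernetes.io/os` label and, for Windows, appending the Windows tag suffix to the image version. A minimal sketch of that decision follows. The helper name `resolveShellImage` is hypothetical (the CLI inlines this logic in both its node and pod branches), and the repository and version are placeholder values.

```go
package main

import "fmt"

// osLabel is the well-known Kubernetes node label that records the node OS.
const osLabel = "kubernetes.io/os"

// resolveShellImage is a hypothetical helper, not part of the PR. It returns
// "<repo>:<version>-<windowsTag>" for Windows nodes and "<repo>:<version>"
// for everything else, including nodes that carry no OS label.
func resolveShellImage(repo, version, windowsTag string, nodeLabels map[string]string) string {
	if nodeLabels[osLabel] == "windows" {
		return fmt.Sprintf("%s:%s-%s", repo, version, windowsTag)
	}
	return fmt.Sprintf("%s:%s", repo, version)
}

func main() {
	// Placeholder repository and version, for illustration only.
	repo, version := "example.azurecr.io/retina/retina-shell", "v0.0.1"
	suffix := "windows-ltsc2022-amd64"

	fmt.Println(resolveShellImage(repo, version, suffix, map[string]string{osLabel: "windows"}))
	fmt.Println(resolveShellImage(repo, version, suffix, map[string]string{osLabel: "linux"}))
}
```

Keeping the decision in one function would let the node and pod code paths share it instead of repeating the same formatting.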

### Windows Tools and Commands

When using a Windows node, you'll have access to these networking tools:

* `ipconfig`: Show network configuration
* `netstat`: Show network connections
* `ping`: Test connectivity
* `tracert`: Trace route to destination
* `nslookup`: DNS lookup
* `route`: Show/manipulate routing table
* `netsh`: Network shell for configuration
* `nmap`: Network discovery and security auditing
* `portqry`: Port scanner
* `windump`: Packet analyzer (tcpdump for Windows)

### Windows Example

```bash
# Start a shell in a Windows node
kubectl retina shell win-node-001

# You can specify a specific Windows image tag variant
kubectl retina shell win-node-001 --windows-image-tag windows-ltsc2019-amd64
```

### Windows Host Filesystem Access

For Windows nodes, the host filesystem is mounted at `C:\host` when using `--mount-host-filesystem`:

```bash
kubectl retina shell win-node-001 --mount-host-filesystem
```
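
The mount point therefore depends on the node OS: `/host` on Linux (per the `--mount-host-filesystem` flag help) and `C:\host` on Windows. A small sketch of that mapping, assuming a hypothetical helper named `hostMountPath` that is not taken from the PR:

```go
package main

import "fmt"

// hostMountPath is a hypothetical helper that returns where the host
// filesystem is expected to appear inside the shell container.
func hostMountPath(nodeOS string) string {
	if nodeOS == "windows" {
		return `C:\host`
	}
	return "/host"
}

func main() {
	for _, nodeOS := range []string{"linux", "windows"} {
		fmt.Printf("%s -> %s\n", nodeOS, hostMountPath(nodeOS))
	}
}
```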

## Troubleshooting

### Timeouts
@@ -176,10 +216,17 @@
kubectl retina shell node0001 # this will use the image "example.azurecr.io/retina/retina-shell:v0.0.1"
```

For Windows images, you can also override the Windows image tag suffix:

```bash
export RETINA_SHELL_WINDOWS_IMAGE_TAG="windows-ltsc2019-amd64"
kubectl retina shell win-node-001 # will use the Windows image with the specified tag suffix
```
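
The override order is: an explicit `--windows-image-tag` flag first, then `RETINA_SHELL_WINDOWS_IMAGE_TAG`, then the built-in default. The CLI does this with cobra's `Flags().Changed`. The sketch below shows the same precedence using only the standard `flag` package, so it approximates the behavior and is not the CLI's code. The function name `resolveWindowsImageTag` is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

const defaultWindowsImageTag = "windows-ltsc2022-amd64"

// resolveWindowsImageTag applies the precedence flag > environment > default.
// getenv is injected so the lookup can be exercised without touching the
// real process environment.
func resolveWindowsImageTag(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("shell", flag.ContinueOnError)
	tag := fs.String("windows-image-tag", defaultWindowsImageTag, "tag suffix for Windows shell images")
	if err := fs.Parse(args); err != nil {
		return "", err
	}

	// Visit only calls back for flags that were explicitly set by the caller.
	explicitlySet := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "windows-image-tag" {
			explicitlySet = true
		}
	})

	if !explicitlySet {
		if env := getenv("RETINA_SHELL_WINDOWS_IMAGE_TAG"); env != "" {
			return env, nil
		}
	}
	return *tag, nil
}

func main() {
	tag, err := resolveWindowsImageTag(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(tag)
}
```

Passing the environment lookup as a parameter is a design choice made for this sketch. The PR reads `os.Getenv` directly.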

## Limitations

* `bpftool` and `bpftrace` are not supported.
* The shell image links `iptables` commands to `iptables-nft`, even if the node itself links to `iptables-legacy`.
* `nsenter` is not supported.
* `ip netns` will not work without `chroot` to the host filesystem.
* On Windows nodes, Linux-only tools such as `iptables` and `nft` are not available.
3 changes: 1 addition & 2 deletions operator/cilium-crds/k8s/fakeresource.go
@@ -8,8 +8,7 @@ import (
"github.com/cilium/cilium/pkg/k8s/resource"
)

type fakeresource[T k8sRuntime.Object] struct{}

func (f *fakeresource[T]) Events(ctx context.Context, opts ...resource.EventsOpt) <-chan resource.Event[T] {
return make(<-chan resource.Event[T])
4 changes: 1 addition & 3 deletions pkg/k8s/watcher_linux.go
@@ -23,9 +23,7 @@ func init() {
}
}

var logger = logging.DefaultLogger.WithField(logfields.LogSubsys, "k8s-watcher")

func Start(ctx context.Context, k *watchers.K8sWatcher) {
logger.Info("Starting Kubernetes watcher")
1 change: 0 additions & 1 deletion pkg/plugin/hnsstats/vfp_counters_windows.go
@@ -152,7 +152,6 @@ func getVfpPortCountersRaw(portGUID string) (string, error) {

cmd := exec.Command("cmd", "/c", vfpCmd)
out, err := cmd.Output()

if err != nil {
return "", errors.Wrap(err, "errored while running vfpctrl /get-port-counter")
}
6 changes: 4 additions & 2 deletions pkg/plugin/linuxutil/types_linux.go
@@ -8,8 +8,10 @@ import (
"github.com/microsoft/retina/pkg/log"
)

const (
name = "linuxutil"
defaultLimit = 2000
)

//go:generate go run go.uber.org/mock/[email protected] -source=types_linux.go -destination=linuxutil_mock_generated_linux.go -package=linuxutil
type linuxUtil struct {
1 change: 1 addition & 0 deletions pkg/telemetry/telemetry.go
@@ -193,6 +193,7 @@ func (t *TelemetryClient) heartbeat(ctx context.Context) {
maps.Copy(props, t.profile.GetMemoryUsage())
t.TrackEvent("heartbeat", props)
}

func metricsCardinality(gatherer prometheus.Gatherer) (int, error) {
if gatherer == nil {
return 0, fmt.Errorf("failed to get metrics Gatherer: %w", ErrorNilCombinedGatherer)
17 changes: 17 additions & 0 deletions shell/Dockerfile.windows
@@ -0,0 +1,17 @@
ARG WINDOWS_VERSION=ltsc2022

# Using a smaller Windows Server Core base image
FROM mcr.microsoft.com/windows/servercore:${WINDOWS_VERSION}

SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

# PowerShell is already included in Windows Server Core
# No additional tools needed - using only built-in Windows commands

# Create a convenience script to display available tools
# PowerShell uses `n (not \n) for a newline inside a double-quoted string
RUN Set-Content -Path "C:\show-tools.cmd" -Value "@echo off`necho.`necho Available Windows Commands:`necho - ipconfig : Show network configuration`necho - netstat : Show network connections`necho - ping : Test connectivity`necho - tracert : Trace route`necho - nslookup : DNS lookup`necho - route : Show/manipulate routing table`necho - netsh : Network shell for configuration`necho."

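# Use a forward slash: a trailing backslash is the default Dockerfile escape
# character and would be parsed as a line continuation
WORKDIR C:/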

# Default entry point
CMD ["cmd.exe"]
8 changes: 5 additions & 3 deletions shell/README.md
@@ -3,12 +3,14 @@
Retina CLI provides a command to launch an interactive shell in a node or pod for ad hoc debugging.

* The CLI command `kubectl retina shell` creates a pod with `HostNetwork=true` (for node debugging) or an ephemeral container in an existing pod (for pod debugging).
* For Linux nodes, the container runs an image built from the Dockerfile in this directory. The image is based on Azure Linux and includes commonly used networking tools.
* For Windows nodes, the container runs an image built from `Dockerfile.windows`. The image is based on Windows Server Core and relies on the built-in Windows networking utilities.

For testing, you can override the image used by `retina shell` either with CLI arguments
(`--retina-shell-image-repo` and `--retina-shell-image-version`) or environment variables
(`RETINA_SHELL_IMAGE_REPO` and `RETINA_SHELL_IMAGE_VERSION`).

For Windows nodes, you can specify the Windows image tag suffix with the `--windows-image-tag` flag or
the `RETINA_SHELL_WINDOWS_IMAGE_TAG` environment variable.

Run `kubectl retina shell -h` for full documentation and examples.