
Excessive memory leak due to uncontrolled accumulation of health.log entries in Podman 5.x #25473

Closed
@gaurangomar


Issue Description

When using healthchecks in Podman 5.x, we’ve observed that the internal health log grows continuously (into the thousands of entries) and never prunes older records. In our tests, the health log (State.Health.Log in the container’s inspect output) eventually contains over 12,000 records and keeps growing over time. This contrasts with Podman 4.x, which typically keeps only ~5 log entries. Furthermore, running top on the host shows unusually high memory usage by the /usr/bin/podman healthcheck process over time. These symptoms suggest a memory leak tied to Podman’s healthcheck mechanism in version 5.x.


Steps to reproduce the issue


  • Healthcheck Configuration:
    Use a healthcheck configuration identical to the one that worked in Podman 4.x. For example:
"Healthcheck": {
    "Test": [
        "CMD",
        "curl",
        "-f",
        "http://agent:8080/health"
    ],
    "Interval": 30000000000,
    "Timeout": 10000000000,
    "Retries": 5
}
  • Run the Container:
    Start a container with this configuration on Podman 5.x.

  • Monitor Health Log:
    After the container runs for a while, run podman inspect and check the State.Health.Log field. In Podman 5.x, it continuously accumulates records (e.g., over 12,000 entries) rather than being capped (as observed in Podman 4.x, which only shows about 5 entries).

  • Observe Memory Usage:
    Use monitoring tools (e.g., top) to observe the memory usage. There is a significant and continuous increase in memory consumption, particularly in kernel memory (kmalloc-2k and kmalloc-4k slabs).

The high memory usage by the healthcheck process shows up intermittently in top. We are running 8 containers.
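The "Monitor Health Log" step can be automated with a short script. This is a minimal sketch, assuming that `podman inspect` prints a JSON array with one object per container and that the log lives at State.Health.Log, as described above. The function names are hypothetical helpers, not part of Podman.

```python
import json
import subprocess


def count_health_log_entries(inspect_json: str) -> int:
    """Return the number of State.Health.Log entries in `podman inspect` output."""
    data = json.loads(inspect_json)
    # Assumption: the output is a JSON array with one object per container.
    container = data[0]
    health = container.get("State", {}).get("Health") or {}
    return len(health.get("Log") or [])


def inspect_container(name: str) -> str:
    """Run `podman inspect <name>` and return its raw JSON output."""
    result = subprocess.run(
        ["podman", "inspect", name],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `count_health_log_entries(inspect_container("agent"))` at intervals (where "agent" is a placeholder container name) would show whether the count stays near 5 or keeps climbing.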

Describe the results you received

When using healthchecks in Podman 5.x, we’ve observed that the internal health log continuously grows instead of being capped at a few entries (as seen in Podman 4.x). In our tests, the health.log field in the container’s inspect output eventually contains over 12,000 records compared to the expected ~5 entries in version 4.x. This uncontrolled log growth correlates with a continuous increase in memory usage.
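As a rough sanity check on the 12,000 figure, the arithmetic below estimates how long it takes to accumulate that many entries. It assumes exactly one log entry per healthcheck run and no pruning at all. The Interval value is in nanoseconds, so 30000000000 is 30 seconds.

```python
INTERVAL_NS = 30_000_000_000  # "Interval" from the healthcheck config, in nanoseconds
OBSERVED_ENTRIES = 12_000     # approximate entry count reported in this issue


def hours_to_accumulate(entries: int, interval_ns: int) -> float:
    """Hours needed to record `entries` runs at one run per interval."""
    seconds = entries * interval_ns / 1_000_000_000
    return seconds / 3600


print(hours_to_accumulate(OBSERVED_ENTRIES, INTERVAL_NS))  # prints 100.0
```

Under those assumptions, 12,000 entries correspond to about 100 hours of runtime. That fits within the host uptime of roughly 312 hours shown in the podman info output, although the host uptime is not necessarily the container's runtime.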

Describe the results you expected

Memory usage should not increase over time, and the health log should be capped at a limited number of entries.
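The expected behavior can be illustrated with a bounded buffer. This is only a sketch of the idea, not Podman's implementation: the cap of 5 mirrors what we saw on Podman 4.x, and `record_result` and the entry fields are hypothetical.

```python
from collections import deque

MAX_LOG_ENTRIES = 5  # assumption: roughly what Podman 4.x appeared to keep


def record_result(log: deque, exit_code: int, output: str) -> None:
    """Append one healthcheck result; a bounded deque drops the oldest entry."""
    log.append({"ExitCode": exit_code, "Output": output})


health_log = deque(maxlen=MAX_LOG_ENTRIES)
for run in range(12_000):
    record_result(health_log, 0, f"run {run}")

print(len(health_log))  # prints 5
```

With a cap like this, the log size stays constant no matter how long the container runs.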

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.5
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: b3f4044f63d830049366c05304a1d5d558571e85'
  cpuUtilization:
    idlePercent: 76.81
    systemPercent: 6.73
    userPercent: 16.46
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: ol
    variant: server
    version: "9.5"
  eventLogger: file
  freeLocks: 2026
  hostname: k-jambunatha-tf64-ecp-edge-multi-int-openstack-perf-1771036--ed
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 2001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.15.0-304.171.4.1.el9uek.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 809750528
  memTotal: 3803951104
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-1.el9_5.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.16.1-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/user/2002/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240806.gee36266-2.el9.x86_64
    version: |
      pasta 0^20240806.gee36266-2.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/2002/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 2469085184
  swapTotal: 4194299904
  uptime: 312h 40m 36.00s (Approximately 13.00 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - container-registry.oracle.com
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 10
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 40961572864
  graphRootUsed: 2026479616
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 10
  runRoot: /run/user/2002/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.2
  Built: 1735903242
  BuiltTime: Fri Jan  3 06:20:42 2025
  GitCommit: ""
  GoVersion: go1.22.9 (Red Hat 1.22.9-2.el9_5)
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.2

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

No

Additional environment details

podman --version

podman version 5.2.2


Metadata

Labels

jira, kind/bug, locked - please file new issue/PR
