Export pod ephemeral PVCs metrics #2490

@TPXP

What would you like to be added: kube-state-metrics exposes PVC usage by pods through metrics like kube_pod_spec_volumes_persistentvolumeclaims_info and kube_pod_spec_volumes_persistentvolumeclaims_readonly. I'd like similar metrics to be available for Ephemeral Volume mounts, since those are also backed by PVCs.

Why is this needed: We use Prometheus metrics to detect PVCs that are not mounted by any pod, which reminds us to drop a PVC if it was left behind for some reason. Our alerting rule lists the PVCs in a namespace with kube_persistentvolumeclaim_info and excludes mounted ones with kube_pod_spec_volumes_persistentvolumeclaims_info. Ephemeral volumes generate a PVC which appears in kube_persistentvolumeclaim_info but not in kube_pod_spec_volumes_persistentvolumeclaims_info, since the volume does not have PersistentVolumeClaim.ClaimName defined. Adding a metric exposing ephemeral PVCs would give us a way to avoid false alarms when a pod is using an ephemeral PVC.
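For illustration, here is a minimal sketch of the kind of alerting rule described above, not our exact rule. The group name, alert name, namespace, `for` duration and severity are placeholders. It also assumes that kube_pod_spec_volumes_persistentvolumeclaims_info carries the claim name in a `persistent_volume_claim` label while kube_persistentvolumeclaim_info uses `persistentvolumeclaim`, hence the label_replace:

```yaml
groups:
- name: pvc-hygiene                        # placeholder group name
  rules:
  - alert: PersistentVolumeClaimNotMounted # placeholder alert name
    expr: |
      # All PVCs in the namespace...
      kube_persistentvolumeclaim_info{namespace="my-namespace"}
      # ...minus those referenced by a pod volume through claimName.
      unless on (namespace, persistentvolumeclaim)
      label_replace(
        kube_pod_spec_volumes_persistentvolumeclaims_info,
        "persistentvolumeclaim", "$1", "persistent_volume_claim", "(.+)"
      )
    for: 1h                                # placeholder duration
    labels:
      severity: warning
    annotations:
      summary: 'PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is not mounted by any pod'
```

With a rule of this shape, a PVC generated for an ephemeral volume is never subtracted by the `unless` clause, so it fires even though a pod is using the claim.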

Describe the solution you'd like: Exposing another metric, kube_pod_spec_volumes_ephemeral_persistentvolumeclaims_info, seems acceptable. Updating kube_pod_spec_volumes_persistentvolumeclaims_info to add an ephemeral label would work as well.
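To make the first option concrete, this is roughly how a rule could consume the new metric. Everything about kube_pod_spec_volumes_ephemeral_persistentvolumeclaims_info here is hypothetical: the metric does not exist today, and I'm assuming it would carry the derived claim name in a `persistent_volume_claim` label like its non-ephemeral counterpart. The alert name and namespace are placeholders:

```yaml
- alert: PersistentVolumeClaimNotMounted   # placeholder alert name
  expr: |
    kube_persistentvolumeclaim_info{namespace="my-namespace"}
    unless on (namespace, persistentvolumeclaim)
    label_replace(
      kube_pod_spec_volumes_persistentvolumeclaims_info,
      "persistentvolumeclaim", "$1", "persistent_volume_claim", "(.+)"
    )
    # Hypothetical metric proposed in this issue; label names are assumed.
    unless on (namespace, persistentvolumeclaim)
    label_replace(
      kube_pod_spec_volumes_ephemeral_persistentvolumeclaims_info,
      "persistentvolumeclaim", "$1", "persistent_volume_claim", "(.+)"
    )
```

With the second option (an ephemeral label on the existing metric), no rule change would be needed, provided the existing claim-name label is populated with the derived PVC name.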

Implementation note: while the PodSpec does not have a field explicitly giving the PVC name, the docs clarify how it's derived from the pod and volume name:

Naming of the automatically created PVCs is deterministic: the name is a combination of the Pod name and volume name, with a hyphen (-) in the middle.

Alternatively, exposing PVC ownership data (ownerReferences metadata) would also address my use case, although I think it would be hard to integrate into my alerting rule.

Additional context
We sometimes run temporary workloads that need to store large amounts of data. Since we don't need the data to persist across pod executions, we use Ephemeral Volumes to ensure the PVC is removed when we drop the pod.

Here's a pod manifest example (we use these pods to perform operations on our databases by exec-ing into them, which avoids tunneling and guards against connection drops):

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: tmp-workload
  name: tmp-workload
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - args:
    - bash
    - -c
    - sleep infinity
    image: postgres
    name: tmp-workload
    volumeMounts:
    - name: workdir
      mountPath: /workdir
    resources:
      limits:
        memory: 1Gi
        cpu: "1"
  volumes:
  - name: workdir
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Ti
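
For reference, the PVC that Kubernetes generates for this pod should look roughly like the sketch below. The name follows the `<pod name>-<volume name>` rule quoted above, and the claim is owned by the pod, which is why it is removed together with it. The namespace and uid are placeholders, and fields filled in by the cluster (storage class, status, and so on) are omitted:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tmp-workload-workdir    # <pod name>-<volume name>
  namespace: default            # placeholder: same namespace as the pod
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: tmp-workload
    uid: 00000000-0000-0000-0000-000000000000   # placeholder: the pod's UID
    controller: true
    blockOwnerDeletion: true
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Ti
```

This claim shows up in kube_persistentvolumeclaim_info as tmp-workload-workdir, but nothing in the pod's spec.volumes names it explicitly.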

Metadata

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Projects

Status

Backlog (stale)
