Kubernetes v1.36 Enhances Job Management with Mutable Resource Requests

Apr 27, 2026

Kubernetes v1.36 takes a significant step in Job management by promoting the ability to modify resource requests and limits on suspended Jobs to beta. First introduced as an alpha feature in v1.35, it lets cluster administrators and queue controllers adjust the CPU, memory, GPU, and other extended resource requests and limits of a Job while it is suspended, before the Job starts or resumes execution.

Why Mutability in Pod Resources Matters

Workloads such as batch processing and machine learning often have fluctuating resource requirements that can't be accurately predicted at the time of Job creation. Optimal resource allocation can depend on factors like current cluster capacity, queue priorities, and the availability of specific hardware like GPUs.

Before this enhancement, resource requirements in a Job's pod template were immutable once set. If a queue controller such as Kueue decided that a suspended Job needed different resources, its only option was to delete and recreate the Job, losing its metadata, status, and history. With mutable resources, a Job created by a CronJob can instead run with reduced resources, and make slower progress, when the cluster is under heavy load rather than failing outright.

For example, consider a machine learning training Job that initially requests 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

If the controller determines that only 2 GPUs are available, it can now lower the Job's resource requests and limits before resuming it:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never

After updating the resources, resume the Job by setting spec.suspend to false. The Job controller then creates new Pods with the revised resource specifications.
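The controller-side flow can be sketched in Python, assuming a queue controller that handles the Job as a plain dictionary (its JSON form) and simply scales every request and limit in proportion to the GPUs it can grant. The function names and the proportional policy are hypothetical illustrations, not part of Kubernetes or Kueue; a real controller would read and write the Job through the API server, and the quantity handling here is deliberately simplified.

```python
import copy
import re
from fractions import Fraction

# Hypothetical extended-resource name, copied from the example manifests above.
GPU = "example-hardware-vendor.com/gpu"

# Simplified quantity format: a number plus an optional unit suffix ("8", "32Gi").
_QTY = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]*)$")


def scale_quantity(qty: str, factor: Fraction) -> str:
    """Scale a quantity by factor, keeping its unit suffix.

    Raises ValueError if the result is not a whole number in the same unit
    (a real controller might convert to a smaller unit instead).
    """
    match = _QTY.match(qty)
    if match is None:
        raise ValueError(f"unsupported quantity: {qty!r}")
    number, suffix = match.groups()
    scaled = Fraction(number) * factor
    if scaled.denominator != 1:
        raise ValueError(f"{qty!r} scaled by {factor} is not a whole number")
    return f"{scaled.numerator}{suffix}"


def shrink_to_available_gpus(job: dict, available_gpus: int) -> dict:
    """Return a copy of a suspended Job whose requests and limits are scaled by
    available_gpus / requested_gpus (a hypothetical queue-controller policy)."""
    if job["spec"].get("suspend") is not True:
        raise ValueError("resources can only be changed while the Job is suspended")
    new_job = copy.deepcopy(job)  # never mutate the caller's object
    for container in new_job["spec"]["template"]["spec"]["containers"]:
        resources = container.get("resources", {})
        requested = int(resources.get("requests", {}).get(GPU, "0"))
        if requested == 0 or available_gpus >= requested:
            continue  # no GPUs requested, or the request already fits
        factor = Fraction(available_gpus, requested)
        for section in ("requests", "limits"):
            if section in resources:
                resources[section] = {
                    name: scale_quantity(value, factor)
                    for name, value in resources[section].items()
                }
    return new_job


job = {
    "spec": {
        "suspend": True,
        "template": {"spec": {"containers": [{
            "name": "trainer",
            "resources": {
                "requests": {"cpu": "8", "memory": "32Gi", GPU: "4"},
                "limits": {"cpu": "8", "memory": "32Gi", GPU: "4"},
            },
        }]}},
    }
}

smaller = shrink_to_available_gpus(job, available_gpus=2)
print(smaller["spec"]["template"]["spec"]["containers"][0]["resources"]["requests"])
# {'cpu': '4', 'memory': '16Gi', 'example-hardware-vendor.com/gpu': '2'}

# Resuming is a separate write once the new resources are stored.
smaller["spec"]["suspend"] = False
```

Scaling every resource by the same factor mirrors the 4-to-2 GPU example above; other policies, such as reducing only the GPU count, are equally possible.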

Mechanics of the Update

The API server relaxes the immutability rules for pod template resource fields, but only for suspended Jobs. No new API types are introduced: the feature works with the existing Job and pod template structures and changes only how they are validated.

Mutable fields now include:

  • spec.template.spec.containers[*].resources.requests
  • spec.template.spec.containers[*].resources.limits
  • spec.template.spec.initContainers[*].resources.requests
  • spec.template.spec.initContainers[*].resources.limits
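Conceptually, the relaxed rule means that two versions of a suspended Job's pod template may differ only in these fields. The Python sketch below expresses that idea by stripping the four fields and comparing what remains. It assumes pod specs as plain dictionaries, and the helper names are hypothetical; it is not the API server's actual validation code, and it ignores the suspension preconditions described next.

```python
import copy

# The mutable resource fields listed above: (container list, resources section).
MUTABLE_RESOURCE_FIELDS = [
    ("containers", "requests"),
    ("containers", "limits"),
    ("initContainers", "requests"),
    ("initContainers", "limits"),
]


def strip_mutable_resources(pod_spec: dict) -> dict:
    """Return a copy of pod_spec with the mutable resource fields removed."""
    stripped = copy.deepcopy(pod_spec)
    for list_name, section in MUTABLE_RESOURCE_FIELDS:
        for container in stripped.get(list_name, []):
            resources = container.get("resources")
            if resources is None:
                continue
            resources.pop(section, None)
            if not resources:
                # Treat "resources: {}" the same as no resources field at all.
                del container["resources"]
    return stripped


def only_resources_changed(old_pod_spec: dict, new_pod_spec: dict) -> bool:
    """True if the pod specs differ at most in the mutable resource fields."""
    return strip_mutable_resources(old_pod_spec) == strip_mutable_resources(new_pod_spec)


old = {"containers": [{"name": "trainer", "image": "training:v1",
                       "resources": {"requests": {"cpu": "8"}, "limits": {"cpu": "8"}}}]}

resized = copy.deepcopy(old)
resized["containers"][0]["resources"] = {"requests": {"cpu": "4"}, "limits": {"cpu": "4"}}

new_image = copy.deepcopy(old)
new_image["containers"][0]["image"] = "training:v2"

print(only_resources_changed(old, resized))    # True
print(only_resources_changed(old, new_image))  # False
```

Other parts of resources, such as DRA claims, are left in place by this sketch and therefore still count as changes.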

Resource updates are allowed only when both of the following conditions hold:

  1. The Job has spec.suspend set to true.
  2. If the Job was running before it was suspended, all of its Pods have terminated (that is, status.active equals 0).
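A minimal Python sketch of these two preconditions, assuming the Job is available as a dictionary in its JSON form. resource_update_allowed is a hypothetical helper, not a Kubernetes API.

```python
def resource_update_allowed(job: dict) -> bool:
    """Check the two preconditions for changing a Job's pod template resources.

    1. spec.suspend must be true.
    2. status.active must be 0. The field may be omitted when it is 0,
       so a missing value is treated as 0.
    """
    suspended = job.get("spec", {}).get("suspend") is True
    active = job.get("status", {}).get("active", 0)
    return suspended and active == 0


examples = {
    "never started": {"spec": {"suspend": True}},
    "suspended, Pods still terminating": {"spec": {"suspend": True}, "status": {"active": 3}},
    "suspended, Pods gone": {"spec": {"suspend": True}, "status": {"active": 0}},
    "running": {"spec": {"suspend": False}, "status": {"active": 3}},
}
for label, job in examples.items():
    print(f"{label}: {resource_update_allowed(job)}")
# never started: True
# suspended, Pods still terminating: False
# suspended, Pods gone: True
# running: False
```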

Standard resource validation still applies: for example, limits must not be lower than requests, and extended resources must still be specified as whole numbers.
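The two validation rules mentioned here can be illustrated with a simplified Python check. It assumes a reduced quantity syntax (decimal numbers with the common suffixes m, k, M, G, T, Ki, Mi, Gi, Ti, but no exponent notation) and a rough heuristic that recognizes extended resources by their domain prefix. The helper names are hypothetical, and the real API server applies further rules that this sketch does not model.

```python
import re
from fractions import Fraction

# Simplified subset of Kubernetes quantity suffixes (no exponent notation).
_SUFFIXES = {
    "": Fraction(1),
    "m": Fraction(1, 1000),
    "k": Fraction(10**3), "M": Fraction(10**6), "G": Fraction(10**9), "T": Fraction(10**12),
    "Ki": Fraction(2**10), "Mi": Fraction(2**20), "Gi": Fraction(2**30), "Ti": Fraction(2**40),
}
_QTY = re.compile(r"^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|m|k|M|G|T)?$")


def parse_quantity(qty: str) -> Fraction:
    """Convert a quantity such as "500m" or "16Gi" to an exact number."""
    match = _QTY.match(qty)
    if match is None:
        raise ValueError(f"unsupported quantity: {qty!r}")
    return Fraction(match.group(1)) * _SUFFIXES[match.group(2) or ""]


def is_extended(name: str) -> bool:
    """Rough heuristic: domain-prefixed names outside kubernetes.io are extended resources."""
    return "/" in name and not name.split("/", 1)[0].endswith("kubernetes.io")


def validate_resources(resources: dict) -> list:
    """Return human-readable errors for the two rules named above."""
    errors = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    # Rule 1: a limit must not be lower than the matching request.
    for name, limit in limits.items():
        if name in requests and parse_quantity(limit) < parse_quantity(requests[name]):
            errors.append(f"{name}: limit {limit} is less than request {requests[name]}")
    # Rule 2: extended resources must be whole numbers.
    for section, values in (("requests", requests), ("limits", limits)):
        for name, value in values.items():
            if is_extended(name) and parse_quantity(value).denominator != 1:
                errors.append(f"{name}: {section} value {value} must be a whole number")
    return errors


ok = {"requests": {"cpu": "500m", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"},
      "limits": {"cpu": "1", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"}}
bad = {"requests": {"cpu": "4", "memory": "16Gi"},
       "limits": {"cpu": "2", "memory": "16384Mi", "example-hardware-vendor.com/gpu": "1.5"}}
print(validate_resources(ok))   # []
print(validate_resources(bad))
# ['cpu: limit 2 is less than request 4', 'example-hardware-vendor.com/gpu: limits value 1.5 must be a whole number']
```

Parsing to exact fractions lets quantities in different units compare correctly, so 16384Mi and 16Gi are treated as equal.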

What Changes with Beta

With the MutablePodResourcesForSuspendedJobs feature gate promoted to beta in Kubernetes v1.36, the feature is enabled by default, so clusters running this version can use it without any additional API server configuration.

Testing the New Feature

On Kubernetes v1.36 or later, the feature is available out of the box. On v1.35, you need to enable the MutablePodResourcesForSuspendedJobs feature gate on the kube-apiserver, for example with --feature-gates=MutablePodResourcesForSuspendedJobs=true.

To experiment, create a suspended Job, adjust its container resources with kubectl edit or a controller, and then resume the Job:

# Create a suspended Job
kubectl apply -f my-job.yaml --server-side

# Edit the resource requests
kubectl edit job training-job-example-abcd123

# Resume the Job
kubectl patch job training-job-example-abcd123 -p '{"spec":{"suspend":false}}'
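As a non-interactive alternative to kubectl edit, a script can generate a JSON Patch and pass it to kubectl patch --type=json. The Python sketch below assumes the container to change is the first one in the pod template (index 0) and uses a hypothetical helper, resource_patch. Note that the / in an extended resource name such as example-hardware-vendor.com/gpu must be escaped as ~1 inside a JSON Pointer path.

```python
import json


def escape_pointer_token(token: str) -> str:
    """Escape one JSON Pointer token (RFC 6901): '~' becomes '~0', '/' becomes '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def resource_patch(container_index: int, section: str, values: dict) -> list:
    """Build JSON Patch (RFC 6902) operations that set one container's
    requests or limits. Hypothetical helper for illustration only."""
    base = f"/spec/template/spec/containers/{container_index}/resources/{section}"
    # "add" on an existing object member replaces its value, so these operations
    # work whether or not the key is already present (the parent map must exist).
    return [
        {"op": "add", "path": f"{base}/{escape_pointer_token(name)}", "value": value}
        for name, value in values.items()
    ]


new_values = {"cpu": "4", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"}
patch = resource_patch(0, "requests", new_values) + resource_patch(0, "limits", new_values)
print(f"kubectl patch job training-job-example-abcd123 --type=json -p '{json.dumps(patch)}'")
```

Escaping ~ before / matters: doing it in the other order would turn the ~1 produced for a slash into ~01.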

Things to Keep in Mind

Suspension of Running Jobs

If you suspend a Job that is already running, wait until all of its active Pods have terminated before changing any resource settings. The API server rejects resource changes while status.active is greater than zero, which prevents the running Pods from diverging from the updated pod template.
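A controller or script that suspends a running Job can wait for status.active to reach zero before sending its resource update. In the Python sketch below, get_job_status is a hypothetical callable (for example, a wrapper around a Kubernetes client read of the Job) that returns the Job's status as a dictionary. The sleep and clock parameters are injectable so the loop can be exercised without a cluster.

```python
import time
from typing import Callable


def wait_for_no_active_pods(
    get_job_status: Callable[[], dict],
    timeout_seconds: float = 300.0,
    poll_interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a suspended Job's status until status.active is 0.

    Returns True once no Pods are active, or False if the timeout expires first.
    A missing "active" field is treated as 0.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_job_status().get("active", 0) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_seconds)


# Simulated statuses: two polls with terminating Pods, then none left.
statuses = iter([{"active": 2}, {"active": 1}, {}])
sleeps = []
done = wait_for_no_active_pods(lambda: next(statuses), sleep=sleeps.append)
print(done, len(sleeps))  # True 2
```

Treating a missing active field as zero matters here, because the field may be omitted from the Job status when no Pods are active.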

Setting Up Pod Replacement Policies

For Jobs that may see failed Pods, consider setting podReplacementPolicy: Failed. This ensures that replacement Pods are created only after the previous Pods have fully terminated, reducing the risk of resource contention from overlapping Pods.
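The field belongs at spec.podReplacementPolicy of the Job. The Python sketch below builds a suspended Job manifest with it set; the helper name and placeholder values are hypothetical, and the JSON output can be saved to a file and passed to kubectl apply -f, which accepts JSON as well as YAML.

```python
import json


def suspended_job_manifest(name: str, image: str, resources: dict) -> dict:
    """Build a suspended Job that only creates replacement Pods after the
    previous Pods have fully terminated (podReplacementPolicy: Failed).
    name, image, and resources are placeholders for illustration."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "suspend": True,
            "podReplacementPolicy": "Failed",
            "template": {
                "spec": {
                    "containers": [{"name": "trainer", "image": image, "resources": resources}],
                    "restartPolicy": "Never",
                }
            },
        },
    }


manifest = suspended_job_manifest(
    "training-job-example-abcd123",
    "example-registry.example.com/training:2026-04-23T150405.678",
    {"requests": {"cpu": "8"}, "limits": {"cpu": "8"}},
)
# Print the manifest as JSON, e.g. to save it as my-job.json for kubectl apply -f.
print(json.dumps(manifest, indent=2))
```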

Handling Resource Claims

Note that dynamic resource allocation (DRA) resource claim templates (resourceClaimTemplates) remain immutable. Workloads that use DRA must recreate these claim templates to match any updated resource requirements.

Getting Involved

This feature was developed with input from SIG Apps and WG Batch. Both groups welcome feedback as it moves toward general availability.

Community members can reach out via:

Source: James Davis · kubernetes.io
