Kubernetes v1.36 Enhances Job Management with Mutable Resource Requests
Kubernetes v1.36 takes a significant step in job management by promoting the ability to modify resource requests and limits in suspended Jobs to beta. Initially introduced as an alpha feature in v1.35, this update allows cluster administrators and queue controllers to customize CPU, memory, GPU, and extended resource specifications on a Job while it's suspended, providing flexibility before the Job commences or resumes execution.
Why Mutability in Pod Resources Matters
Workloads such as batch processing and machine learning often have fluctuating resource requirements that can't be accurately predicted at the time of Job creation. Optimal resource allocation can depend on factors like current cluster capacity, queue priorities, and the availability of specific hardware like GPUs.
Prior to this enhancement, the resource requirements in a Job's pod template were immutable once set. If a queue controller such as Kueue determined that a suspended Job needed different resources, its only recourse was to delete and recreate the Job, losing the Job's metadata, status, and history. The feature also lets a specific Job created by a CronJob make slower progress with reduced resources under heavy cluster load instead of failing outright.
For example, consider a machine learning training Job that initially requests 4 GPUs:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never
If a queue controller determines that only 2 GPUs are available, it can update the Job's resource requests and limits before the Job is resumed:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never
Once the resource updates are made, resuming the Job is as simple as setting spec.suspend to false, triggering the creation of new Pods with the revised specifications.
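A controller or operator could also apply the same change as a JSON Patch rather than resubmitting the whole manifest. The following is a minimal shell sketch, not a prescribed method: the helper names json_pointer_escape, resource_patch_op, and set_gpu_count are hypothetical, and container index 0 assumes that trainer is the first container in the pod template. One detail it illustrates is that the / in an extended resource name has to be written as ~1 inside a JSON Patch path (RFC 6901).

```shell
# Escape a map key for use in a JSON Pointer path (RFC 6901):
# "~" becomes "~0" and "/" becomes "~1". Order matters: "~" first.
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Print one JSON Patch "replace" operation for a container resource.
#   $1: container index in spec.template.spec.containers
#   $2: "requests" or "limits"
#   $3: resource name, for example "cpu" or "example-hardware-vendor.com/gpu"
#   $4: new quantity
resource_patch_op() {
  printf '{"op":"replace","path":"/spec/template/spec/containers/%s/resources/%s/%s","value":"%s"}' \
    "$1" "$2" "$(json_pointer_escape "$3")" "$4"
}

# Hypothetical helper: set the GPU request and limit of container 0
# on a suspended Job. Requires kubectl and access to a cluster.
set_gpu_count() {
  local job=$1 gpus=$2
  local gpu=example-hardware-vendor.com/gpu
  kubectl patch job "$job" --type=json -p \
    "[$(resource_patch_op 0 requests "$gpu" "$gpus"),$(resource_patch_op 0 limits "$gpu" "$gpus")]"
}
```

Calling set_gpu_count training-job-example-abcd123 2 would send both operations in a single request, so the request and the limit change together. CPU and memory could be adjusted the same way by adding further operations to the list.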
Mechanics of the Update
The Kubernetes API server has adjusted the immutability rules for pod template resource fields specifically for suspended Jobs. Rather than introducing any new API types, this feature leverages existing Job and pod template structures through relaxed validation rules.
Mutable fields now include:
- spec.template.spec.containers[*].resources.requests
- spec.template.spec.containers[*].resources.limits
- spec.template.spec.initContainers[*].resources.requests
- spec.template.spec.initContainers[*].resources.limits
Resource updates are allowed under two conditions:
- The Job must have spec.suspend set to true.
- If the Job was previously active before suspension, all running Pods must terminate (i.e., status.active must equal 0) before any modifications can be implemented.
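The two conditions above can be checked from a script before attempting an update. This is a rough sketch under stated assumptions: the function names can_mutate_resources and job_is_mutable are hypothetical, and the check only mirrors the conditions listed here, so the API server remains the authority on whether an update is accepted. It treats an empty status.active as 0, because that field is omitted from the Job status when no Pods are active.

```shell
# Return success (0) only when both conditions hold.
#   $1: value of spec.suspend ("true" or "false"; empty counts as not suspended)
#   $2: value of status.active (empty is treated as 0)
can_mutate_resources() {
  local suspend=$1 active=${2:-0}
  [ "$suspend" = "true" ] && [ "$active" -eq 0 ]
}

# Hypothetical wrapper that reads both fields from a live Job.
# Requires kubectl and access to a cluster.
job_is_mutable() {
  local job=$1 suspend active
  suspend=$(kubectl get job "$job" -o jsonpath='{.spec.suspend}') || return 1
  active=$(kubectl get job "$job" -o jsonpath='{.status.active}') || return 1
  can_mutate_resources "$suspend" "$active"
}
```

A script might call job_is_mutable training-job-example-abcd123 and only proceed to edit the pod template when it returns success.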
Standard resource validation continues to apply; for instance, limits must not be less than requests, and extended resources should still be designated as whole numbers where applicable.
New Features with Beta Release
The promotion of the MutablePodResourcesForSuspendedJobs feature to beta in Kubernetes v1.36 means it is enabled by default. Clusters running this version can utilize the feature without needing additional API server configurations.
Testing the New Feature
If your cluster runs Kubernetes v1.36 or later, the feature is available by default. On v1.35, you need to enable the MutablePodResourcesForSuspendedJobs feature gate on the kube-apiserver.
To experiment, create a suspended Job, adjust its container resources with kubectl edit or a controller, and then resume the Job:
# Create a suspended Job
kubectl apply -f my-job.yaml --server-side
# Edit the resource requests
kubectl edit job training-job-example-abcd123
# Resume the Job
kubectl patch job training-job-example-abcd123 -p '{"spec":{"suspend":false}}'
Things to Keep in Mind
Suspension of Running Jobs
When you suspend a Job that is already running, wait until all of its active Pods have terminated before modifying any resource settings. The API server rejects resource mutations while status.active is greater than zero, to avoid discrepancies between running Pods and the updated pod template.
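One way to follow this guidance in a script is to suspend the Job and then poll until no Pods are active. The sketch below is illustrative only: get_active_count, wait_until_inactive, and suspend_and_wait are hypothetical helper names, and the attempt count and delay are arbitrary defaults rather than recommended values.

```shell
# Print the number of active Pods for a Job (empty output means none).
# Requires kubectl and access to a cluster.
get_active_count() {
  kubectl get job "$1" -o jsonpath='{.status.active}'
}

# Poll until the Job reports no active Pods.
#   $1: Job name
#   $2: maximum number of checks (default 30)
#   $3: seconds to sleep between checks (default 2)
# Returns 0 once status.active is 0 or unset, 1 on timeout or lookup error.
wait_until_inactive() {
  local job=$1 attempts=${2:-30} delay=${3:-2} active
  while [ "$attempts" -gt 0 ]; do
    active=$(get_active_count "$job") || return 1
    if [ "${active:-0}" -eq 0 ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical helper: suspend a running Job, then wait for its Pods to terminate.
suspend_and_wait() {
  kubectl patch job "$1" -p '{"spec":{"suspend":true}}' && wait_until_inactive "$1"
}
```

Once suspend_and_wait training-job-example-abcd123 returns success, the resource fields in the pod template can be edited and the Job resumed.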
Setting Up Pod Replacement Policies
For Jobs that may see failed Pods, it’s advisable to configure podReplacementPolicy: Failed. This guarantees that replacement Pods are only created once all previous Pods have entirely terminated, reducing the risk of resource contention from overlapping Pods.
Handling Resource Claims
Note that for workloads using dynamic resource allocation (DRA), resourceClaimTemplates remain immutable. Such workloads must recreate their claim templates to match any updated resource requirements.
Engaging with the Development
This new feature has been shaped by the input from SIG Apps and WG Batch. As it moves toward stabilization, both groups encourage and welcome feedback on the feature.
Community members can reach out via: