Enhancements in Kubernetes v1.36 Address Staleness and Improve Controller Observability

Apr 28, 2026

Kubernetes v1.36 brings significant improvements aimed at staleness in controllers, a silent problem that can cause unexpected behavior in production clusters. Staleness often surfaces only after something has already gone wrong: a controller either acts on outdated assumptions or fails to take an action it should have taken. This release adds features that make controller decisions more accurate and give operators better insight into what controllers are doing.

Understanding Staleness

Staleness arises because controllers rely on a local cache, which does not always reflect the current state of the cluster. A controller keeps its cache up to date by watching the API server for changes to the resources it cares about, so the cache is only as current as the last change it has received. When the controller restarts or the API server becomes unreachable, the information in the cache can become outdated.

This gap can cause a range of problems, from overwriting a correct state with an outdated one to missing important state changes entirely. Such failures show why controllers need a way to confirm that they are acting on up-to-date data.
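
To make the failure mode concrete, here is a small, self-contained Python simulation. It is not client-go code. It assumes a toy cache that only sees the controller's own writes a fixed number of "ticks" after they happen, and a naive ReplicaSet-style reconciler (the names `LaggingCache` and `naive_reconcile` are hypothetical). The reconciler counts pods in its cache and creates the missing ones. Because its earlier creates have not reached the cache yet, it over-creates.

```python
from collections import deque


class LaggingCache:
    """Hypothetical toy informer cache: each write becomes visible `lag` ticks later."""

    def __init__(self, lag: int):
        self.lag = lag
        self.pending = deque()  # entries are [ticks_remaining, pod_name]
        self.pods = set()       # what the controller "sees"

    def record_write(self, pod_name: str) -> None:
        # The API server has accepted the write; the watch event is still in flight.
        self.pending.append([self.lag, pod_name])

    def tick(self) -> None:
        # Age every in-flight event, then deliver those whose delay has elapsed.
        for event in self.pending:
            event[0] -= 1
        while self.pending and self.pending[0][0] <= 0:
            self.pods.add(self.pending.popleft()[1])


def naive_reconcile(cache: LaggingCache, desired: int, created: list) -> None:
    """Create pods until the *cached* count matches `desired` (no staleness check)."""
    missing = desired - len(cache.pods)
    for _ in range(missing):
        name = f"pod-{len(created)}"
        created.append(name)
        cache.record_write(name)


cache = LaggingCache(lag=2)
created = []
for _ in range(3):  # three reconcile passes, one tick apart
    naive_reconcile(cache, desired=3, created=created)
    cache.tick()

print(len(created))  # prints 6
```

The second pass still sees an empty cache and creates three more pods, leaving six pods where three were desired. The v1.36 mitigations are meant to stop controllers from acting in exactly this situation.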

Key Improvements in v1.36

The enhancements in v1.36 fall into two areas: changes to client-go and changes to kube-controller-manager. The client-go changes provide the building blocks, and kube-controller-manager uses them to keep several built-in controllers from acting on stale data.

Client-go Enhancements

A notable client-go addition is atomic FIFO processing, controlled by the AtomicFIFO feature gate. It makes the informer's queue handle batch operations atomically, so queued events never end up in a partially applied state that misrepresents the cluster's condition.

Building on this, clients that use client-go can now introspect their caches and check the latest resource version the cache has seen, using the new LastStoreSyncResourceVersion() function. This function provides the foundation for staleness mitigation in kube-controller-manager.
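
The combination of atomic batch handling and cache introspection can be sketched in Python as a conceptual model. This is not client-go's queue or its API. The class `AtomicStore` and its methods `replace`, `apply_event`, `snapshot`, and `last_store_sync_resource_version` are hypothetical, with the last one named after the LastStoreSyncResourceVersion() function above. The sketch also assumes resource versions are integers. A batch, such as the full list returned on a relist, is built separately and swapped in under a lock. A reader therefore sees either the old state or the new one, and the recorded resource version always matches what the cache holds.

```python
import threading


class AtomicStore:
    """Hypothetical toy cache whose batch updates are applied all-or-nothing."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}        # object name -> resource version
        self._last_sync_rv = 0  # RV of the last applied event or batch (assumed integer)

    def replace(self, objects: dict, list_rv: int) -> None:
        # Build the new state first, then swap it in under the lock,
        # so readers never observe a half-applied batch.
        new_items = dict(objects)
        with self._lock:
            self._items = new_items
            self._last_sync_rv = list_rv

    def apply_event(self, name: str, rv: int, deleted: bool = False) -> None:
        # A single watch event: update one object and advance the synced RV.
        with self._lock:
            if deleted:
                self._items.pop(name, None)
            else:
                self._items[name] = rv
            self._last_sync_rv = max(self._last_sync_rv, rv)

    def snapshot(self):
        # Readers get a consistent (items, rv) pair.
        with self._lock:
            return dict(self._items), self._last_sync_rv

    def last_store_sync_resource_version(self) -> int:
        with self._lock:
            return self._last_sync_rv


store = AtomicStore()
store.replace({"rs-a": 10, "rs-b": 12}, list_rv=15)
store.apply_event("rs-c", rv=16)
store.apply_event("rs-a", rv=17, deleted=True)
items, rv = store.snapshot()
print(sorted(items), rv)  # prints ['rs-b', 'rs-c'] 17
```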

Kube-Controller-Manager Enhancements

The kube-controller-manager now supports staleness mitigation in several critical controllers: DaemonSet, StatefulSet, ReplicaSet, and Job. The mitigation is enabled by default and can be turned off per controller through that controller's feature gate.

When the mitigation is enabled, a controller first checks the latest resource version in its cache. If that version is older than the resource version of the controller's own most recent write to the API server, the controller skips the sync instead of acting on outdated or incorrect state.
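
Here is a minimal sketch of that check. It is written in Python rather than Go and is independent of the real kube-controller-manager code. The `Controller` class, its `last_write_rv` field, and the integer comparison of resource versions are all assumptions made for illustration. The controller remembers the resource version returned by its most recent write and skips any sync while its cache has not caught up to that write.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """Hypothetical controller that refuses to sync on a stale cache."""
    last_write_rv: int = 0    # RV the API server returned for our latest write
    skipped_syncs: int = 0
    performed_syncs: int = 0

    def record_write(self, rv_from_server: int) -> None:
        # Remember the newest resource version we have written.
        self.last_write_rv = max(self.last_write_rv, rv_from_server)

    def sync(self, cache_rv: int) -> bool:
        # Assumption: resource versions compare as integers.
        if cache_rv < self.last_write_rv:
            # The cache has not yet observed our own write; acting now could
            # duplicate or undo it, so skip and let the caller retry later.
            self.skipped_syncs += 1
            return False
        self.performed_syncs += 1
        return True


ctrl = Controller()
ctrl.record_write(rv_from_server=105)
print(ctrl.sync(cache_rv=100))  # prints False: cache is behind our write
print(ctrl.sync(cache_rv=105))  # prints True: cache has caught up
```

Returning `False` instead of raising an error leaves the retry decision to the caller. In a real controller, this would typically mean putting the key back on its work queue.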

Advice for Informer Authors

Informer authors using client-go can also take advantage of the changes in v1.36. The ReplicaSet implementation in kube-controller-manager serves as an example, showing how to determine whether the cache is fresh before acting.

The ConsistencyStore interface in client-go helps track the latest resource versions of the objects a controller cares about. It offers three functions: WroteAt, which records the resource version at which the controller wrote an object; EnsureReady, which checks whether the cache has caught up to that write; and Clear, which removes an object's entry when the object is deleted.

Used together, these functions give informer authors a reliable guard against staleness.
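
The per-object bookkeeping described above can be modeled in a few lines. This Python class borrows only the three method names from the ConsistencyStore interface. Its signatures, its integer resource versions, its `(namespace, name)` keys, and the `cache_rv` argument to `ensure_ready` are assumptions, not client-go's actual Go API. Here is what each method does:

- `wrote_at` remembers the resource version of the controller's latest write to each object.
- `ensure_ready` reports whether the cache has reached that version.
- `clear` forgets an object once it is deleted.

```python
class ConsistencyStoreSketch:
    """Hypothetical per-object write tracker modeled on WroteAt/EnsureReady/Clear."""

    def __init__(self):
        self._written = {}  # (namespace, name) -> RV of our latest write (assumed integer)

    def wrote_at(self, key, rv: int) -> None:
        # Keep only the newest write for each object.
        self._written[key] = max(rv, self._written.get(key, 0))

    def ensure_ready(self, key, cache_rv: int) -> bool:
        # Ready if we never wrote the object, or the cache has seen our write.
        return cache_rv >= self._written.get(key, 0)

    def clear(self, key) -> None:
        # Stop tracking a deleted object so the map does not grow without bound.
        self._written.pop(key, None)


store = ConsistencyStoreSketch()
key = ("default", "web-rs")
store.wrote_at(key, rv=210)
print(store.ensure_ready(key, cache_rv=205))  # prints False
print(store.ensure_ready(key, cache_rv=210))  # prints True
store.clear(key)
print(store.ensure_ready(key, cache_rv=0))    # prints True
```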

Enhanced Observability

Alongside the staleness mitigations, Kubernetes v1.36 adds new instrumentation to kube-controller-manager. The new metrics are enabled by default and make it easier to monitor the health and performance of controllers.

New Metrics Introduced

This release adds a new alpha metric, stale_sync_skips_total, which counts how many times a controller skipped a sync because its cache was stale. It shows how often a controller would otherwise have acted on outdated data, which is a useful signal for the overall health of the system.
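
Because stale_sync_skips_total is a counter, how fast it grows matters more than its absolute value. The Python sketch below computes a per-controller skip rate from two scrapes taken a known interval apart. The sample values, the `controller` label, and the 60-second interval are hypothetical. Check the metric's exact exported name and labels in your own cluster.

```python
def per_second_rate(before: dict, after: dict, interval_s: float) -> dict:
    """Per-second increase of a counter between two scrapes, keyed by label value.

    A drop in value is treated as a counter reset (e.g. a process restart);
    in that case the new value itself is the increase since the reset.
    """
    rates = {}
    for label, new in after.items():
        old = before.get(label, 0.0)
        increase = new - old if new >= old else new
        rates[label] = increase / interval_s
    return rates


# Hypothetical scrapes of stale_sync_skips_total, 60 seconds apart,
# keyed by an assumed "controller" label.
scrape_1 = {"replicaset": 40.0, "statefulset": 5.0, "job": 12.0}
scrape_2 = {"replicaset": 100.0, "statefulset": 5.0, "job": 3.0}

print(per_second_rate(scrape_1, scrape_2, interval_s=60.0))
# prints {'replicaset': 1.0, 'statefulset': 0.0, 'job': 0.05}
```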

In addition, client-go now emits metrics reporting the latest resource version seen by each shared informer. Comparing these values with the API server's current resource version shows how fresh a controller's cache is.
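
One way to use these per-informer values is to flag informers whose latest resource version trails the API server's by more than a threshold. Here is a minimal Python sketch, assuming both values are available as comparable integers. The informer names, readings, and threshold below are hypothetical.

```python
def lagging_informers(informer_rvs: dict, server_rv: int, threshold: int) -> dict:
    """Return informers whose cached RV trails the server RV by more than `threshold`."""
    lagging = {}
    for informer, rv in informer_rvs.items():
        lag = server_rv - rv
        if lag > threshold:
            lagging[informer] = lag
    return lagging


# Hypothetical readings: latest RV seen by each shared informer vs. the server's RV.
readings = {"pods": 98_950, "replicasets": 99_990, "jobs": 99_400}
print(lagging_informers(readings, server_rv=100_000, threshold=500))
# prints {'pods': 1050, 'jobs': 600}
```

Treat this as a heuristic. Resource versions advance across all resource types, so an informer for a quiet resource type can show a large gap without actually missing any events. Watch bookmarks narrow that gap, and a lag that keeps growing over time is a stronger signal than a single reading.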

Future Developments

Kubernetes SIG API Machinery is committed to extending staleness mitigation to more controllers in upcoming releases. The SIG welcomes community feedback on these changes, either in comments or through issues on GitHub.

Work is also underway to bring these capabilities to controller-runtime, so that every controller built with that framework can benefit from the same consistency checks.

Source: Robert Miller · kubernetes.io
