
Kubernetes 1.33 – What you need to know

Apr 8, 2025 / 21 min read
by Nigel Douglas

Kubernetes 1.33 is right around the corner, and there are quite a lot of changes to unpack! Excluding enhancements with the status of “Deferred” or “Removed from Milestone”, we have 64 enhancements in all listed within the official tracker. So, what’s new in 1.33?

Kubernetes 1.33 brings a whole bunch of useful enhancements, including 35 changes tracked as ‘Graduating’ in this release. Of these, 17 enhancements are graduating to stable, including the ability to specify a new --subresource flag in kubectl commands such as get, patch, edit, and replace, which provides a first-class way to fetch and update subresources like status, scale, and more.

26 new alpha features are also listed in the enhancements tracker, with some conversation at KubeCon Europe around the ability to implement declarative validation of Kubernetes native types using validation-gen to generate validation code. Among other improvements, DevOps teams will benefit from streaming encoding for LIST responses, an enhancement that significantly reduces memory consumption on the API server when serving large LIST responses.

The Cloudsmith team is really excited about this release and everything that comes with it! Let’s jump into all of the major features and changes in Kubernetes 1.33.

Kubernetes 1.33 – Editor’s pick:

Here are a few of the changes that Cloudsmith employees are most excited about in this release:

#5080 Ordered Namespace Deletion

The current approach to deleting resources within a Kubernetes namespace is somewhat flawed, especially in environments that demand strict security controls. Relying on a semi-random deletion order introduces unnecessary risk. One particularly dangerous scenario is when a Pod outlives its associated NetworkPolicy, effectively stripping it of its intended network protections and leaving it wide open to unrestricted access.

To address this, the Kubernetes maintainers proposed a deliberate and opinionated deletion strategy that prioritises security by enforcing a logical order: Pods should be deleted before their dependent resources, like NetworkPolicies. This isn’t just about order, it’s about responsibility. By acknowledging some real-world implications of resource dependencies and acting accordingly, you can dramatically reduce the chance of exposing workloads during namespace teardown.

Nigel Douglas – Head of Developer Relations


#127 Support User Namespaces in pods

What excites me most is the long-anticipated improvement around support for user namespaces. This feature allows our teams to better isolate security-sensitive elements such as user and group IDs, root directories, capabilities, and encryption keys. User namespaces offer a way to differentiate identity and permissions inside a container from those on the host system: a process can operate with full privileges (user ID 0) within its namespace while remaining an unprivileged user outside of it.

Bringing user namespace support to Kubernetes would enable pods to run with distinct user and group IDs compared to the host. This means a process that appears privileged inside the container is effectively limited in scope if it escapes to the host, reducing our risk of a system-wide compromise. It's a powerful security boundary that can help contain potential breaches and enforce least privilege at runtime.

Ian Duffy – Site Reliability Engineer
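
As a minimal sketch of how a workload opts in: setting hostUsers: false in the Pod spec (with the user namespaces feature gate enabled on the cluster) asks the kubelet to run the pod in its own user namespace. The image name here is just a placeholder.

apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false   # run this pod in a new user namespace
  containers:
  - name: app
    image: busybox:1.36   # placeholder image
    command: ["sleep", "infinity"]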


#1287 In-Place Update of Pod Resources

I'm especially pleased to see the introduction of in-place updates for Pod resource requests and limits, eliminating the need to restart Pods or their Containers. Adjusting resource allocations can become necessary for a number of reasons, whether due to a spike in workload, under-utilisation of assigned resources, or simply because the initial settings were misconfigured.

Until now, modifying a Pod's resource configuration required a full restart, as the resource fields in the PodSpec were immutable. While this may be manageable for resilient, stateless applications, it poses a challenge for workloads that are stateful, batch-oriented, or run with minimal replicas. In such cases, restarting Pods can lead to unnecessary downtime or increased operational costs. The ability to adjust resources dynamically, without disruption, is a valuable change that directly improves availability.

It’s important to note, however, that while these resource values can now be updated in place, the Pod’s Quality of Service (QoS) class remains immutable. This is a key consideration because the QoS class (Guaranteed, Burstable, or BestEffort) is determined at Pod creation based on the presence and nature of resource requests and limits. Since the QoS class influences how Kubernetes schedules and evicts Pods under resource pressure, it’s crucial to set these values thoughtfully at the outset.

For example, you might increase a Pod’s CPU and memory limits significantly using an in-place update, but if the Pod was initially classified as BestEffort or Burstable, Kubernetes may still deem it a lower priority and evict it during contention, even though it's now consuming a large share of resources. This kind of mismatch can lead to surprising and frustrating behaviour in production, which is why understanding the fixed nature of QoS classes is just as important as the flexibility offered by in-place updates.

Ralph McTeggart – Principal Engineer
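
To make this concrete, here is a rough sketch of the feature's moving parts: a per-resource resizePolicy on the container controls whether a restart is needed when that resource changes, and the update itself goes through the Pod's resize subresource. Names and values here are illustrative.

apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app
    image: nginx:1.27   # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # CPU can change in place
    - resourceName: memory
      restartPolicy: RestartContainer # memory changes restart this container
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"

A resize can then be applied without recreating the Pod, for example with kubectl patch pod resize-demo --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'.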



Apps in Kubernetes 1.33


#3850 Backoff Limit Per Index for Indexed Jobs
Stage: Graduating to Stable
Feature group: sig-apps

In the current Kubernetes implementation, all indexes within an indexed job share a single backoff limit. This means if the overall job hits that limit, the entire job is marked as failed, even if some indexes haven't finished running. This model doesn't align well with embarrassingly parallel workloads, where each index operates independently. For example, using indexed jobs for a suite of long-running integration tests would halt all tests after a single failure, limiting visibility into other potential issues. In contrast, systems like AWS Batch already treat each index separately, highlighting the need for a more flexible approach in Kubernetes.

To address this, the proposal introduces a per-index backoff policy, allowing each index to fail or succeed independently. This would let jobs continue running even if some indexes fail. It also suggests a new API field to control the number of allowed failed indexes, and a new FailIndex action within the PodFailurePolicy to proactively stop retries for specific indexes before hitting the backoff limit.
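
As a sketch of how this looks in practice (the image name is a placeholder), the new fields sit alongside the existing Indexed completion mode:

apiVersion: batch/v1
kind: Job
metadata:
  name: integration-tests
spec:
  completions: 10
  parallelism: 5
  completionMode: Indexed
  backoffLimitPerIndex: 1   # each index may retry once before being marked failed
  maxFailedIndexes: 3       # the Job fails once more than 3 indexes have failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: tests
        image: example.com/integration-tests:latest   # placeholder image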

#3998 Job success/completion policy
Stage: Graduating to Stable
Feature group: sig-apps

Some batch workloads, such as those using MPI or PyTorch, only rely on specific "leader" indexes to determine whether a job has succeeded. However, the current Kubernetes behaviour requires all indexes in an indexed job to succeed before the job is marked as complete, which limits flexibility for these use cases. To address this, a proposed extension to the Job API introduces a configurable success policy, allowing users to define custom conditions under which an indexed job can be considered successful.

With this enhancement, a job can be marked as succeeded based on specific criteria - for example, if certain indexes [0, 1, 2] complete successfully, or if a minimum number of indexes succeed. Once the job meets the defined successPolicy, any remaining pods are terminated, avoiding unnecessary resource consumption. This proposal does not alter the behaviour of jobs that don’t use a successPolicy, nor does it apply to non-indexed jobs at this time, as those scenarios currently lack strong use cases. A new condition, SuccessCriteriaMet, will indicate when a job satisfies its defined success policy.
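
Here is a sketch of a leader-based policy, assuming index 0 is the leader (the image is a placeholder):

apiVersion: batch/v1
kind: Job
metadata:
  name: mpi-training
spec:
  completions: 8
  parallelism: 8
  completionMode: Indexed
  successPolicy:
    rules:
    - succeededIndexes: "0"   # the Job succeeds once the leader index completes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example.com/mpi-worker:latest   # placeholder image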

#3973 Consider Terminating Pods in Deployments
Stage: Net New to Alpha
Feature group: sig-apps

Deployments in Kubernetes currently show inconsistent behaviour when dealing with terminating pods, depending on the rollout strategy or scaling activity. In some cases, it might be more efficient to wait for pods to fully terminate before creating new ones, while in others, launching new pods immediately is preferable. To address this inconsistency, a new field, .spec.podReplacementPolicy, is proposed to give users control over when replacement pods should be created during deployment updates.

The goal is to let users define whether a Deployment should wait for pods to terminate before spinning up new ones or proceed without delay, all while respecting the chosen deployment strategy. Additionally, the status of Deployments and ReplicaSets would be enhanced to reflect the number of managed terminating pods. This distinction is important, as pods marked for deletion (with a deletionTimestamp) aren't currently accounted for in the .status.replicas field. This can lead to temporary over-provisioning - especially during rollouts or unexpected deletions like evictions - which can cause resource strain, potential scheduling issues, and even unnecessary autoscaling, which also may increase cloud costs in large-scale or tightly resourced environments.
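
Based on the KEP, a Deployment opting to wait for full termination might look like the sketch below. The field is alpha, and the values shown (mirroring the Job API's TerminationStarted/TerminationComplete policy) may still change:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  podReplacementPolicy: TerminationComplete   # only create replacements once old pods are fully gone
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: nginx:1.27   # placeholder image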

API in Kubernetes 1.33


#4008 CRD Validation Ratcheting
Stage: Graduating to Stable
Feature group: sig-api-machinery

A key long-term goal for improving Kubernetes usability is to shift validation logic from controllers to the front-end. Currently, the handling of validation for unchanged fields presents a barrier for both Custom Resource Definition (CRD) authors and Kubernetes developers, preventing broader adoption of validation features. CRD authors face difficulties when modifying value validations, as they must increment the version even for minor changes, which can be cumbersome and disrupt the user workflow.

This proposal aims to eliminate the barriers that prevent CRD authors and Kubernetes developers from adjusting value validations. With these obstacles removed, CRD authors can more easily widen or tighten value validations without fear of breaking workflows. The goal is to automate this process for all CRDs installed into clusters with the feature enabled, at minimal cost: negligible persistent overhead and at most around a 5% time overhead for resource writes. The approach also emphasises correctness by ensuring that invalid values, which would fail a known schema, are not allowed.
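
To illustrate what ratcheting permits, consider a hypothetical CRD whose schema tightens an existing field in a new revision. With ratcheting, stored objects whose unchanged values violate the new rule can still be updated; only fields that are actually modified are validated against it:

# Excerpt from a hypothetical CRD schema revision
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        region:
          type: string
          pattern: "^[a-z]+-[a-z]+-[0-9]$"  # newly tightened; previously any string was accepted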

#4355 Coordinated Leader Election
Stage: Graduating to Beta
Feature group: sig-api-machinery

This proposal introduces a more secure leader election mechanism for component upgrades and rollbacks, primarily leveraging leases with two significant changes. First, instead of competing for the lease, component instances declare their candidacy, and an election coordinator selects the best candidate by choosing the one with the lowest version, ensuring skew rules are upheld. Second, the election coordinator can mark a lease as "end of term" which prompts the current leader to stop renewing the lease and allows the coordinator to replace it with a better-suited leader. This approach addresses the issues that arise during node-by-node upgrades and rollbacks in systems like Kubernetes, Cluster API, kubeadm, and KIND, which often lead to skew violations due to timing inconsistencies in component upgrades.

apiVersion: coordination.k8s.io/v1beta1
kind: LeaseCandidate
metadata:
  name: some-custom-controller-0001A
  namespace: kube-system
spec:
  leaseName: some-custom-controller
  binaryVersion: "1.33"
  emulationVersion: "1.33"
  strategy: OldestEmulationVersion
  renewTime: "2025-04-10T02:33:08.685777Z"

The proposed mechanism aims to resolve challenges such as the potential for skew violations when a new version of a controller runs while old versions of the API server are still active. It also prevents lease loss during upgrades or rollbacks, which can cause component version flip-flopping. The goal is to enable more predictable version changes, avoid skew violations, and ensure a smoother upgrade process while allowing for canary deployments or pauses during upgrades. This enhancement, when combined with other tools like UVIP, improves the ability to manage control plane components during upgrades, rollbacks, and downgrades. The proposed solution offers an opt-in mechanism to elect the oldest version candidate and preempt the current leader, reusing the existing lease structure as much as possible.

#5080 Ordered namespace deletion
Stage: Net New to Alpha
Feature group: sig-api-machinery

This proposal introduces an opinionated deletion process for Kubernetes namespaces to ensure secure and predictable removal of resources. Currently, the deletion order is semi-random, which can result in security gaps, such as Pods remaining active after their associated NetworkPolicies have been deleted. By implementing a more structured deletion order, Pods will be deleted before other resources based on logical and security dependencies, improving both security and reliability. This change addresses risks caused by non-deterministic deletion and ensures the safe termination of resources in a more controlled manner.

The current random deletion process poses significant challenges, especially in security-sensitive environments. For example, deleting a NetworkPolicy before its associated Pod can expose the Pod to unrestricted network access, creating a security vulnerability. The proposed opinionated deletion process aims to enhance security by ensuring that resources like NetworkPolicies remain active until all dependent resources are properly terminated. Additionally, it provides a more predictable and consistent cleanup process, reducing operational disruptions. The proposal also maintains compatibility with existing Kubernetes workflows and APIs while ensuring safe and reliable resource cleanup.

CLI in Kubernetes 1.33


#2590 Add subresource support to kubectl
Stage: Graduating to Stable
Feature group: sig-cli

Currently, working with sub-resources such as status in Kubernetes through kubectl is a cumbersome process. Users must rely on kubectl --raw to fetch sub-resources, and updating them (like patching the status or scale) requires resorting to direct curl calls against the API server. This workflow is far from intuitive and creates unnecessary friction for developers and operators when testing, debugging, or interacting with the Kubernetes API.

To improve this experience, a proposed enhancement introduces a new --subresource flag to key kubectl commands such as get, patch, edit, apply, and replace. This change makes subresources like status, scale, and resize accessible and manageable across all supported resource types, including custom resources. When using the new flag, subresource outputs will be formatted consistently with the main resource view, ensuring clarity and usability. It's important to note that the subresource API behaves the same as the full resource, so changes, like updating the status, may still be subject to reconciliation by the corresponding controller.
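
For example (resource names are placeholders):

kubectl get deployment web --subresource=scale
kubectl patch deployment web --subresource=scale --type=merge -p '{"spec":{"replicas":5}}'
kubectl edit mycrd example --subresource=status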

#3104 Implement a .kuberc file to separate user preferences from cluster configs
Stage: Net New to Alpha
Feature group: sig-cli

During the Alpha phase, this feature will be enabled through the KUBECTL_KUBERC=true environment variable, with the default kuberc file located in ~/.kube/kuberc. Users can override this location using the --kuberc flag (kubectl --kuberc /var/kube/rc). The kuberc file will serve as an optional configuration to separate user preferences from cluster credentials and server configurations. This proposal aims to introduce a new file dedicated to user preferences, which will offer better flexibility compared to kubeconfig, which currently under-utilises its preferences field due to the creation of new files for each cluster that mix credentials with preferences.

The kuberc file will provide a clear separation between server configurations and user preferences, allowing users to define command aliases, default flags, and other customisations. The file will be versioned to support easy future updates, and users will be able to maintain a single preferences file, regardless of the --kubeconfig flag or $KUBECONFIG environment variable. This proposal also suggests deprecating the kubeconfig preferences field to streamline configuration management. The kuberc file will be entirely optional, providing users with an opt-in way for custom kubectl behaviour without impacting any existing setups.
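
As a sketch of what a kuberc file might contain under the alpha schema (the alias and flag choices here are illustrative, and the schema may still evolve):

apiVersion: kubectl.config.k8s.io/v1alpha1
kind: Preference
aliases:
- name: getn
  command: get
  appendArgs:
  - namespaces
defaults:
- command: apply
  options:
  - name: server-side
    default: "true"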

Kubernetes 1.33 Networking


#1880 Multiple Service CIDRs
Stage: Graduating to Stable
Feature group: sig-network
Feature gate: MultiCIDRServiceAllocator (enabled by default as of this release)

This enhancement introduces the ability for users to dynamically expand the pool of IP addresses available for Kubernetes Services. Services, particularly those of type ClusterIP, NodePort, and LoadBalancer, rely on cluster-wide virtual IPs (ClusterIPs), which must be unique throughout the cluster. Currently, if a user tries to assign a ClusterIP that's already in use, the operation fails. The existing IP allocation mechanism has notable limitations - most critically, the inability to resize or extend the range of IPs. This creates challenges when IP exhaustion occurs or when network ranges overlap in your cluster.

To resolve these issues, a new allocation mechanism is being proposed that is scalable, tunable, and fully backward compatible. This system introduces two new API resources: ServiceCIDR and IPAddress. By enabling the creation of additional ServiceCIDR objects, users can effectively increase the available IP space for Services on the fly. The updated allocator intelligently draws from any defined ServiceCIDR, functioning similarly to how adding more disks expands capacity in a storage system.
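
In practice, expanding the Service IP space becomes a matter of creating another ServiceCIDR object, something like the following (assuming the v1 API now that the feature is stable):

apiVersion: networking.k8s.io/v1
kind: ServiceCIDR
metadata:
  name: extra-service-cidr
spec:
  cidrs:
  - 10.100.0.0/16   # additional range for ClusterIP allocation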

#4444 Traffic Distribution for Services
Stage: Graduating to Stable
Feature group: sig-network

This enhancement proposes adding a new trafficDistribution field to the Kubernetes Service spec, intended to replace the now-deprecated topologyKeys field and the service.kubernetes.io/topology-mode annotation. These earlier mechanisms aimed to influence service routing based on topology but had notable limitations. topologyKeys, for example, offered rigid, user-defined routing preferences without allowing for dynamic or feedback-based optimisations, and its flexibility often made implementation difficult or impractical.

Subsequent attempts to improve on this, like TopologyAwareRouting and internalTrafficPolicy, introduced more intelligent and flexible routing via implementation-specific heuristics and node-local traffic control. However, they lacked predictability and granular user control, especially for scenarios like preferring node-local traffic with fallback to zone or region. The new trafficDistribution field addresses these gaps by aiming to balance user intent with implementation flexibility. It introduces a standardised way for users to influence traffic routing behaviour while giving implementers the freedom to support advanced or evolving strategies, such as routing based on topology, latency, or custom heuristics.
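
Using it is a one-line change to the Service spec; PreferClose asks implementations to favour topologically closer endpoints:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferClose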

#2433 Topology Aware Routing
Stage: Graduating to Stable
Feature group: sig-network

As of February 2025, the scope of this Kubernetes Enhancement Proposal (KEP) has been significantly reduced. Initially, it aimed to graduate both the hints field in EndpointSlice and a topology-aware routing mechanism using the service.kubernetes.io/topology-mode=Auto annotation. However, only the hints field is moving to General Availability (GA) in this release. While the routing-related features and annotations remain out of scope for GA, they are still documented in the #2433 KEP for historical context and to clarify the rationale behind the hints implementation. Portions of the original proposal, particularly those related to production readiness, remain relevant as other components may graduate separately under different KEPs. (see: #4444 and #3015)

The core motivation for the hints mechanism stems from the growing deployment of Kubernetes clusters across multiple zones. In such environments, traffic is often distributed randomly among service endpoints, leading to unnecessary cross-zone communication. This proposal introduces a lightweight, automatic method for topology-aware routing by allowing EndpointSlice producers to suggest which endpoints consumers (like kube-proxy) should use. The goal is to favour in-zone traffic when possible, offering benefits such as reduced latency, improved performance, and lower cross-zone data transfer costs. The design emphasises simplicity, relying on existing topology labels and requiring minimal user configuration to influence traffic locality.
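
The hints themselves live on the EndpointSlice: a producer such as the EndpointSlice controller marks which zones should consume an endpoint, and consumers like kube-proxy filter accordingly. A minimal example (names and addresses are placeholders):

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: web-abc12
  labels:
    kubernetes.io/service-name: web
addressType: IPv4
ports:
- name: http
  port: 8080
  protocol: TCP
endpoints:
- addresses:
  - "10.1.2.3"
  zone: europe-west1-b
  conditions:
    ready: true
  hints:
    forZones:
    - name: europe-west1-b   # route traffic from this zone to this endpoint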

Kubernetes 1.33 Authentication


#3257 Define ClusterTrustBundle, a resource for holding X.509 trust anchors
Stage: Graduating to Beta
Feature group: sig-auth

Kubernetes currently allows workloads to request certificates via the certificates.k8s.io API, but lacks a standardised way for signers to distribute their trust anchors to workloads. Trust anchors, which are typically root or intermediate X.509 certificates, are essential for validating certificate chains, and are often context-specific. To address this gap, this KEP introduces a new cluster-scoped resource called ClusterTrustBundle, a specialised object for storing and sharing trust anchors. These bundles are optionally linked to a specific certificate signer using .spec.signerName, with RBAC controls governing their mutation. To make these trust bundles consumable by workloads, a new Kubelet clusterTrustBundle projected volume source allows pods to mount trust anchor sets as files that automatically update as the bundle changes. This new system is designed to eventually replace the legacy kube-root-ca.crt ConfigMaps present in all namespaces.

A practical example of this is distributing trust anchors from a private CA to workloads. Suppose a custom signer example.com/server-tls issues certificates to a server pod. A client pod can consume the related trust anchors by mounting a ClusterTrustBundle volume like this:

apiVersion: certificates.k8s.io/v1beta1
kind: ClusterTrustBundle
metadata:
  name: example.com:server-tls:foo
  labels:
    example.com/cluster-trust-bundle-version: live
spec:
  signerName: example.com/server-tls
  trustBundle: "<... PEM DATA ...>"

The client pod references this bundle in its volume configuration:

apiVersion: v1
kind: Pod
metadata:
  namespace: client
  name: client
spec:
  containers:
  - name: main
    image: my-image
    volumeMounts:
    - mountPath: /var/run/example-com-server-tls-trust-anchors
      name: example-com-server-tls-trust-anchors
      readOnly: true
  volumes:
  - name: example-com-server-tls-trust-anchors
    projected:
      sources:
      - clusterTrustBundle:
          signerName: example.com/server-tls
          labelSelector:
            matchLabels:
              example.com/cluster-trust-bundle-version: live
          path: ca_certificates.pem

Kubelet merges all relevant trust bundles into ca_certificates.pem, ensuring the client always has up-to-date trust anchors (even through rotations), enabling seamless and secure communication with the server.

#4193 Bound service account token improvements
Stage: Graduating to Stable
Feature group: sig-auth

Token projection and alternative audiences on JWTs issued by the Kubernetes apiserver enable external systems to verify the identity and attributes (like an associated ServiceAccount or Pod) of a requesting entity. However, it's currently not possible to confirm a Pod's association with a specific Node solely from the token without fetching the Pod object and cross-referencing its spec.nodeName. To enhance the integrity of identity verification, especially in cases aiming to prevent token replay attacks, embedding the identity of the Node into the JWT would allow external systems to validate that the Pod and its projected token originate from the same trusted Node.

This enhancement involves adding the Node reference as a private claim in JWTs issued by the TokenRequest API, similar to how ServiceAccounts, Pods, and Secrets are currently handled. It also introduces support for directly binding tokens to Node objects, enabling more precise verification and improved auditability. Additionally, unique identifiers for tokens will be embedded to allow traceability in audit logs, linking token usage to its origin. These improvements aim to strengthen the trust chain from the original requester through to the JWT, making token validation more secure and verifiable across distributed systems.
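
Decoded, the private claims in such a token would look roughly like the following (shown as YAML for readability; names and identifiers are placeholders):

kubernetes.io:
  namespace: default
  serviceaccount:
    name: my-sa
    uid: 7ee0232e-placeholder
  pod:
    name: my-pod
    uid: 5e0bd49b-placeholder
  node:                        # new: binds the token to the node the pod runs on
    name: worker-01
    uid: 9d0b21ab-placeholder
jti: 8a1c-placeholder          # new: unique token identifier for audit traceability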

#740 API for external signing of Service Account tokens
Stage: Net New to Alpha
Feature group: sig-auth

Kubernetes currently relies on service account keys loaded from disk at kube-apiserver startup for JWT signing and authentication. This static method limits key rotation flexibility and poses security risks if signing materials are exposed via disk access. To address these limitations, a proposal suggests enabling integration with external key management systems (such as HSMs and cloud KMSes). This would allow service account JWT signing to occur out-of-process without restarting the kube-apiserver, improving both ease of rotation and security.

The proposal introduces a new gRPC API (ExternalJWTSigner) to support signing and key management externally, similar to Kubernetes’ existing KMS API model. It maintains backward compatibility by preserving current file-based behavior unless a new signing endpoint is explicitly configured. The key goals for this update include supporting public key listing, ensuring token compatibility with existing standards, and avoiding performance regression for in-process keys.

Kubernetes 1.33 Nodes


#3857 Recursive Read-only (RRO) mounts
Stage: Graduating to Stable
Feature group: sig-node

To enhance security and consistency on your nodes, this proposal aims to ensure that read-only volumes in Kubernetes are recursively read-only. Currently, mounting a directory like /mnt as read-only does not enforce the same restriction on its submounts (for example /mnt/usbstorage), potentially exposing data to unintended modifications. By leveraging the OCI Runtime's rro (recursive read-only) bind mount option, this limitation can be addressed. This option uses mount_setattr(2) with the MOUNT_ATTR_RDONLY and AT_RECURSIVE flags to enforce recursive immutability. However, this requires Linux kernel version 5.12 or later, along with compatible OCI runtimes such as runc >= 1.1 or crun >= 1.4.

The proposed change introduces a new recursiveReadOnly field to the VolumeMount struct in the pod spec, enabling fine-grained control with options like Disabled, IfPossible, or Enabled. This setting allows users to request recursive read-only behaviour while maintaining compatibility with environments that may not support it.

Here’s how it would look in a pod manifest:

spec:
  volumes:
    - name: foo
      hostPath:
        path: /mnt
        type: Directory
  containers:
  - name: app
    image: busybox:1.36   # name/image added so the manifest is valid
    volumeMounts:
    - mountPath: /mnt
      name: foo
      mountPropagation: None
      readOnly: true
      recursiveReadOnly: IfPossible

While this feature improves isolation, it introduces a broader API surface and may create a false sense of security if not properly implemented. To address this, it is suggested that Kubernetes reflect the actual recursive read-only status through VolumeMountStatus.

#2625 Add options to reject non SMT-aligned workloads
Stage: Graduating to Stable
Feature group: sig-node

This proposal introduces enhancements to Kubernetes’ CPUManager aimed at improving performance predictability for latency-sensitive workloads on systems with Simultaneous Multithreading (SMT) enabled. While the existing static policy in CPUManager already prevents virtual CPU sharing, it does not ensure exclusive use of entire physical cores. This can lead to performance degradation due to shared resources like caches when threads from different workloads are scheduled on the same core.

To address this, a new kubelet configuration option, cpuManagerPolicyOptions (also exposed as the --cpu-manager-policy-options flag), is proposed. Within it, a new full-pcpus-only option can be set to ensure that workloads are assigned only full physical CPUs, avoiding partial core allocations. This approach strengthens workload isolation by preventing thread-level resource sharing and improving cache efficiency. The proposal emphasises tighter control over CPU allocation to better support the requirements of latency-sensitive applications.
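
Enabling it is a kubelet configuration change; the option rides on the static CPU manager policy:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"   # reject workloads that can't be allocated whole physical cores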

#753 Built-in support for sidecar containers pattern
Stage: Graduating to Stable
Feature group: sig-node

The sidecar container pattern has been part of Kubernetes since its early days, with a 2015 blog post clearly describing the concept. Over time, sidecars have become increasingly common and diverse in use, though current Kubernetes features don't always support them cleanly, often requiring awkward workarounds. To address this, a proposal has been introduced to enhance init containers with a restartPolicy field, specifically allowing restartPolicy: Always to designate them as sidecars. Unlike typical init containers, these would not block pod startup but instead run alongside other containers after their startup is confirmed via a startup probe or the completion of a postStart lifecycle hook.

Sidecar containers configured this way won't prevent pod completion; if the main containers finish, the sidecars will be terminated. Their restartPolicy can override the pod-level policy, ensuring they can restart even if the pod itself is set not to. This is designed to support advanced patterns, like in the following example:

kind: Pod
spec:
  initContainers:
  - name: vault-agent
    image: hashicorp/vault:1.12.1
  - name: istio-proxy
    image: istio/proxyv2:1.16.0
    args: ["proxy", "sidecar"]
    restartPolicy: Always
  containers:
  ...

Additionally, sidecar containers will benefit from full lifecycle support, including PostStart and PreStop handlers, as well as all types of probes (startup, readiness, liveness). Their readiness probes will even influence the overall pod readiness, further integrating them into core workload behaviour.

Scheduling in Kubernetes 1.33


#3094 Take taints/tolerations into consideration when computing skew
Stage: Graduating to Stable
Feature group: sig-scheduling

When a Pod uses topologySpreadConstraints, nodes with taints that the Pod doesn’t tolerate, particularly the Unschedulable taint, should ideally be excluded from skew calculations. Ignoring such taints can cause scheduling issues, especially during node upgrades or replacements.

Currently, Kubernetes includes cordoned nodes (which carry the same Unschedulable taint) in topology skew calculations, assuming they might still be valid targets. This often leads to problematic behaviour: Pods see those nodes as viable based on topology, but can’t actually schedule onto them due to the taint. As a result, they get stuck in a Pending state until the node is fully removed. This becomes especially disruptive in automated, cloud-managed Kubernetes environments like GKE, where unattended upgrades can leave critical workloads unschedulable for extended periods - sometimes up to an hour.
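
The fix is exposed through the nodeTaintsPolicy field on topology spread constraints; setting it to Honor excludes nodes whose taints the Pod doesn't tolerate from skew calculations. A minimal sketch (names and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: web-0
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    nodeTaintsPolicy: Honor   # ignore untolerated (e.g. cordoned) nodes when computing skew
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9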


#3633 Introduce MatchLabelKeys to Pod Affinity and Pod Anti Affinity
Stage: Graduating to Stable
Feature group: sig-scheduling

This KEP suggests adding matchLabelKeys (and its counterpart mismatchLabelKeys) to the PodAffinityTerm structure. This enhancement allows for more granular control over pod scheduling by refining how Pods are evaluated for co-location (PodAffinity) or separation (PodAntiAffinity), beyond what is achievable with only LabelSelector.

A common issue arises during rolling updates, where both old and new versions of Pods temporarily coexist. Since the scheduler cannot inherently distinguish between these versions, it may misinterpret affinity rules, leading to suboptimal scheduling or, in a saturated cluster, the inability to schedule new Pods entirely. By introducing matchLabelKeys, users can limit affinity evaluation to Pods sharing specific labels (such as the pod-template-hash applied by the Deployment controller to distinguish ReplicaSets), ensuring affinity rules apply only within the same version of a Deployment.

Here’s an example of how we can implement this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-server
…
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - database
        topologyKey: topology.kubernetes.io/zone
        matchLabelKeys:
        - pod-template-hash

In short, by leveraging matchLabelKeys, the scheduler can now apply affinity rules more precisely, improving overall pod placement behaviour during rolling updates.

#4832 Asynchronous preemption in the scheduler
Stage: Graduating to Beta
Feature group: sig-scheduling

This proposal aims to decouple preemption-related API calls from the core scheduling cycle to improve scheduling throughput, especially in failure scenarios. In the current model, the scheduler (being a single entity in a cluster) handles pods one at a time, and reducing blocking operations like API calls during scheduling is crucial for performance. Similar to how the binding cycle operates asynchronously after scheduling, this proposal introduces asynchronous handling of preemption operations. Instead of making blocking API calls during the PostFilter extension point, the scheduler will delegate these to a separate goroutine, allowing it to continue scheduling other pods while preemption is in progress. Once the preemption completes, the pod that initiated it can retry scheduling.

A potential risk lies in kube-apiserver instability, where frequent API call failures during preemption could lead to non-optimal scheduling decisions. For example, if many mid-priority pods initiate preemption but the API calls fail, the scheduler may incorrectly assume those pods are already scheduled, influencing the placement of other pods. However, this is not a significant concern, as scheduler performance generally degrades under kube-apiserver instability regardless. Even optimal scheduling decisions can fail due to unsuccessful binding API calls. Therefore, the proposal accepts this limitation without requiring additional mitigation.

Storage in Kubernetes 1.33


#4049 Add Storage Capacity Scoring
Stage: Net New to Alpha
Feature group: sig-storage

Another new enhancement introduces a scoring mechanism for nodes in the dynamic provisioning of Persistent Volumes, focusing on storage capacity in the VolumeBinding plugin. The idea is to assess the available space on nodes, allowing pods to be scheduled dynamically on nodes with either the most or least free space, depending on the specific need.

The motivation for this proposal is to improve storage capacity management when resizing node-local PVs or selecting nodes with minimal free space to optimise resource usage. The goal is to modify the current node scoring logic to account for dynamic provisioning, while leaving static provisioning unchanged. Cluster administrators can configure the scoring method through a new field in VolumeBindingArgs, allowing for two options: preferring nodes with either the least or most allocatable space, with the default preference being for nodes with maximum allocatable space to ensure room for volume expansion.
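
As a rough sketch only (the field name and values below are hypothetical; check the v1.33 release notes for the actual alpha API), the choice would live in the scheduler's VolumeBinding plugin configuration:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: VolumeBinding
    args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: VolumeBindingArgs
      # Hypothetical field for illustration: prefer nodes with the most
      # (default) or least allocatable storage when scoring.
      storageCapacityScoringMode: MostAllocatable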

#2589 Portworx file in-tree to CSI driver migration
Stage: Graduating to Stable
Feature group: sig-storage

This enhancement proposal, part of the vendor-specific KEP for the CSI Migration from in-tree storage plugins, introduces two new feature gates as outlined in the parent document. It aims to migrate the Portworx in-tree volume provisioning to use the Portworx CSI driver instead, as part of the broader CSI migration effort. The migration is focused solely on Portworx and does not involve the core in-tree to CSI migration code in k/k. The necessary migration feature is already present in k/k, and this proposal involves enabling the Portworx-specific feature gates to make it functional for the Portworx driver. If you’d like to know more about this migration, refer to the above parent KEP.

#1710 Speed up recursive SELinux label change
Stage: Graduating to Beta
Feature group: sig-storage

This graduation was deferred from v1.32 of Kubernetes. The aim here is to speed up the process of making volumes available to Pods on systems with SELinux in enforcing mode by eliminating the need for recursive relabelling of files. Currently, the container runtime must recursively relabel all files on a volume before a container can start, which is slow for volumes with many files. The proposed solution uses the -o context=XYZ mount option to set the SELinux context for all files on a volume without the need for a recursive walk. This change will be rolled out in phases, starting with ReadWriteOncePod volumes and eventually extending to all volumes by default, with options for users to opt out in certain cases. This approach significantly improves performance while maintaining flexibility for different use cases.

Last Highlight from Kubernetes 1.33

#4951 Configurable tolerance for Horizontal Pod Autoscalers
Stage: Net New to Alpha
Feature group: sig-autoscaling

The Horizontal Pod Autoscaler (HPA) in Kubernetes determines the number of replicas for a workload based on metrics like CPU utilisation, using the ratio of current to desired metric values. However, to prevent frequent, unnecessary scaling (or "flapping"), a globally-configured tolerance - set to 10% by default - is used to ignore minor deviations from the target metric. For instance, even if a calculation suggests scaling from 100 to 107 replicas (a 7% increase), the autoscaler would take no action since it falls within the default tolerance range.

This proposal introduces a per-HPA tolerance parameter, allowing users to override the default threshold and fine-tune autoscaling sensitivity for individual workloads. By configuring a lower tolerance (e.g., 5%), users can enable the HPA to respond to smaller metric changes when needed. The new tolerance field will be added to the existing HPAScalingRules object and is optional. It falls back to the global value if unspecified. Users can also define separate tolerance values for scale-up and scale-down behaviours, offering more precise control over how each HPA reacts to metric changes.
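
A sketch of how the per-HPA setting might be expressed, with separate scale-up and scale-down values (the target and names here are illustrative, and the alpha API may still change):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleUp:
      tolerance: 0.05   # react to metric deviations above 5% when scaling up
    scaleDown:
      tolerance: 0.10   # keep default-like 10% damping for scale-down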

Timeline of the Kubernetes v1.33 Release

Kubernetes users can expect the v1.33 release process to unfold throughout April 2025, with key milestones including two release candidates on April 8 and 15, followed by the official v1.33 release on April 23rd. Supporting milestones include the docs freeze on April 8th and the release blog publication on the same day as the final release.

What is happening?     | By whom?       | And when?
KubeCon Europe         | —              | Tuesday 1st – Friday 4th April 2025
Docs Freeze            | Docs Lead      | Tuesday 8th April 2025
1.33.0-rc.0 released   | Branch Manager | Tuesday 8th April 2025
1.33.0-rc.1 released   | Branch Manager | Tuesday 15th April 2025
v1.33 released         | Branch Manager | Wednesday 23rd April 2025
Release blog published | Comms          | Wednesday 23rd April 2025

For more insights on all things Kubernetes, check out our team's recap from KubeCon in London.
