AWS Identity Misconfigurations: We will show how attackers abuse simple setup errors in AWS identities to gain initial access without stealing a single password. Hiding in AI Models: You will see how adversaries mask malicious files in production by mimicking the naming structures of your legitimate AI models. Risky Kubernetes Permissions: We will examine "overprivileged entities" (containers that have too much power) and how attackers exploit them to take over infrastructure.
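To make the last point concrete, here is a minimal audit sketch, assuming a reachable cluster and the official `kubernetes` Python client, that flags one common form of overprivilege: ClusterRoleBindings that hand cluster-admin to service accounts.

```python
# Minimal audit sketch: list ClusterRoleBindings that grant cluster-admin to
# service accounts, a common "overprivileged entity" pattern.
# Assumes a reachable cluster and the official `kubernetes` Python client.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running in a pod
rbac = client.RbacAuthorizationV1Api()

for binding in rbac.list_cluster_role_binding().items:
    if binding.role_ref.name != "cluster-admin":
        continue
    for subject in binding.subjects or []:
        if subject.kind == "ServiceAccount":
            print(f"{binding.metadata.name}: cluster-admin -> "
                  f"{subject.namespace}/{subject.name}")
```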
The feat was achieved by re-architecting key components of Kubernetes' control plane and storage backend, replacing the traditional etcd data store with a custom Spanner-based system that can support massive scale, and optimizing cluster APIs and scheduling logic to reduce load from constant node and pod updates. The engineering team also introduced new tooling for automated, parallelized node pool provisioning and faster resizing, helping overcome typical bottlenecks that would hinder responsiveness at such a scale.
This challenge is sparking innovation across the inference stack, and that is where Dynamo comes in. Dynamo is an open-source framework for distributed inference that manages execution across GPUs and nodes. It breaks inference into phases such as prefill and decode, separates memory-bound from compute-bound work, and dynamically manages GPU resources to raise utilization while keeping latency low. This lets infrastructure teams scale inference capacity responsively, handling demand spikes without permanently overprovisioning expensive GPU resources.
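As an illustration only (this is not Dynamo's actual API), the toy sketch below shows the disaggregated pattern the framework builds on: a compute-bound prefill step processes the whole prompt once, then hands its KV-cache state to a separate memory-bound decode step that generates tokens incrementally, so the two pools can be scaled independently.

```python
# Toy sketch of disaggregated prefill/decode serving (illustrative only).
from dataclasses import dataclass
from queue import Queue

@dataclass
class PrefillResult:
    request_id: str
    kv_cache: dict  # stand-in for the GPU-resident KV blocks handed to decode workers

decode_queue = Queue()

def prefill(request_id, prompt_tokens):
    """Compute-bound phase: one large forward pass over the full prompt."""
    kv_cache = {"prompt_len": len(prompt_tokens)}  # toy stand-in for real KV state
    decode_queue.put(PrefillResult(request_id, kv_cache))

def decode(max_new_tokens=4):
    """Memory-bound phase: many small incremental steps reusing the KV cache."""
    job = decode_queue.get()
    for step in range(max_new_tokens):
        print(f"{job.request_id}: emitted token {step} (cache={job.kv_cache})")

prefill("req-1", prompt_tokens=[101, 2023, 2003, 102])
decode()
```

Because the two phases run in separate worker pools, an orchestrator can add decode capacity during long-generation spikes without also paying for idle prefill GPUs, which is the kind of utilization gain described above.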
Discord has detailed how it rebuilt its machine learning platform after hitting the limits of single-GPU training. By standardising on Ray and Kubernetes, introducing a one-command cluster CLI, and automating workflows through Dagster and KubeRay, the company turned distributed training into a routine operation. The changes enabled daily retrains for large models and contributed to a 200% uplift in a key ads ranking metric. Similar engineering reports are emerging from companies such as Uber, Pinterest, and Spotify as bespoke models grow in size and are retrained more frequently.
Docker recently announced the release of Docker Desktop 4.50, marking another update for developers seeking faster, more secure workflows and expanded AI-integration capabilities. The release introduces a free version of Docker Debug for all users, deeper IDE integration (including VSCode and Cursor), improved multi-service to Kubernetes conversion support, new enterprise-grade governance controls, and early support for Model Context Protocol (MCP) tooling.
CKACP's goal is to create a safe, universal platform for AI workloads through community-defined, open standards for consistently and reliably running them across different Kubernetes environments. CNCF CTO Chris Aniszczyk said, "This conformance program will create shared criteria to ensure AI workloads behave predictably across environments. It builds on the same successful community-driven process we've used with Kubernetes to help bring consistency across 100-plus Kubernetes systems as AI adoption scales."
Google is launching Agent Sandbox, a new Kubernetes primitive built for AI agents. The technology provides kernel-level isolation and can run thousands of sandboxes in parallel. Google built Agent Sandbox as an open-source project within the Cloud Native Computing Foundation. It is based on gVisor, with additional support for Kata Containers, isolating each workload from the host kernel to limit the blast radius of vulnerabilities. Each agent task is assigned its own isolated sandbox.
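Agent Sandbox exposes its own resources, but the gVisor isolation underneath is the same mechanism plain Kubernetes surfaces through a RuntimeClass. A hedged sketch with the official `kubernetes` Python client, assuming a gVisor RuntimeClass named "gvisor" is installed (the name GKE Sandbox uses; it may differ elsewhere):

```python
# Illustrative only: schedule a pod onto a gVisor RuntimeClass so the workload
# runs against gVisor's user-space kernel instead of the host kernel.
# Assumes a RuntimeClass named "gvisor" exists in the cluster.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="agent-task-1"),
    spec=client.V1PodSpec(
        runtime_class_name="gvisor",  # assumption: gVisor RuntimeClass installed under this name
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="agent",
                image="python:3.12-slim",
                command=["python", "-c", "print('running inside a sandbox')"],
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```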
The Cloud Native Computing Foundation (CNCF) published a blog post discussing how vCluster, an open-source project by Loft Labs, addresses key multi-tenancy obstacles in Kubernetes clusters by enabling "virtual clusters" within a single host cluster. This approach enables multiple tenants to have isolated control planes while sharing underlying compute resources, thereby reducing overhead without compromising isolation. Traditional namespace-based isolation in Kubernetes often falls short when tenants need to deploy cluster-scoped resources like custom resource definitions (CRDs).
Airbnb's engineering team has rolled out Mussel v2, a complete rearchitecture of its internal key-value engine designed to unify streaming and bulk ingestion while simplifying operations and scaling to larger workloads. The new system reportedly sustains over 100,000 streaming writes per second, supports tables exceeding 100 terabytes with p99 read latencies under 25 milliseconds, and ingests tens of terabytes in bulk workloads, allowing caller teams to focus on product innovation rather than managing data pipelines.
Sidero Labs has been developing Talos Linux, an immutable operating system purpose-built exclusively for running Kubernetes, alongside Omni, a cluster lifecycle management platform. InfoQ met the Sidero team in Amsterdam during TalosCon 2025 and discussed their approach to simplifying Kubernetes operations through minimalism and security-first design. The concept for Talos emerged from practical frustrations with traditional operating systems in enterprise environments.
But Leo's expertise doesn't stop at tech. He also founded Homeland Shrimp, an indoor aquaculture business he engineered himself. His self-heating, closed-loop system is a blend of thermodynamics, automation, and sustainable thinking, designed to raise Pacific white shrimp efficiently and responsibly. Leo volunteers locally, helping seniors with yard care through a Sherburne County initiative. He also supports causes like Imagine Farm, which promotes sustainable agriculture.
Kubernetes is the default platform for cloud-native applications, but managing Kubernetes at scale isn't trivial. New tools like Headlamp aim to reduce the overhead that comes with managing and deploying Kubernetes applications, but it is still easy to make mistakes and cause significant downtime. A recent Komodor survey of enterprise Kubernetes usage showed that 79% of incidents in running environments are caused by system changes. On top of that, these outages take close to an hour to detect and resolve.
Amazon Web Services has announced a significant breakthrough in container orchestration with Amazon Elastic Kubernetes Service (EKS) now supporting clusters with up to 100,000 nodes, a 10x increase from previous limits. This enhancement enables unprecedented scale for artificial intelligence and machine learning workloads, potentially supporting up to 1.6 million AWS Trainium chips or 800,000 NVIDIA GPUs in a single Kubernetes cluster.
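The accelerator totals follow from per-node density; a quick back-of-the-envelope, assuming accelerator-dense node types with 16 Trainium chips per Trn1-class node and 8 GPUs per P5-class node:

```python
# Back-of-the-envelope for the headline figures (per-node accelerator counts
# are assumptions based on current accelerator-dense AWS instance types).
nodes = 100_000
trainium_chips_per_node = 16  # e.g. a Trn1-class node
gpus_per_node = 8             # e.g. a P5-class node

print(nodes * trainium_chips_per_node)  # 1,600,000 Trainium chips
print(nodes * gpus_per_node)            # 800,000 GPUs
```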
For OpenYurt, Kubernetes at the edge means running workloads outside centralised data centres at locations like branch offices or IoT sites. This helps reduce latency, improve reliability when connectivity is limited, and enable tasks such as analytics, machine learning, and device management, all while maintaining consistency with upstream Kubernetes. As the CNCF notes in its Cloud Strategies and Edge Computing blog, deploying compute resources closer to the network edge can deliver high-speed connectivity, lower latency, and enhanced security.
"Canonical is the number-one Cloud OS provider in the market with the Ubuntu containers, and VMware by Broadcom, with our VCF Foundation, is the number-one private cloud platform," said Prashanth Shenoy, VP of product marketing, VMware Cloud Foundation (VCF) division of Broadcom, during a media briefing. "So those two organizations coming together really helps our customers build Kubernetes-based modern applications."
The real value in multi-cloud operations lies in providing consistent operations across AWS, Azure, and Google Cloud Platform. "If I want to run something in Google and I want to run something in Amazon, if I have to learn new ways of doing things... it becomes more complex for the enterprise," he explained. Nutanix enables organizations to maintain the same operational model across all cloud environments, reducing training requirements and operational complexity.
The introduction of PhysicalBackup Custom Resources provides an efficient backup solution, particularly for large datasets, utilizing mariadb-backup and Kubernetes-native VolumeSnapshots.
Edge computing is evolving into a battleground where speed must be balanced with stringent security demands, exemplified by enterprises like Chick-fil-A leveraging Kubernetes for efficiency.
ArgoCD's UI and CLI are designed for users with extensive technical background, which limits access to GitOps workflows for less technical stakeholders. This increases reliance on DevOps engineers.
AI tools are transforming the interviewing process for job seekers, using advances in technology to make recruitment more efficient and improve the candidate experience.
One of the most frequent causes of failed deployments is an incorrect Kubernetes manifest. A typo in the YAML or a wrong API version can mean kubectl apply never succeeds or creates broken resources.
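A lightweight pre-flight check catches the most basic of these mistakes before anything reaches the cluster. The sketch below is illustrative only and no substitute for server-side validation such as `kubectl apply --dry-run=server`; it parses the manifest and verifies the top-level fields every Kubernetes object needs.

```python
# Minimal manifest pre-flight check: catch YAML syntax errors and missing
# top-level fields before running `kubectl apply`. Illustrative sketch only.
import sys
import yaml  # pip install pyyaml

REQUIRED_FIELDS = ("apiVersion", "kind", "metadata")

def check_manifest(path):
    """Return a list of problems found in the manifest file (empty list means OK)."""
    try:
        with open(path) as f:
            docs = list(yaml.safe_load_all(f))
    except yaml.YAMLError as exc:
        return [f"YAML syntax error: {exc}"]

    problems = []
    for i, doc in enumerate(docs):
        if doc is None:  # empty document between '---' separators
            continue
        if not isinstance(doc, dict):
            problems.append(f"document {i}: not a mapping")
            continue
        for field in REQUIRED_FIELDS:
            if field not in doc:
                problems.append(f"document {i}: missing '{field}'")
        metadata = doc.get("metadata")
        if isinstance(metadata, dict) and not (metadata.get("name") or metadata.get("generateName")):
            problems.append(f"document {i}: metadata has neither 'name' nor 'generateName'")
    return problems

if __name__ == "__main__":
    issues = check_manifest(sys.argv[1])
    print("\n".join(issues) if issues else "basic checks passed")
```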
"We're addressing the number one complaint of Kubernetes, which is complexity," said Miska Kaipiainen, head of product for Lens at Mirantis. "Lens Prism puts the power of a site reliability engineer (SRE) inside every developer's IDE. It removes friction from day-to-day Kubernetes operations while maintaining enterprise-grade security and control."