CNCF 10 Years: CPU Native vs GPU Native
MAKE CLOUD NATIVE UBIQUITOUS
CNCF (Cloud Native Computing Foundation) turns 10. I published a post when Kubernetes turned 10 in 2024, and the sentiment is the same. It has been a magical journey to be part of these programs for half of their ten-year history. I am honored and humbled to have served in various roles, such as Local Community Lead, Ambassador, Program Committee Member, and Speaker.
Motivated by Janet Kuo's presentation, 'Kubernetes at 10: A Decade of Community-Powered Innovation,' from the KuberTENes Birthday Bash at Google's Bay View office in Mountain View, along with the event's T-shirt design, here is a list of my ten KubeCons:
Named a CNCF Ambassador

(KubeCon EU 2023 Amsterdam, Netherlands. Keukenhof, known as the Garden of Europe)
I was first named a CNCF Ambassador in late 2022, with the announcement made public during KubeCon Europe 2023 in Amsterdam. I remember that, at the time, only three Ambassadors were selected from Amazon; one of them was my mentor and colleague, a Principal Engineer in the same VP organization, and I was the only one based in Seattle. After completing my one-year term, I was reappointed as a CNCF Ambassador for another two-year term in 2024. It is a privilege to be recognized and to continue being part of the global Cloud Native community, alongside 154 fellow Ambassadors from 37 countries and 124 companies. This journey has been both unforgettable and deeply meaningful to me.
CPU vs GPU
LLMs have ushered in a new era of GPU-based computing, powering both model training and inference and paving the way for agentic AI. It feels like the CPU era faded almost overnight. I've worked on both sides at Amazon, and was in fact a founding engineer of two such products: on the GPU side, Bedrock; and on the CPU side, quite a few, including App Runner, ECS/Fargate, Lambda, and Elastic Beanstalk.
A Few Differences Between GPUs/Other Accelerators and CPUs
Red Hat's thread "Why not just scale LLMs like any other app?" provides an in-depth description of those challenges. Dynamo, an open-source inference framework NVIDIA introduced at GTC 2025 last month, aligns exactly with the two-stage structure of LLM inference, disaggregating the prefill and decode phases for high throughput and low latency. Similar frameworks, such as Red Hat's llm-d and LMSYS.org's SGLang, take the same approach.
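To make that two-stage split concrete, here is a minimal, purely illustrative Python sketch of my own (not Dynamo's, llm-d's, or SGLang's actual API). It shows why the two phases have such different resource profiles: prefill is one big compute-bound pass over the prompt, while decode is a memory-bandwidth-bound loop that re-reads a growing KV cache on every step.

```python
# Illustrative sketch only: why LLM inference splits into two stages.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    # One entry per processed token; real caches hold per-layer K/V tensors.
    entries: list = field(default_factory=list)

def prefill(prompt_tokens: list[int]) -> KVCache:
    """Compute-bound: process the whole prompt in one batched pass."""
    cache = KVCache()
    for tok in prompt_tokens:  # in a real model, one large batched matmul
        cache.entries.append(("k", "v", tok))
    return cache

def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Memory-bandwidth-bound: emit one token at a time, re-reading the
    growing KV cache on every step; this is the latency-sensitive phase."""
    out = []
    for _ in range(max_new_tokens):
        next_tok = len(cache.entries)  # stand-in for sampling from logits
        cache.entries.append(("k", "v", next_tok))
        out.append(next_tok)
    return out

if __name__ == "__main__":
    cache = prefill(list(range(128)))  # heavy compute, done once
    print(decode(cache, 8))            # short steps, repeated
```

Disaggregated serving places these two phases on separate GPU pools so each can be scaled and scheduled on its own terms.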
Peter DeSantis gave an excellent keynote at re:Invent 2024 that highlighted the diverse challenges of AI workloads. "One of the cool things about AI workloads is that they present a new opportunity for our teams to invent in entirely different ways," Peter said.
Kubernetes Community Movement
I had expected Kubernetes to move faster in this space. GPT-3.5 was introduced in late 2022, yet it wasn't until mid-2024 that the community launched two relevant working groups, WG Serving and WG Accelerator Management, to address and enhance serving workloads on Kubernetes, specifically hardware-accelerated AI/ML inference.
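For a concrete sense of what hardware-accelerated serving on Kubernetes involves at the lowest level, here is a minimal sketch using the official Kubernetes Python client to request one GPU for an inference pod. The pod name and image are placeholders I made up; the `nvidia.com/gpu` resource is exposed by the NVIDIA device plugin.

```python
# Minimal sketch: request one NVIDIA GPU for an inference pod.
# Pod name and image are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="server",
                image="example.com/llm-server:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # GPUs are a schedulable extended resource exposed by
                    # the NVIDIA device plugin.
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Requesting the device is the easy part; the harder questions these working groups tackle include topology-aware scheduling, device sharing, and autoscaling of accelerators.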
Google Cloud Run on GPU
I have to admit, Google Cloud Run made the right move. As a former builder of AWS App Runner, a product positioned similarly to Cloud Run, I'm excited to see Cloud Run running on GPUs, as announced at Google Cloud Next 2025. Serverless GPU support is a big deal: it enables Cloud Run to handle large models and opens the door to emerging opportunities in agentic AI, another big deal.
Last Mile Delivery
Currently, there is less discussion of lower-level engineering technologies than of scientific research, even though a fair amount of innovation is ongoing there. Many underestimate the importance of this area, but it is actually the 'last mile' in delivering AI capabilities to end users. Research, engineering, and product are inseparable. The most pressing bottleneck right now lies in engineering: specifically, in serving models with high availability and high performance, cost-efficiently, under today's scarce compute supply.
Key Primitives (or Building Blocks)
The fundamentals of serverless remain unchanged. As CNCF turns 10, Kubernetes, Lambda, ECS, and Alexa turn 11, and Bedrock and Claude turn 2. Some say Bedrock is the "Lambda of LLMs"; I say it is more than that. As I put it in my post marking ten years of serverless, serverless continues to play a key role in the LLM world, handling the heavy lifting and delivering real AI/ML value to customers. This principle has held true since before the 'Attention Is All You Need' era.
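As a small illustration of that heavy lifting being handled for you, here is a minimal sketch that invokes a managed model through Bedrock's serverless API with boto3. The model ID and prompt are just examples; check the Bedrock documentation for the models available in your region.

```python
# Minimal sketch: call a managed model via Amazon Bedrock's serverless API.
# No servers, GPUs, or model weights to manage on the caller's side.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "What is cloud native?"}]}],
    inferenceConfig={"maxTokens": 256},
)

print(response["output"]["message"]["content"][0]["text"])
```

The caller never touches capacity planning, accelerator drivers, or model hosting, which is exactly the serverless principle carried into the LLM era.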
What Comes Next
For three-plus years, I have served as a Program Committee member for the Open Source Summit and KubeCon + CloudNativeCon, and this year is no exception. After reviewing all the CFP (Call for Proposals) submissions, it is clear that the theme for 2025 is 'Agentic AI,' just as 2024 was all about 'LLMs.' But what exactly is agentic AI, and how can it be used to enhance productivity? There are many answers to that. Anthropic open-sourced MCP (Model Context Protocol), an open standard for connecting LLM applications with external data sources and tools, which has already gained popularity in the industry. Amazon introduced several innovations, including Nova Act, a new AI model trained to perform actions within a web browser, created by the Amazon AGI SF Lab (formerly Adept AI); SWE-PolyBench, a multi-language benchmark for repository-level evaluation of coding agents; and Strands Agents, an open-source AI agents SDK.
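To show why MCP has caught on so quickly, here is a minimal sketch of a tool server built with the FastMCP helper from the official MCP Python SDK; the `add` tool is a toy example of mine.

```python
# Minimal sketch of an MCP tool server using the official Python SDK's
# FastMCP helper; the "add" tool is a toy example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serve the tool over stdio to an MCP-capable client
```

Any MCP-capable client can then discover and call the tool without bespoke integration code, which is the whole point of an open standard here.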
From a builder's perspective, Firecracker seems to be back on stage. Inspired by Jeff Barr's post, it is clear that Firecracker's lightweight VMs are becoming an enabling option for AI coding assistants and, more broadly, agentic AI, allowing users to speed up development and deployment while running code in protected sandboxes. Companies like E2B are also embracing this approach, providing safe environments for running AI-generated code. As a former builder of Fargate on Firecracker, I'm excited to see this happening.
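For a flavor of how small that sandboxing surface is, here is a minimal sketch driving Firecracker's REST API over its Unix socket to boot a microVM. The kernel and rootfs paths are placeholders, and it assumes a `firecracker` process was started with `--api-sock /tmp/firecracker.sock`.

```python
# Minimal sketch: boot a Firecracker microVM via its REST API over a
# Unix domain socket. Paths below are placeholders.
import requests_unixsocket  # pip install requests-unixsocket

session = requests_unixsocket.Session()
base = "http+unix://%2Ftmp%2Ffirecracker.sock"  # URL-encoded socket path

# Point the microVM at a kernel and a root filesystem.
session.put(f"{base}/boot-source", json={
    "kernel_image_path": "/images/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1",
})
session.put(f"{base}/drives/rootfs", json={
    "drive_id": "rootfs",
    "path_on_host": "/images/rootfs.ext4",
    "is_root_device": True,
    "is_read_only": False,
})

# Start the instance; the KVM-isolated sandbox boots in milliseconds.
session.put(f"{base}/actions", json={"action_type": "InstanceStart"})
```

That millisecond-scale boot with hardware-level isolation is what makes microVMs attractive for running untrusted, AI-generated code.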
The future remains open, and I'm an optimist. A new world, with closer collaboration between Product, Engineering, and Research than we have ever seen, is on the horizon. vLLM (a fast and easy-to-use library for LLM inference and serving), originally built at UC Berkeley and later donated to the LF AI & Data Foundation, is a typical example of a project that has grown into a community-driven effort with contributions from both academia and industry. And there will be more to come.
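As a closing illustration of how approachable that community work has made LLM serving, here is a minimal sketch of offline batch inference with vLLM; the model name is just a small example checkpoint.

```python
# Minimal sketch: offline batch inference with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small example checkpoint
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The cloud native community is"], params)
for out in outputs:
    print(out.outputs[0].text)
```

A few lines now stand in for what used to require a bespoke serving stack, which is exactly the kind of last-mile engineering this post argues for.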