CXL memory pooling and agent session compute reshape cloud infrastructure
Cloud infrastructure is undergoing two shifts: CXL matures with Panmnesia's fabric switch and Meta's Vistara recycling old DRAM, while AWS, Microsoft and Google standardize on the session as the new compute unit for AI agents.
Two parallel shifts are reshaping cloud infrastructure in mid-2026: memory pooling via Compute Express Link (CXL) is moving toward real-world deployment, and the session is emerging as the dominant unit of compute for AI agents. Panmnesia and Meta are presenting separate CXL advances at ISCA 2026 this week, while AWS, Microsoft and Google are independently building session-aware runtimes for agents.
Panmnesia, a Korean fabless semiconductor company, is sampling a PCIe 6.4-CXL 3.2 Fusion Switch chip and has made its PCIe 7.0-CXL 4.0 Combo IP available. At ISCA 2026 the company will present a next-stage CXL controller that shares buffers across layers to reduce latency, paired with a fabric switch that uses Port Based Routing to scale to 64 nodes while keeping memory access latency comparable to that of direct-attached multi-headed devices.
CXL moves from theory to production with Meta's Vistara
Meta will detail Vistara, its in-house CXL ASIC for attaching recycled DDR4 DIMMs from decommissioned servers to new DDR5-based servers. The company reports that Vistara achieves a 25 percent reduction in server count for disaggregated machine learning inference and a 29 percent reduction in average latency for distributed caches. Meta says memory expanded via CXL delivers roughly 10x lower bandwidth and 60 percent higher latency than local memory, but that its hardware-software co-design with Transparent Page Placement overcomes those penalties for most workloads.
- Panmnesia's fabric switch supports both Port Based Routing and Hierarchy Based Routing, enabling flexible memory topologies beyond the tree structure of PCIe.
- Meta's Vistara ASIC is optimized for power efficiency and low latency, and its software automatically tunes the local-to-expanded memory ratio for each workload.
- Both papers will be presented back-to-back in the ISCA 2026 Industry Session on June 29 in Raleigh, North Carolina.
- The CXL 4.0 specification, released in November 2025, includes support for fabric capabilities, and Panmnesia's implementation is among the first silicon-proven examples.
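A rough way to see why page placement matters: if a placement policy keeps most accesses in local DRAM, the 60 percent latency penalty Meta cites applies only to the remainder. The sketch below is a back-of-the-envelope model, not Meta's method. The function name, the 100 ns local latency and the hit fractions are illustrative assumptions, and the model ignores the bandwidth gap entirely.

```python
def blended_latency_ns(local_ns: float, local_hit_fraction: float,
                       cxl_penalty: float = 0.60) -> float:
    """Average access latency when a fraction of accesses is served locally.

    local_ns            -- assumed latency of local DRAM (illustrative value)
    local_hit_fraction  -- share of accesses that land in local memory (0..1)
    cxl_penalty         -- extra latency of CXL memory relative to local;
                           0.60 is the "60 percent higher" figure Meta reports
    """
    if not 0.0 <= local_hit_fraction <= 1.0:
        raise ValueError("local_hit_fraction must be between 0 and 1")
    cxl_ns = local_ns * (1.0 + cxl_penalty)
    return local_hit_fraction * local_ns + (1.0 - local_hit_fraction) * cxl_ns


if __name__ == "__main__":
    LOCAL_NS = 100.0  # hypothetical local DRAM latency, for illustration only
    for hit in (0.5, 0.8, 0.9, 0.99):
        avg = blended_latency_ns(LOCAL_NS, hit)
        slowdown_pct = (avg / LOCAL_NS - 1.0) * 100.0
        print(f"local hits {hit:.0%}: avg {avg:.1f} ns ({slowdown_pct:.1f}% slower)")
```

Under these assumptions, keeping 90 percent of accesses local yields an average of about 106 ns, a slowdown of roughly 6 percent rather than 60. Workloads that are bandwidth-bound rather than latency-bound would need a different model.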
AI agents force a new compute primitive: the session
While CXL tackles memory density, a separate trend unites the largest cloud providers. AWS, Microsoft and Google have each built runtime environments that treat an AI agent session as the fundamental scheduling unit, but they disagree on isolation mechanisms. AWS favors lightweight microVMs, Microsoft is prototyping sandboxes based on WebAssembly, and Google is pushing eBPF-based namespaces. Anthropic has also contributed research on session-aware scheduling for agentic workloads.
Stripe, in a case study published on the AWS Machine Learning Blog, detailed its production-grade ReAct agent framework for financial compliance, which uses prompt caching to reduce API costs by 40 percent and a dedicated agent service with human oversight for auditability. The Stripe implementation demonstrates how session-based compute enables task decomposition and orchestration at scale, a pattern the hyperscalers are now standardizing. The session is expected to become the default billing and management unit in cloud platforms within the next 12 to 18 months, driven by the explosion of agentic systems that require stateful, long-running interactions.
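How a prompt-caching discount turns into a lower bill is simple arithmetic, sketched below. The source does not give Stripe's traffic mix or pricing, so the cached fraction and the cached-token price ratio here are hypothetical. The model covers input-token cost only and ignores output tokens and any surcharge for writing to the cache.

```python
def input_cost_savings(cached_fraction: float, cached_price_ratio: float) -> float:
    """Fractional reduction in input-token cost from prompt caching.

    cached_fraction     -- share of input tokens served from the cache (0..1)
    cached_price_ratio  -- price of a cached token relative to an uncached
                           one (0..1); both values are illustrative assumptions
    """
    for name, value in (("cached_fraction", cached_fraction),
                        ("cached_price_ratio", cached_price_ratio)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Uncached tokens pay full price; cached tokens pay the discounted rate.
    relative_cost = (1.0 - cached_fraction) + cached_fraction * cached_price_ratio
    return 1.0 - relative_cost


if __name__ == "__main__":
    # Hypothetical scenario: half the input tokens hit the cache,
    # and cached tokens cost one-fifth of the normal price.
    savings = input_cost_savings(cached_fraction=0.5, cached_price_ratio=0.2)
    print(f"input-token cost reduced by {savings:.0%}")
```

In this hypothetical scenario the input-token bill falls by 40 percent. That is only one of many combinations that reach such a figure and says nothing about Stripe's actual configuration.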
Fact check
- Panmnesia is sampling a PCIe 6.4-CXL 3.2 Fusion Switch chip and has made PCIe 7.0-CXL 4.0 Combo IP available. (reported)
- Meta's Vistara achieves a 25 percent reduction in server count for disaggregated ML inference and a 29 percent reduction in average latency for distributed caches. (reported)
- AWS, Microsoft and Google have each built runtime environments that treat an AI agent session as the fundamental scheduling unit. (reported)
- Stripe built a production-grade ReAct agent framework for financial compliance using prompt caching to reduce API costs by 40 percent. (reported)
Source reporting (5)
- Blocks and Files · Panmnesia boosts CXL scale with fabric switching. Meta repurposes old DRAM with CXL
- The New Stack · AWS, Microsoft and Google agree the session is the new unit of compute. They disagree on how to isolate it.
- AWS Machine Learning Blog · Build interactive PDF text extraction from Amazon S3
- AWS Machine Learning Blog · How Cara pioneers domain-specific AI for enterprise insurance brokerages with AWS
- AWS Machine Learning Blog · Production-grade AI agents for financial compliance: Lessons from Stripe