News Article · Jun 26, 2026 at 4:41 PM

Industry #AI agents #Meta #AWS #cloud infrastructure #CXL #memory pooling #Panmnesia #Vistara #session compute #ISCA 2026 #Stripe

CXL memory pooling and agent session compute reshape cloud infrastructure

Cloud infrastructure is undergoing two shifts: CXL matures with Panmnesia's fabric switch and Meta's Vistara recycling old DRAM, while AWS, Microsoft and Google standardize on the session as the new compute unit for AI agents.

Two parallel shifts are reshaping cloud infrastructure in mid-2026: memory pooling via Compute Express Link (CXL) is moving toward real-world deployment, and the session is emerging as the dominant unit of compute for AI agents. Panmnesia and Meta are presenting competing CXL advances at ISCA 2026 this week, while AWS, Microsoft and Google are independently building agent-session-aware runtimes.

Panmnesia, a Korean fabless semiconductor company, is sampling a PCIe 6.4-CXL 3.2 Fusion Switch chip and has made its PCIe 7.0-CXL 4.0 Combo IP available. At ISCA 2026 the company will present a next-stage CXL controller that shares buffers across layers to reduce latency, paired with a fabric switch that uses Port Based Routing to scale to 64 nodes while keeping memory access latency comparable to that of direct-attached multi-headed devices.

CXL moves from theory to production with Meta's Vistara

Meta will detail Vistara, its in-house CXL ASIC for attaching recycled DDR4 DIMMs from decommissioned servers to new DDR5-based servers. The company reports that Vistara achieves a 25 percent reduction in server count for disaggregated machine learning inference and a 29 percent reduction in average latency for distributed caches. Meta says memory expanded via CXL delivers roughly 10x lower bandwidth and 60 percent higher latency than local memory, but that its hardware-software co-design with Transparent Page Placement overcomes those penalties for most workloads.
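To see why page placement matters, the 60 percent latency penalty can be turned into a back-of-the-envelope average. The sketch below is illustrative only: the 100 ns local latency and the share of accesses served from local DRAM are assumptions, not figures from Meta, and the function name is hypothetical.

```python
def effective_latency_ns(local_ns: float, local_fraction: float,
                         cxl_penalty: float = 0.60) -> float:
    """Average access latency for a two-tier memory system.

    local_fraction is the share of accesses served from local DRAM;
    the rest go to CXL-attached memory, which is assumed to be
    cxl_penalty (60 percent, per the article) slower than local DRAM.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    cxl_ns = local_ns * (1.0 + cxl_penalty)
    return local_fraction * local_ns + (1.0 - local_fraction) * cxl_ns


LOCAL_NS = 100.0  # assumed local DRAM latency, for illustration only

for fraction in (0.5, 0.8, 0.95):
    avg = effective_latency_ns(LOCAL_NS, fraction)
    print(f"{fraction:.0%} of accesses local -> {avg:.1f} ns average")
```

Under these assumptions the average penalty shrinks from 30 percent to about 3 percent as the share of accesses landing in local memory rises from half to 95 percent, which is the effect a placement policy that keeps hot pages local is aiming for.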

Panmnesia's fabric switch supports both Port Based Routing and Hierarchy Based Routing, enabling flexible memory topologies beyond the tree structure of PCIe.
Meta's Vistara ASIC is optimized for power efficiency and low latency, and its software automatically sets the ratio of local to expanded memory for each workload.
Both papers will be presented back-to-back in the ISCA 2026 Industry Session on June 29 in Raleigh, North Carolina.
The CXL 4.0 specification, released in late 2025, includes support for fabric capabilities, though Panmnesia's implementation is among the first silicon-proven examples.
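The difference between a tree and a more flexible fabric can be illustrated with a toy model. The sketch below is not Panmnesia's design or the CXL specification's mechanism; it assumes a hypothetical four-switch ring and builds per-switch forwarding tables keyed by destination, which is the general idea behind routing on a destination identifier rather than on position in a hierarchy. All names are hypothetical.

```python
from collections import deque


def build_forwarding_tables(links):
    """Return tables[src][dst] = next hop, using shortest paths (BFS).

    links maps each switch to the list of switches it is cabled to.
    """
    tables = {}
    for src in links:
        next_hop = {}
        visited = {src}
        queue = deque()
        # Direct neighbours are their own next hop.
        for neighbour in links[src]:
            if neighbour not in visited:
                visited.add(neighbour)
                next_hop[neighbour] = neighbour
                queue.append(neighbour)
        # Farther switches inherit the first hop of the path that found them.
        while queue:
            node = queue.popleft()
            for neighbour in links[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_hop[neighbour] = next_hop[node]
                    queue.append(neighbour)
        tables[src] = next_hop
    return tables


def route(tables, src, dst):
    """Follow the forwarding tables hop by hop from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(tables[path[-1]][dst])
    return path


# A ring of four switches: it contains a cycle, so it is not a tree.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
tables = build_forwarding_tables(ring)
print(route(tables, "A", "D"))  # uses the direct A-D link
print(route(tables, "A", "C"))  # two hops through a neighbour
```

In a strict tree, traffic between A and D would have to climb to a common parent; with destination-keyed tables the extra A-D link can be used directly.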

AI agents force a new compute primitive: the session

While CXL tackles memory density, a separate trend unites the largest cloud providers. AWS, Microsoft and Google have each built runtime environments that treat an AI agent session as the fundamental scheduling unit, but they disagree on isolation mechanisms. AWS favors lightweight microVMs, Microsoft is prototyping sandboxes based on WebAssembly, and Google is pushing eBPF-based namespaces. Anthropic has also contributed research on session-aware scheduling for agentic workloads.
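What it means to schedule by session rather than by request can be shown with a small sketch. It does not reflect any provider's actual runtime: the class names, the least-loaded placement rule, and the worker labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A long-running agent session with state that persists across steps."""
    session_id: str
    state: dict = field(default_factory=dict)
    steps_run: int = 0


class SessionScheduler:
    """Toy scheduler: a session is placed once, then every step runs there."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.placement = {}                      # session_id -> worker
        self.load = {w: 0 for w in self.workers}  # open sessions per worker

    def place(self, session):
        # Place a new session on the least-loaded worker; reuse it afterwards.
        if session.session_id not in self.placement:
            worker = min(self.workers, key=lambda w: self.load[w])
            self.placement[session.session_id] = worker
            self.load[worker] += 1
        return self.placement[session.session_id]

    def run_step(self, session, step_name):
        # Every step of a session lands on the same worker, next to its state.
        worker = self.place(session)
        session.steps_run += 1
        session.state[step_name] = worker
        return worker

    def close(self, session):
        # Releasing the session frees its slot on the worker.
        worker = self.placement.pop(session.session_id)
        self.load[worker] -= 1


scheduler = SessionScheduler(["worker-0", "worker-1"])
audit = Session("audit-1")
review = Session("review-2")
print(scheduler.run_step(audit, "plan"))
print(scheduler.run_step(review, "plan"))
print(scheduler.run_step(audit, "act"))
```

The unit being placed, counted and released here is the session, not the individual call, which is also why it is a natural candidate for metering.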

Stripe, in a case study published on the AWS Machine Learning Blog, detailed its production-grade ReAct agent framework for financial compliance, which uses prompt caching to reduce API costs by 40 percent and a dedicated agent service with human oversight for auditability. The Stripe implementation demonstrates how session-based compute enables task decomposition and orchestration at scale, a pattern the hyperscalers are now standardizing. The session is expected to become the default billing and management unit in cloud platforms within the next 12 to 18 months, driven by the explosion of agentic systems that require stateful, long-running interactions.
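A rough calculation shows how prompt caching could produce savings of that order. The numbers below are assumptions for illustration, not Stripe's or any provider's actual pricing: cached input tokens are assumed to be billed at a discount, and only input-token cost is modeled.

```python
def input_cost_savings(cached_fraction: float, cached_discount: float) -> float:
    """Fractional saving on input-token cost from prompt caching.

    cached_fraction: share of input tokens served from the cache.
    cached_discount: price reduction on cached tokens (0.9 = 90 percent off).
    Both values are hypothetical inputs, not published figures.
    """
    for value in (cached_fraction, cached_discount):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be between 0 and 1")
    baseline = 1.0  # normalized cost with no caching
    with_cache = (1.0 - cached_fraction) + cached_fraction * (1.0 - cached_discount)
    return 1.0 - with_cache / baseline


# With an assumed 90 percent discount on cached tokens:
for fraction in (0.25, 0.45, 0.60):
    saving = input_cost_savings(fraction, 0.9)
    print(f"{fraction:.0%} of input tokens cached -> {saving:.1%} saved")
```

Under these assumptions, caching a little under half of the input tokens is enough to reach a saving of about 40 percent on input cost; agent loops that resend the same system prompt and tool definitions on every step are the kind of workload where such hit rates are plausible.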

Fact check

Panmnesia is sampling a PCIe 6.4-CXL 3.2 Fusion Switch chip and has made PCIe 7.0-CXL 4.0 Combo IP available. (Status: reported)

Meta's Vistara achieves a 25 percent reduction in server count for disaggregated ML inference and a 29 percent reduction in average latency for distributed caches. (Status: reported)

AWS, Microsoft and Google have each built runtime environments that treat an AI agent session as the fundamental scheduling unit. (Status: reported)

Stripe built a production-grade ReAct agent framework for financial compliance using prompt caching to reduce API costs by 40 percent. (Status: reported)
