Deepline | End of GPU-only strategy? Intel bets big on agentic AI

2026.06.03 18:40

For the past two years, the GPU has been the core of AI hardware. From model training to inference clusters, and from the cloud to the edge, the whole industry has been preoccupied with who can secure more GPUs and pack more accelerator cards into data centers. It is fair to say that the entire AI industry has revolved around the GPU, pushing NVIDIA's stock price to one record high after another.

At COMPUTEX 2026, however, Intel offered a different view: the next phase of AI cannot rely on GPUs alone. The reasoning rests on a phrase Intel CEO Lip-Bu Tan returned to again and again in his keynote: agentic AI.

The difference between agentic AI and traditional AI is substantial. Traditional AI operates like a "turn-based" Q&A machine, whereas agentic AI aims to enter real-world workflows, proactively completing a cycle of "thinking, planning, acting, and reflecting." In other words, it must learn to read data, invoke tools, execute tasks, check results, and continuously adjust its next steps based on feedback.
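To make the "thinking, planning, acting, and reflecting" cycle concrete, here is a minimal Python sketch of such a loop. It is purely illustrative and everything in it is an assumption: the tools `read_data` and `total`, the in-memory `DATA` store, and the hard-coded `plan` step (which a real agent would delegate to a model) are hypothetical names, not any vendor's API. What it shows is the structure: plan, call a tool, check the result, and re-plan when a check fails.

```python
from typing import Callable, Optional

# Hypothetical in-memory "data store"; a real agent would hit files, databases, or APIs.
DATA = {"q2_sales.csv": "120,80,50"}


def read_data(source: str) -> str:
    """Tool: return raw CSV text, or "" if the source does not exist."""
    return DATA.get(source, "")


def total(csv_text: str) -> str:
    """Tool: sum a comma-separated list of integers."""
    return str(sum(int(x) for x in csv_text.split(",")))


TOOLS: dict[str, Callable[[str], str]] = {"read_data": read_data, "total": total}


def plan(sources: list[str], attempt: int) -> list[tuple[str, Optional[str]]]:
    """Think/plan: pick a data source and the tool chain to run on it.
    A real agent would ask a model to produce this plan."""
    return [("read_data", sources[attempt]), ("total", None)]


def run_agent(sources: list[str]) -> tuple[Optional[str], list[str]]:
    log: list[str] = []
    for attempt in range(len(sources)):
        value, ok = "", True
        for tool, arg in plan(sources, attempt):
            # Act: call the tool with the planned argument, or chain the previous result.
            value = TOOLS[tool](arg if arg is not None else value)
            log.append(f"{tool} -> {value!r}")
            # Reflect: check the result before moving on; an empty result triggers a re-plan.
            if not value:
                log.append("empty result; re-planning with the next source")
                ok = False
                break
        if ok:
            return value, log
    return None, log


result, trace = run_agent(["sales.csv", "q2_sales.csv"])
print(result)
print("\n".join(trace))
```

The first planned source is missing, so the check fails, the agent adjusts its plan, and the second attempt returns "250".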

This means inference is no longer a one-off transaction but a continuously running system that makes its own decisions and reasons on its own, which fundamentally changes how compute is used. Intel's core argument, therefore, is that agentic AI will reshape the mix of compute inside the data center.

Today, in frontier model training, the CPU-to-GPU ratio can reach about 1:8, with GPUs carrying the vast majority of the computational load. In agentic inference, however, CPUs must also handle task orchestration, tool invocation, data movement, and system coordination. In this mode, the CPU-to-GPU ratio will gradually approach 1:1, and rapid task decomposition may call for even higher CPU density.
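The shift in ratio is simple arithmetic, but it is worth seeing at fleet scale. The sketch below assumes a hypothetical fleet of 1,024 GPUs and applies the two ratios Intel cited; the fleet size and the `cpus_needed` helper are illustrative only.

```python
import math
from fractions import Fraction


def cpus_needed(gpus: int, cpus_per_gpu: Fraction) -> int:
    """CPUs to provision for a GPU fleet at a given CPU-to-GPU ratio (rounded up)."""
    return math.ceil(gpus * cpus_per_gpu)


GPUS = 1_024  # hypothetical fleet size, for illustration only
training = cpus_needed(GPUS, Fraction(1, 8))  # frontier-training ratio cited by Intel
agentic = cpus_needed(GPUS, Fraction(1, 1))   # ratio Intel expects for agentic inference
print(training, agentic, agentic // training)  # 128 1024 8
```

Moving from 1:8 to 1:1 means provisioning eight times as many CPUs for the same number of GPUs, which is the heart of Intel's case for the data-center CPU.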

When an agent does more than generate an answer, continuously invoking models, tools, and external systems, its runtime behavior differs entirely from that of traditional AI. Intel cited a figure during the keynote: compared with single-shot inference, an agent's token consumption can be up to 1,000 times higher.

In other words, agentic AI brings not merely a linear increase in inference volume, but a more complex, higher-frequency, and more fragmented system load. Offloading all these tasks to GPUs would be both inefficient and expensive.
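One plausible reason agent workloads grow faster than linearly is that many model APIs are stateless: each call in an agent loop re-sends the original prompt plus everything the agent has accumulated so far. The toy model below assumes exactly that, with made-up token counts; it does not reproduce Intel's figure, it only shows the shape of the growth. The function name and parameters are hypothetical.

```python
def agent_tokens(calls: int, prompt: int, output: int, added_per_call: int) -> int:
    """Total tokens processed when every model call re-reads the full history.

    Call k (0-based) reads the original prompt plus k rounds of accumulated
    tool results and model output, then generates `output` new tokens.
    """
    return sum(prompt + k * added_per_call + output for k in range(calls))


# Hypothetical parameters, chosen only to show the shape of the curve.
PARAMS = dict(prompt=1_000, output=200, added_per_call=500)
single_shot = agent_tokens(calls=1, **PARAMS)
for calls in (1, 10, 40):
    total = agent_tokens(calls, **PARAMS)
    print(f"{calls:>3} calls: {total:>9,} tokens ({total / single_shot:.0f}x single-shot)")
```

With these hypothetical parameters, 10 calls process 34,500 tokens and 40 calls process 438,000, which is 365 times the single-shot figure; the total grows roughly with the square of the number of calls.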

Intel's newly launched Xeon 6+ processor, built on the Intel 18A process, features up to 288 efficient cores (E-cores) and up to 576 MB of L3 cache. Designed for cloud-native, agentic AI, and network-intensive workloads, it delivers higher energy efficiency and more stable sustained performance.

In Intel's proposed solution, a single liquid-cooled rack occupying 32U of compute space can provide 36,864 cores, with rack power consumption at approximately 100 kW, sufficient for high-density agent deployments. While 100 kW sounds substantial, it represents a significant power reduction compared to previous server racks with equivalent performance.
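The rack figures can be cross-checked with a little arithmetic. The sketch assumes every socket is the top 288-core Xeon 6+ part and that the roughly 100 kW covers the whole rack; neither assumption is stated in the keynote details quoted here.

```python
RACK_CORES = 36_864      # cores per liquid-cooled rack, per Intel
CORES_PER_CPU = 288      # assumption: every socket is the top 288-core Xeon 6+ part
RACK_UNITS = 32          # compute space in the rack
RACK_POWER_W = 100_000   # "approximately 100 kW"; assumed to cover the whole rack

assert RACK_CORES % CORES_PER_CPU == 0  # the core count divides evenly into sockets
sockets = RACK_CORES // CORES_PER_CPU
sockets_per_u = sockets / RACK_UNITS
# Whole-rack power divided by cores, so it also covers memory, networking, and so on.
watts_per_core = RACK_POWER_W / RACK_CORES

print(sockets, sockets_per_u, f"{watts_per_core:.2f} W/core")  # 128 4.0 2.71 W/core
```

Under those assumptions the rack holds 128 processors, four per rack unit, at roughly 2.7 W per core measured at the rack level.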

Beyond the Xeon 6+, an even more noteworthy development is Intel's re-architecting of inference pipelines.

During the keynote, Intel announced a partnership with SambaNova, Vista Equity Partners, and Cambium Capital on a fully disaggregated inference solution. It runs on Vector Core Compute's agentic cloud, where Intel Xeon 6 processors handle orchestration and execution, SambaNova SN40 RDUs handle decoding, and NVIDIA Blackwell GPUs handle prefill.

The architecture is purpose-built for agentic workloads. Whereas many earlier AI systems pushed most of the inference pipeline onto GPUs, this system gives CPUs, RDUs, and GPUs distinct roles (system scheduling, decoding, and prefill, respectively), so that each stage of inference runs on the hardware best suited to it.
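The role split can be sketched as a small scheduler that breaks each request into stages and binds each stage to a hardware pool. This is a hypothetical illustration, not the partners' software: the pool labels, `Request`, `WorkItem`, `schedule`, and `dispatch` are invented names. The comments note a general property of LLM inference that motivates such splits: prefill processes the whole prompt at once and is typically compute-bound, while decode emits one token at a time and is typically limited by memory bandwidth.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    ORCHESTRATE = "orchestrate"  # scheduling, tool calls, data movement
    PREFILL = "prefill"          # one pass over the whole prompt; typically compute-bound
    DECODE = "decode"            # token-by-token generation; typically memory-bandwidth-bound


# Role split as described for the Xeon / SambaNova / Blackwell setup.
# The pool names are hypothetical labels, not real service identifiers.
STAGE_TO_POOL = {
    Stage.ORCHESTRATE: "xeon-cpu",
    Stage.PREFILL: "blackwell-gpu",
    Stage.DECODE: "sn40-rdu",
}


@dataclass
class Request:
    request_id: int
    prompt_tokens: int
    max_new_tokens: int


@dataclass
class WorkItem:
    request_id: int
    stage: Stage
    pool: str
    tokens: int


def schedule(req: Request) -> list[WorkItem]:
    """Split one request into ordered stages, each bound to its hardware pool.
    In a real system the KV cache built during prefill must be shipped to the
    decode pool; that transfer is omitted here."""
    plan = [
        (Stage.ORCHESTRATE, 0),
        (Stage.PREFILL, req.prompt_tokens),
        (Stage.DECODE, req.max_new_tokens),
    ]
    return [WorkItem(req.request_id, s, STAGE_TO_POOL[s], n) for s, n in plan]


def dispatch(requests: list[Request]) -> dict[str, list[WorkItem]]:
    """Group work items into per-pool queues."""
    queues: dict[str, list[WorkItem]] = defaultdict(list)
    for req in requests:
        for item in schedule(req):
            queues[item.pool].append(item)
    return queues


queues = dispatch([Request(1, 2_000, 300), Request(2, 500, 100)])
for pool, items in queues.items():
    print(pool, sum(i.tokens for i in items), "tokens,", len(items), "items")
```

For the two example requests, the GPU pool receives 2,500 prompt tokens and the RDU pool 400 generation tokens, while the CPU pool carries the orchestration steps.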

After the Xeon 6+, Intel turned to its previously released third-generation Core Ultra processors, which represent another pillar of Intel's AI ecosystem: on-device AI. During the keynote, Intel and Perplexity demonstrated a hybrid setup that pairs a local server built on third-generation Core Ultra with Xeon 6+-based cloud servers.

The solution dynamically distributes workloads between local and cloud resources based on device capabilities and functional requirements, further reducing reliance on cloud compute. This points to what the AI PC could ideally become: by allocating compute dynamically, it lowers token costs while preserving responsiveness and data privacy.
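Intel and Perplexity did not detail their placement policy, so the following is only a hypothetical sketch of how a local/cloud router might weigh device capability against privacy. `Device`, `Task`, `route`, and the memory threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    free_memory_gb: float  # memory available for model weights on the PC


@dataclass
class Task:
    model_size_gb: float   # memory the preferred model needs
    private: bool          # contains data that must not leave the device


def route(task: Task, device: Device) -> str:
    """Hypothetical local/cloud placement policy."""
    fits_locally = task.model_size_gb <= device.free_memory_gb
    if task.private:
        # Privacy and compliance come first: sensitive data stays on the device,
        # falling back to a smaller local model if the preferred one does not fit.
        return "local" if fits_locally else "local-smaller-model"
    # Non-sensitive work runs locally when it fits (no cloud tokens billed);
    # otherwise it is sent to the cloud.
    return "local" if fits_locally else "cloud"


pc = Device(free_memory_gb=16)
print(route(Task(model_size_gb=8, private=False), pc))   # local
print(route(Task(model_size_gb=80, private=False), pc))  # cloud
print(route(Task(model_size_gb=80, private=True), pc))   # local-smaller-model
```

Putting the privacy check first means sensitive requests never reach the cloud branch, which matches the article's emphasis on data privacy alongside token cost.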

Beyond PCs, Intel is extending the third-generation Core Ultra into gaming handhelds and edge computing. The newly announced Arc G3 series, built on the same-generation architecture and optimized for handheld gaming devices, will be available later this month.

In addition to general-purpose processors, Intel emphasized custom chips, a business Tan has been actively promoting since becoming Intel's CEO.

Intel believes custom chips will see a massive market in the future. As AI penetrates different industries, customers increasingly find general-purpose computing insufficient. To achieve higher efficiency and performance, they will gravitate toward custom chips to maintain competitiveness.

During the keynote, Intel mentioned its work with Google on IPUs (infrastructure processing units), which are critical for cloud service providers seeking to improve infrastructure performance. Intel is also working with telecommunications customers such as Ericsson to supply advanced wireless infrastructure chips worldwide.

This underscores another theme of Tan's keynote: Intel no longer aims to win the market with a single general-purpose chip. Instead, it packages chips, systems, software, and industry partnerships into a complete solution that can be freely customized for different enterprise needs, maximizing Intel's strengths.

Intel is effectively redefining its ecosystem position: data centers need CPUs for agent orchestration; inference systems require heterogeneous disaggregation to lower costs; PCs need on-device AI for privacy and compliance; edge and embodied AI require energy-efficient chips; and industry customers need customized silicon.

By addressing enterprise needs across different domains, Intel aims to become even more ubiquitous than NVIDIA.

Of course, it still faces immense pressure. NVIDIA's advantages in AI accelerators and software ecosystems remain significant, and AMD continues to press forward in server CPUs and AI chips. For Intel to succeed, the ultimate test lies in the ramp-up speed of 18A, the rapid deployment of Xeon 6+ rack-scale solutions, and whether customers genuinely see substantial returns from this new approach.

But at least this time, Intel's direction is clearer than ever.

As AI enters the agentic era, competition is no longer only about the peak performance of a single chip but about how efficiently entire computing systems work together. GPUs will remain important, but CPUs, edge devices, on-device AI, and custom chips will also become critical.

And Intel is aiming to seize this window of opportunity for a new division of labor in AI infrastructure.

(Source: Leikeji)
