OpenAI's first-ever chip has arrived, and it comes with a fittingly spicy name: Jalapeño.
It is OpenAI's first "intelligence processor," designed specifically for large-scale model inference. From a blank slate to tape-out, the entire process took just nine months.
OpenAI calls this "the fastest high-performance advanced semiconductor ASIC development cycle in history."
Chip development usually takes years. Even more interesting, part of what accelerated the design was OpenAI's own models: the AI the company built helped build the chip it will run on.
In a memorable scene, a Jalapeño chip was hand-delivered by Hock Tan, President and CEO of Broadcom, and Charlie Kawwas, President of Broadcom's Semiconductor Solutions Group, to OpenAI CEO Sam Altman and President Greg Brockman.
In its official announcement blog, OpenAI stated that this marks an important step in its strategy to "build the full stack behind its models and products."
And Broadcom is not the only partner.
OpenAI provides the brains: it designed the chip architecture from scratch, drawing on its deep understanding of how large models work along with its own model roadmap, kernels, serving systems, and product requirements. Broadcom contributes the craftsmanship: chip implementation, networking, and the path to mass production. A third company, Celestica, handles board, rack, and system assembly. Broadcom also noted explicitly that its Tomahawk networking chips are helping the platform scale to mass production.
Taken together, OpenAI is clearly determined to take control of the full stack behind its products.
Previously, it focused on two things: training the most powerful models and turning them into products (ChatGPT, Codex, the API, and the rest of the suite). Now it has dug one layer deeper, into the infrastructure itself: chip architecture, kernels, memory systems, networking, scheduling, and deployment systems.
The logic forms a flywheel: better infrastructure → higher compute efficiency → better training and serving → stronger models → better products → more users and revenue → reinvestment in the next generation of infrastructure. As the cycle spins, intelligence becomes increasingly powerful, stable, and affordable.
The speed is what stands out: only nine months passed from initial design to tape-out, the step where the finished design is sent to the foundry to be turned into physical silicon. If OpenAI's claim holds, that is the fastest development cycle yet for a high-performance ASIC on an advanced process.
So how did they pull it off in such a short time?
There are two reasons. First, hardware and software were co-developed, with OpenAI's engineers working side by side with Broadcom's chip design team. Second, OpenAI used its own models to accelerate parts of the chip design and optimization process.
In other words, the very models used by customers are, in turn, helping OpenAI build the hardware that the next generation of models will run on.
Richard Ho, who leads the hardware project at OpenAI, said they optimized the architecture around what matters most: kernels, memory movement, networking, and serving patterns. According to early tests, Jalapeño can run OpenAI's most critical workloads at near the hardware's theoretical limits.
To put it simply: if a chip's peak performance is 100 points, conventional chips typically achieve only 60 to 70 points in practice, with the rest lost while compute units sit idle waiting for data to be shuffled back and forth. Jalapeño's architecture aims to reduce that data movement, balancing compute, memory, and networking resources so that real-world performance approaches the full 100.
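One standard way to reason about the points lost to data movement is the roofline model: a chip can only run as fast as its memory can feed its compute units, so sustained throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity (FLOPs performed per byte moved). The sketch below is illustrative only. The peak-compute and bandwidth figures are hypothetical round numbers, not Jalapeño specifications, and the model ignores the networking and scheduling overheads the article also mentions.

```python
def attainable_tflops(peak_tflops: float, mem_bw_tb_per_s: float, flops_per_byte: float) -> float:
    """Roofline model: sustained throughput is capped by either compute or memory bandwidth."""
    # TB/s multiplied by FLOPs/byte gives the TFLOP/s that memory can keep fed.
    memory_bound = mem_bw_tb_per_s * flops_per_byte
    return min(peak_tflops, memory_bound)


def utilization_points(peak_tflops: float, mem_bw_tb_per_s: float, flops_per_byte: float) -> float:
    """Achieved performance on the article's 100-point scale (100 = theoretical peak)."""
    return 100 * attainable_tflops(peak_tflops, mem_bw_tb_per_s, flops_per_byte) / peak_tflops


if __name__ == "__main__":
    # Hypothetical chip: 1,000 TFLOP/s peak and 5 TB/s memory bandwidth (NOT Jalapeño specs).
    PEAK, BW = 1000.0, 5.0
    for intensity in (50, 130, 200, 400):
        print(f"{intensity:>3} FLOPs/byte -> {utilization_points(PEAK, BW, intensity):.0f} points")
```

With these made-up inputs, a workload doing 130 FLOPs per byte lands at 65 points, inside the 60 to 70 range of the analogy, while anything at 200 FLOPs per byte or more hits the compute ceiling. Moving fewer bytes per FLOP raises effective intensity, which is the lever the article describes Jalapeño's architecture pulling.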
That said, OpenAI hasn't released final performance numbers yet and says a detailed technical report will follow in a few months. But early tests point to one thing: performance per watt will be "substantially better" than the current industry best. In the lab, engineering samples are already running real machine learning workloads, with frequency and power consumption meeting mass-production targets.
One more noteworthy point: Jalapeño is not a tweaked version of an older AI chip; it was designed from the ground up for modern large-model inference.
Its reference point is the real systems OpenAI runs every day for ChatGPT, Codex, the API, and future agent products.
The goal is equally clear: match the compute and throughput of top-tier AI accelerators while pushing latency close to that of the fastest specialized inference systems, all tuned for large-scale, interactive model products.
Jalapeño is not a one-off. It is the first step in a "multi-generation compute platform." The plan is to start deploying it by the end of 2026, then scale up over the following years. The platform combines OpenAI-designed accelerators; Broadcom's chip implementation, networking, and interconnect technologies; and Celestica's board, rack, and system capabilities.
Tan noted that this is just the beginning, with a multi-generation roadmap ahead. He also mentioned, almost in passing, that using the chips co-developed with OpenAI, they will begin building gigawatt-scale data centers in 2026, alongside partners such as Microsoft.
What does a gigawatt mean? It is roughly the output of a large nuclear reactor, so OpenAI intends to stack compute on the scale of power plants. Brockman framed the collaboration in even grander terms: "The world is transitioning to a compute-powered economy."
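To make "gigawatt scale" more concrete, the sketch below converts a continuous 1 GW draw into annual energy and into a rough accelerator count. The energy figure is plain arithmetic. The per-accelerator power (1,000 W, counting its share of host and networking gear) and the PUE (power usage effectiveness) of 1.25 are hypothetical assumptions for illustration, not figures from OpenAI, Broadcom, or Microsoft.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_twh(power_gw: float) -> float:
    """Energy consumed by a load drawing power_gw continuously for one year, in TWh."""
    # GW * hours = GWh; divide by 1,000 to convert GWh to TWh.
    return power_gw * HOURS_PER_YEAR / 1000


def accelerators_supported(power_gw: float, watts_per_accelerator: float, pue: float) -> int:
    """Rough count of accelerators a site power budget can run.

    watts_per_accelerator and pue are hypothetical inputs, not Jalapeño specs.
    """
    # Dividing by PUE leaves the share of site power available to IT equipment
    # after cooling and other facility overhead.
    it_power_w = power_gw * 1e9 / pue
    return int(it_power_w // watts_per_accelerator)


if __name__ == "__main__":
    print(f"1 GW for a year: {annual_energy_twh(1.0):.2f} TWh")
    print(f"Accelerators at 1,000 W, PUE 1.25: {accelerators_supported(1.0, 1000, 1.25):,}")
```

Under those assumptions, 1 GW running all year consumes about 8.76 TWh and powers on the order of 800,000 accelerators; halve the per-accelerator power and the count doubles.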
In Brockman's view, Jalapeño is part of OpenAI's long-term full-stack infrastructure strategy: the more of the stack OpenAI designs itself, the more efficiently it can "sell" intelligence, bringing advanced AI to a broader audience.
As for why they're doubling down on inference chips, the logic is actually straightforward:
Inference is where AI actually reaches people. A little less cost, a little more speed, a little more stability, and on the user side that means ChatGPT responds faster, Codex can take more steps with less waiting, the API gets cheaper and easier to integrate, and systems are less likely to buckle under peak load.
Stripped down, what this chili pepper is ultimately meant to do is simple: turn more compute into intelligence that ordinary people can afford to use every day. Students, developers, small business owners, researchers, enterprises: anyone who wants to learn, create, or solve hard problems can tap into it.
And of course, the most delightful part is the recursive loop: AI designs chips, chips run AI, and AI goes on to design the next generation of chips. Once that loop starts spinning, it's a bit hard to stop.