Jessica Liu, VP Product Management, Cerebras Software
Every millisecond of AI latency steals intelligence from your model and delight from your user experience. In this high-energy session, Cerebras’ VP of Product Management pulls back the curtain on our wafer-scale inference platform, which turns that millisecond tax into pure headroom for smarter model reasoning and higher user engagement. We’ll trace the industry’s pivot from “bigger pre-training” to “smarter run-time,” show why data scarcity and spiraling training budgets make inference the new competitive front, and reveal how models like Qwen3 and DeepSeek gain IQ-like leaps by “thinking for longer.” Expect live numbers, behind-the-scenes engineering stories, and a first look at the architectural tricks that let us stream 10× more tokens while cutting power in half. If you care about building agents that think in real time, not coffee-break time, this is your roadmap to the fastest inference on Earth, and the dawn of AI’s next era.
99 Rue de Rivoli
Paris 75001
France