This article has been updated to include insight following the Google I/O 2026 event.
The Production Problem No One Is Talking About
The live AI demo is the highest-risk recurring moment in any major company’s annual calendar today. And almost no one outside a handful of production teams has a clear framework for how to think about it.
Most of the conversation about AI demos focuses on what the AI does. The harder question is how the production around the demo reinforces its value. The old playbook was designed for deterministic software – scripted, rehearsed, locked down. And, simply put, agentic AI doesn’t work that way.
The product is unpredictable. The demo runs once – with the share price moving in real time. The production team can’t fully constrain the output. And the audience is watching with their finger on the share button. Get it right and you reset the equity narrative. Get it wrong and you become the case study in someone else’s analysis.


Google has been the case study before. In February 2023, a single live Bard demo answered one question incorrectly about the James Webb Space Telescope. By the next day’s close, Alphabet had lost roughly $100 billion in market capitalization.
In a few days, Sundar Pichai will walk back on stage at Shoreline Amphitheatre for Google I/O 2026. (The keynote begins at 10:00 AM PT on Tuesday, May 19.) Reports point to a major Gemini model update, a new agentic AI assistant called Gemini Spark, Android 17, and the unveiling of Aluminium OS. Some of these will demo well in a controlled environment. Others will require live, multi-step, agentic AI performance in front of a global audience. Fortunately, the production discipline that distinguishes a credibility-building demo from a market-moving disaster has been developed largely in public, by Google itself, over the past nine years.
And to interpret Google I/O 2026, you have to understand the framework Google has been building since 2018.
Why Live AI Demos Are Structurally Different – and Why That Changes Everything
For decades, corporate live demos followed a simple production logic. You wrote the script, ran the rehearsals, and planned for what could go wrong. A Salesforce dashboard demo. An iPhone software walkthrough. An Adobe Creative Cloud feature reveal. The product behaved predictably because the demo was, in essence, a high-fidelity rehearsal performed live.
Live AI demos break that model in four specific ways.
Agentic AI works differently. You give the system a goal, and it decides how to get there. The production team can guide the prompt but not the path. How and what the model produces between input and output isn’t fully predictable. That’s both the promise of agentic AI – and the production risk.
Even with the same prompt, the model can produce a different response. Sometimes the difference is small. Sometimes it’s significant. Sometimes the answer is just wrong. Until the demo actually runs in front of the audience, the production team can’t know exactly what the model will say.
A multi-step AI agent might take 45 to 90 seconds to finish a task. On a live stage, 45 seconds of an AI “thinking” is a production crisis. The audience disengages. The camera has nothing to cut to. The speaker has to fill the silence. A traditional software demo can choreograph every second. An AI demo can’t.
A traditional software demo failure is embarrassing. A live AI demo failure is a stock-moving event. The Bard incident wasn’t a fluke – it was the first clear signal of a new category of risk. Since 2023, more than one AI company has watched its share price move on the strength of a single live demo.
For us, these aren’t quirks to manage. They’re a structurally new production category. And they require a fresh approach.

How Google Built the Framework: A Nine-Year Evolution
No company has confronted the live AI demo problem longer or more publicly than Google. The framework isn’t written down anywhere, but it is present in the production decisions Google has made over the past nine years – including the ones it learned the hard way.
Six moments define that evolution.
At I/O 2018, Google played pre-recorded phone calls in which its Duplex AI assistant booked a hair appointment and called a restaurant. Sundar Pichai introduced the recordings as “the Google Assistant actually calling a real salon.” The AI sounded so human, complete with “um” and “uh” verbal tics, that the demo went viral within hours.
Then came the questions, first raised by Axios: why didn’t the businesses identify themselves when they picked up? Why was there no ambient noise? Were these genuine real-world interactions or carefully staged recordings? Google declined to provide the names of the businesses or confirm whether the calls had been edited. Alongside the credibility question sat a separate ethics issue: should an AI identify itself as AI when calling a human?
The lesson: When an audience can’t verify whether what they’re seeing is real, the demo creates more doubt than it resolves. Specificity, transparency, and verifiable detail are essential to earning trust and demo credibility.
In February 2023, Google held a small launch event for Bard featuring a promotional clip in which the AI gave a factually incorrect answer about the James Webb Space Telescope. Astronomers noticed within hours. Alphabet shares dropped about 7.7% the following day, erasing roughly $100 billion in market value.
The lesson: Every public AI output is a public statement. Fact-checking demo content isn’t a marketing task – it’s a risk management one.
In December 2023, Google released a six-minute video billed as “Hands-on with Gemini: Interacting with multimodal AI.” It appeared to show the model engaging with images, drawings, and a continuous spoken conversation in real time.
Within 24 hours, Bloomberg’s Parmy Olson reported that the demo had been constructed differently than the video implied: Gemini wasn’t responding to spoken voice or live video at all. Google’s team had fed the model still image frames and text prompts, then added the voice narration afterward in post.
Google’s disclaimer in the video description noted only that “latency has been reduced and outputs have been shortened.” The narrative shifted from “Gemini is remarkable” to “Google misrepresented Gemini.”
The lesson: The line between live demo and marketing video has to be transparent. The credibility hit when an edit gets exposed is worse than the lift from a polished demo.
At I/O 2024, Google unveiled Project Astra – its real-time multimodal assistant – with a deliberate production move that broke from prior patterns. The demo aired as two continuous takes, one on a Pixel phone and one on a prototype pair of smart glasses.
The signal to the audience was clear: Google hadn’t cherry-picked the responses. The model was handling a stream of inputs in real time, rough edges and all. Around that demo, Google also began explicitly labeling other AI segments as recorded or aspirational rather than implying everything was live. The change was subtle in execution but marked a significant milestone. Google had stopped trying to make everything feel live and had started telling the audience exactly what they were watching.
The lesson: Labeling the kind of demo you’re showing is the first rule of demo credibility. The audience will forgive almost any production choice if they know what choice you made.
At I/O 2025, Project Astra moved from research demo to shipping product, powering new experiences in Search Live, the Gemini app, and third-party developer tools. The production decision here was as significant as the engineering one.
Having spent 2024 carefully framing Astra as a live, unscripted experience, Google could now invite the audience to use the same capability themselves. The demo and the product had become indistinguishable, which is the highest form of credibility a live AI demo can earn.
The lesson: When the AI demo eventually becomes a product launch, the production discipline that surrounds it becomes the foundation for long-term trust.
At I/O 2026, Google staged the most agentic-AI-heavy keynote in the event’s history. Gemini Spark – a 24/7 personal AI agent designed to act autonomously across apps, emails, and calendars – was the headline product. Antigravity 2.0 was demoed by showing an operating system that the AI had built from scratch over 12 hours, then demonstrating it running Doom live on stage.
The Samsung XR glasses demos had presenters on stage using the eyewear in real time to ask Gemini where to meet a friend, order coffee with a tip, and capture photos – rough edges visible, no edits. The doctrine held. No demo failures on the scale of Bard 2023. The framing across the keynote was unusually clean: live demos clearly labeled as live, aspirational segments clearly labeled as future-state.
The lesson: When a company refines its framework for nearly a decade, the production discipline starts to feel native rather than imposed. But clean execution alone isn’t enough anymore – the audience now expects production to also resolve the strategic questions they walked in with.
That’s nine years of drafting the playbook, paid for in public embarrassment, market cap, and corrective communication. It distills into a single principle: the audience doesn’t need the demo to be perfect – but they need to know exactly what kind of demo they’re watching.
Which brings us to the framework.
The 5 Demo Modes of Live AI
Most companies treat a live AI demo as a binary – either it’s live or it isn’t. The actual production reality is a spectrum, and almost every public AI demo controversy comes down to a mismatch between the category the audience thought they were watching and the one that was actually being staged.
There are five distinct ways to stage a live AI demo. We call them the Demo Modes – a five-category framework for live AI production.
Mode 1 (Verified Live): The AI runs in real time during the event. No pre-staging. No predetermined output. The speaker delivers an input, and the audience watches the response unfold. Highest credibility, highest production risk. The two continuous-take Project Astra demos at I/O 2024 were the clearest recent example of Mode 1 done well.
Mode 2 (Constrained Live): The AI is running in real time, but inside a controlled environment. The prompts are curated, the use cases are scoped. The model is genuinely working, but the production team has narrowed what it might be asked to do. Mid-high credibility when the framing is transparent, mid risk. Most enterprise software AI demos today are Mode 2 whether the company says so or not.
Mode 3 (Pre-flight Live): The AI completed the task minutes or hours before the event. The audience watches the playback of a genuine run – including any imperfections – with the speaker explicitly framing it as such: “We ran this just before walking on stage. Here’s what it produced.” Mid credibility when disclosed, low risk. This mode is dramatically under-used. Done well, it captures most of the trust of a live demo while significantly lowering the risk of failing on stage.
Mode 4 (Pre-Recorded): A polished video of the AI performing a task, clearly labeled as recorded. Low credibility for capability claims but high credibility for visual production quality. The Gemini Hands-On video would have qualified as Mode 4 if Google had labeled it that way. The controversy emerged because it wasn’t.
Mode 5 (Aspirational): Explicitly framed as “what’s possible,” “what we’re building toward,” or “where this is headed.” It’s a preview of where the product is going – not proof of what it can do today. Lowest credibility for capability claims, but useful for setting vision. Google should have framed the Duplex demo at I/O 2018 this way. Instead it was framed ambiguously enough to read as Mode 1.
The Demo Modes aren’t a ranking. They’re a set of choices. A keynote can deliberately mix categories – Verified Live for the headline demonstration, Constrained Live for the enterprise capability, Pre-flight Live for the agentic workflow, Pre-Recorded for the partner integration, Aspirational for the long-term roadmap.
The discipline isn’t picking the “best” mode. The discipline is making sure the audience knows which one they’re watching.
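One way to make that discipline concrete is to treat the run-of-show as data and check it before the event. The sketch below is a minimal illustration, not a tool Google or anyone else is known to use. The names DemoMode, Segment, and framing_gaps are hypothetical, and the sample rundown is invented for the example. It flags any segment where what the audience is told differs from what production is actually staging.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DemoMode(Enum):
    """The five Demo Modes, numbered as in the article."""
    VERIFIED_LIVE = 1
    CONSTRAINED_LIVE = 2
    PRE_FLIGHT_LIVE = 3
    PRE_RECORDED = 4
    ASPIRATIONAL = 5


@dataclass(frozen=True)
class Segment:
    """One keynote segment (hypothetical structure for illustration)."""
    name: str
    staged_as: DemoMode            # what production actually does
    framed_as: Optional[DemoMode]  # what the audience is told; None means unlabeled


def framing_gaps(segments: List[Segment]) -> List[str]:
    """Return the names of segments whose framing is missing or mismatched."""
    return [s.name for s in segments if s.framed_as != s.staged_as]


# Invented rundown: two segments are framed honestly, two are not.
rundown = [
    Segment("headline demo", DemoMode.VERIFIED_LIVE, DemoMode.VERIFIED_LIVE),
    Segment("agentic workflow", DemoMode.PRE_FLIGHT_LIVE, DemoMode.VERIFIED_LIVE),
    Segment("partner integration", DemoMode.PRE_RECORDED, None),
    Segment("roadmap", DemoMode.ASPIRATIONAL, DemoMode.ASPIRATIONAL),
]

print(framing_gaps(rundown))
```

Note that the check says nothing about which mode is better. It only surfaces the gap between staging and framing, which is the failure the article attributes to the Duplex and Gemini Hands-On episodes.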

What to Watch For at Google I/O 2026
The framework becomes most useful as a real-time reading tool. Here’s how it applies on Tuesday, May 19.
Watch the next-generation Gemini reveal.
Reports point to a major Gemini model update at the keynote – whether labeled Gemini 4 or a 3.x successor. The production question is which mode Google chooses for the headline demonstration. Verified Live (Mode 1) would be the most confident move – signaling that Google trusts the new model to perform outside of a controlled environment. Constrained Live (Mode 2) would be the more cautious choice. If Google frames the demo as anything other than Mode 1 or 2, that’s a signal worth noting.
Watch for Gemini Spark.
Leaks point to a new agentic AI assistant called Gemini Spark – designed to work autonomously across apps, emails, calendars, and websites. Booking flights. Managing email. Filling out forms. This is the highest-risk kind of demo a company can stage today, because every action chains to the next, and a single failure cascades through every step that follows. The production decision is whether to demo Spark’s full workflow live (Mode 1 – high risk, high reward), to scope it tightly (Mode 2 – safer, less impressive), or to compress the experience via Pre-flight Live (Mode 3 – the team runs it just before the event and acknowledges it openly). Watch for the speaker’s framing language at the moment of the reveal. If they say “we ran this just before walking out,” that’s Mode 3 done well. If the demo cuts cleanly between steps without acknowledgment, the production team has chosen polish over transparency.
Watch how Google handles failure moments.
Every live AI demo at I/O 2026 will have some friction. Latency. A response that lands awkwardly. A model output that’s correct but visually unimpressive. The production decision is whether to absorb that friction visibly (the Astra 2024 approach) or to edit it out (the pre-2024 approach). The Astra approach is the more mature move. Watch for it.
Watch the segmentation between live and recorded.
I/O 2024 introduced explicit labeling. I/O 2025 refined it. If I/O 2026 makes the live vs. recorded distinction even cleaner – graphics, lower-thirds, verbal framing – that’s Google institutionalizing its framework. If the line blurs again, that’s a regression worth flagging.
Watch the Cloud and enterprise demos especially.
The most consequential audience at I/O 2026 isn’t the developers at Shoreline. It’s the institutional investors evaluating Google Cloud’s AI revenue trajectory. Pichai disclosed at Cloud Next 2026 that just over half of 2026 ML compute investment will go to the Cloud business. The Cloud demos at I/O have to translate that capex into a credible product story. Watch how those demos are categorized. Constrained Live with enterprise customer logos as visible validation carries weight. Aspirational framing doesn’t.
Anyone who watches I/O 2026 with the 5 Demo Modes framework in hand will likely walk away from the keynote with a deeper understanding of these products and features than the viewer who watches for product news alone.

The Production Decisions That Make or Break a Live AI Demo
The 5-Mode framework names the demo categories, but the execution lives in the production decisions that distinguish one category from another. Four of them carry disproportionate weight.
Mode 1 (Verified Live) requires real-time agility. The speaker has to be ready to narrate whatever the model produces – including responses they’ve never seen before. That’s a different kind of prep than walking through a rehearsed click sequence. Pichai’s comfort with live AI moments is a production advantage Google has built over years. Most CEOs aren’t there yet.
Every live AI demo needs a written set of fallbacks: if the model produces a problematic response, what does the speaker say next? If latency drags on, where does the camera cut? If the demo fails entirely, how does the show move on without acknowledging it? The audience never sees the contingency. They only see the recovery. The Bard launch failure wasn’t a demo failure – it was a contingency-planning failure. The factual error was visible in promotional materials before the event. Better fact-checking should have caught it.
When a live AI demo is processing, the camera has to go somewhere. A cut to the speaker carries one signal. A cut to crowd reaction carries another. A cut to a product graphic carries a third. Each choice tells the home audience something different about whether to trust what’s happening. At I/O scale, this requires a director, multiple operators, and pre-planned camera blocking for every demo segment.
Mode 3 demos don’t happen by accident. They require the production team to actually run the demo backstage, capture the output, and have it ready to play back within minutes of the live moment. That’s a second production happening at the same time as the live event. Most companies don’t budget for it. The ones that do have a tool the others don’t.
These decisions aren’t decorative. They’re the difference between a demo that builds credibility and one that costs market cap.
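The fallback, latency, and backstage-capture decisions above can be combined into a single guard around the live call. The following is a minimal sketch under stated assumptions: run_with_fallback, fast_agent, and slow_agent are hypothetical names, the agents are stand-ins rather than real model calls, and the wait budget is arbitrary. The idea is that if the live run is not finished within the budget, or fails outright, the show switches to the output captured backstage and reports which one it used, so the speaker can frame it honestly as Mode 3.

```python
import concurrent.futures
import time


def run_with_fallback(live_call, preflight_output, max_wait_s):
    """Run live_call, falling back to the pre-flight capture on timeout or error.

    Returns (label, output), where label is "live" or "pre-flight", so the
    speaker knows which framing to use on stage.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(live_call)
    try:
        return ("live", future.result(timeout=max_wait_s))
    except concurrent.futures.TimeoutError:
        # Latency budget exceeded: play back the backstage run instead.
        return ("pre-flight", preflight_output)
    except Exception:
        # The live call itself failed: same recovery path.
        return ("pre-flight", preflight_output)
    finally:
        # Do not block the show waiting for the slow call to finish.
        pool.shutdown(wait=False)


def fast_agent():
    """Stand-in for a live agent run that finishes in time."""
    return "itinerary booked"


def slow_agent():
    """Stand-in for a live agent run that overruns its slot."""
    time.sleep(0.3)
    return "itinerary booked (late)"


captured = "itinerary booked (run backstage before the keynote)"
print(run_with_fallback(fast_agent, captured, max_wait_s=1.0))
print(run_with_fallback(slow_agent, captured, max_wait_s=0.05))
```

Returning the label alongside the output is the design choice that matters here. The recovery stays invisible as a production mechanic, but the framing the audience hears can still match what they are actually watching.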
Where AI Demos Are Heading
Google I/O 2026 isn’t an isolated event. It’s the first in a three-week window that includes Microsoft Build (June 2–3) and Apple WWDC (June 8–12). All three companies will stage live AI demos. All three will face the same production decisions. And by the end of June, the industry will have its first complete data set for how the leading public AI companies are navigating the new production risk.
And others are watching them closely. Salesforce Dreamforce in September will stage Agentforce demos. Workday, ServiceNow, Adobe, and every major enterprise software company will demo agentic capabilities at investor moments over the next year. The companies that have a clear live AI demo framework will appear more credible than those with technically superior AI but worse production discipline.
That’s the broader implication. When every public company is staging live AI, the production discipline around the demo becomes part of the equity story itself. Not just for the AI labs – for any company whose narrative depends on showing product capability and evolution.
The 5 Demo Modes aren’t a prescription. Different companies, different audiences, different products will call for different combinations. What every company needs is the vocabulary to make those choices on purpose – not by accident.
Google has been learning that vocabulary in public for nine years. The lesson the rest of the industry has yet to fully absorb is that the question isn’t whether to demo live. It’s whether the production team is ready to handle what happens when you do.
The high-wire act is permanent. The model for walking it is still taking shape. Google I/O 2026 is the next big stage.
That’s the work worth investing in. It’s also the work Cardboard Spaceship builds for clients navigating the moments that matter.

What Google I/O 2026 Actually Staged
Update: This section was added after Google I/O 2026 wrapped to validate our framework against the actual two-day event.
Google I/O 2026 ran from May 19 to May 20. Sundar Pichai walked off the Shoreline Amphitheatre stage having staged the most agentic-AI-heavy keynote in the event’s history – followed by a Developer Keynote that quietly proposed an architectural overhaul of how the web itself works. Here’s how it tracked against the framework, and what the broader industry should take from it.
Antigravity 2.0 was the boldest production move of the keynote – and the clearest Mode 3 in Google’s history.
Varun Mohan, head of Google’s Antigravity platform, demoed agentic coding by showing how Antigravity and Gemini 3.5 Flash together built a functioning operating system from scratch in 12 hours, using less than $1,000 of tokens. The OS was then demonstrated running Doom live on stage.
This was a textbook Mode 3 (Pre-flight Live): the AI did the actual work autonomously in the hours before the event, and the audience saw the genuine output. The catch: Google didn’t visually communicate the Pre-flight Live nature of the demo as clearly as the framework would prescribe. The 12-hour reality was disclosed verbally but compressed into a moment that read closer to Mode 1 in the audience’s mind. The most impressive demo of the keynote and the most under-framed production move – at the same time.
Gemini Spark was demoed in Mode 2 (Constrained Live).
Josh Woodward took the stage to show Spark planning a block party – coordinating schedules, permits, and calendar integrations through tightly scoped prompts on an iPhone. The model worked in real time, the prompts were curated, the use case was defined. This was the right production decision for a brand-new product with broad cross-app permissions.
Spark is genuinely high-risk to demo because every action chains to the next. Constrained Live limits that chain to a deliberate set of steps without sacrificing the live energy.
The Samsung XR glasses demos went Mode 1 (Verified Live).
Real presenters on stage using the glasses to ask Gemini where to meet a friend, order coffee, and capture photos – with rough edges left in. This was the production choice closest to the Astra 2024 approach. The friction wasn’t hidden. The audience saw the model working in real time, sometimes imperfectly, and trusted what they saw more for it.
Hassabis closed the keynote in Mode 5 (Aspirational).
Demis Hassabis’s “AGI is now on the horizon” framing was explicitly labeled as future-state – not current product. This is exactly how Mode 5 should work. The audience knows they’re being shown a vision, not a capability. No credibility cost. No expectation mismatch.
The bigger story arrived in the Developer Keynote.
Day 1 afternoon brought the announcements with the longest-tail production implications: WebMCP, an open web standard for AI agents; Chrome DevTools for agents as a stable 1.0 release; HTML-in-Canvas; Modern Web Guidance; Android CLI; Android Bench. The framing in Google’s own keynote recap: “We’ve transitioned from AI that simply assists you, to agents that can independently navigate complex tasks across your entire workflow.” This is the bet that recasts every live AI demo from this point forward. Every demo is now also a demo of the agentic web thesis – and the production stakes have just compounded.
The doctrine held. The market read it anyway.
No demo failures on the scale of Bard 2023, no edited-video controversies, no credibility leaks. The framing across the keynote was unusually clean: live demos clearly labeled as live, aspirational segments clearly labeled as future-state. And yet Alphabet’s stock slid during the keynote. The next morning, BofA reaffirmed Alphabet at a $430 price target, Wells Fargo raised its target to $435, and Morgan Stanley called out the “agentic offerings across commerce, travel and daily life.”
So the picture is nuanced: the demos themselves didn’t fail, but the production didn’t sufficiently answer the question Wall Street walked in with – how AI Mode in Search will be monetized when 93% of those searches already end without an external click. The lesson is sharper than “live demos move markets.” It’s that production decisions are now responsible for resolving the audience’s open questions, not just demonstrating the product. The Bard-era risk was that a live demo could break the equity story. The new risk is that even a clean live demo isn’t enough.
For the broader industry, the next test cases arrive in two weeks.
Microsoft Build (June 2–3) and Apple WWDC (June 8–12) will stage their own live AI demos – and their own answers to the agentic web thesis Google just planted. Both companies have learned from Google’s nine-year public arc. By the end of June, the industry will have its first complete data set for how the leading public AI companies are handling not just the production risk of live demos, but the production responsibility of resolving institutional questions in real time.
Watch which Modes they choose. The framework still applies. The stakes just got higher.
Planning a live AI demo at your next high-stakes event?
The most consequential moments in modern corporate communications now run on live AI. Whether you’re preparing for an Investor Day, a product launch, a developer event, or any other moment that includes an agentic demonstration, the production decisions you make now will define how the market reads your capability when the moment arrives.
CNN Business: Google shares lose $100 billion after AI chatbot makes error during demo — February 2023
NPR: Google’s AI chatbot, Bard, sparks a $100 billion loss in Alphabet shares — February 2023
TechCrunch: Google’s best Gemini demo was faked — December 2023
Engadget: Google admits that a Gemini AI demo video was staged — December 2023
TechCrunch: Duplex shows Google failing at ethical and creative AI design — May 2018
Axios: What Google isn’t telling us about its AI demo — May 2018
TechCrunch: Google’s Gemini updates: How Project Astra is powering some of I/O’s big reveals — May 2024
TechCrunch: Project Astra comes to Google Search, Gemini, and developers — May 2025
Google Blog: Alphabet Q1 2026 earnings remarks — April 2026
Google Blog: Sundar Pichai shares news from Google Cloud Next 2026 — April 2026
Google I/O 2026 Official Site
