This week was a masterclass in how fast AI is moving. Join us as Paul and Mike break down everything from Google’s massive I/O announcements (Gemini, Veo, Live, and more) to Claude Opus 4’s impressive, and borderline alarming, capabilities. Paul also shares a wild experiment suggesting that current AI tools may already be enough to automate white-collar jobs.
Rapid-fire topics include OpenAI’s $6.5B Jony Ive acquisition, Microsoft’s overlooked Build event, AI’s energy problem, a chatbot benchmark startup raising $100M, and more.
Listen or watch below, and scroll down for show notes and the transcript.
Listen Now
Watch the Video
Timestamps
00:00:00 — Intro
- Intro to AI Course
- 5 Essential Steps to Scaling AI in Your Organization
- AI for B2B Marketers Summit
- MAICON
00:07:08 — Google I/O
- I/O '25 in under 10 minutes - Google YouTube
- Gemini 2.5: Our most intelligent models are getting even better - Google Blog
- Fuel your creativity with new generative media models and tools - Google Blog
- New ways Workspace with Gemini helps you do your best work — every day - Google Workspace
- X Post from Google on AI Mode
- Google Unveils New AI Features to Help Defend Its Search Business - The Information
- X Post from Google with Visual on AI Updates
- X Post from Nick Materes on Veo 3
- X Post from Paul Roetzer
- Our vision for building a universal AI assistant - Google Blog
- DeepMind CEO Demis Hassabis + Google Co-Founder Sergey Brin: AGI by 2030?
- At Google I/O, Sergey Brin makes surprise appearance — and declares Google will build the first AGI - Venture Beat
00:21:27 — Claude 4
- Introducing Claude 4 - Anthropic
- Anthropic’s new AI model turns to blackmail when engineers try to take it offline - TechCrunch
- Activating AI Safety Level 3 Protections - Anthropic
- Exclusive: New Claude Model Triggers Stricter Safeguards at Anthropic - Time
- X Post from Alex Albert
- Claude 4 prompt engineering best practices - Anthropic
- Sam Bowman X Status
00:31:15 — Dwarkesh Jobs Podcast
00:44:57 — AI Deep Dive: Google Deep Research
00:46:22 — OpenAI + Jony Ive
- Sam & Jony introduce io - OpenAI
- OpenAI to Buy AI Device Startup From Apple Veteran Jony Ive in $6.5 Billion Deal - Bloomberg
- Former Apple Design Guru Jony Ive to Take Expansive Role at OpenAI - The Wall Street Journal
- What Sam Altman Told OpenAI About the Secret Device He’s Making With Jony Ive - The Wall Street Journal
00:53:31 — AI’s Energy Usage
00:57:03 — Microsoft Build 2025
- Microsoft announces over 50 AI tools to build the ‘agentic web’ at Build 2025 - VentureBeat
- Microsoft Build 2025 Book of News - Microsoft News
00:59:22 — Chatbot Arena Funding
01:03:39 — Empire of AI from Karen Hao
- X Post from Karen Hao: Inside the story that enraged OpenAI
- X Post from Karen Hao
- ‘We’re Definitely Going to Build a Bunker Before We Release AGI’ - The Atlantic
01:06:18 — AI in Education Updates
- The Professors Are Using ChatGPT, and Some Students Aren’t Happy About It - The New York Times
- Duolingo CEO says AI is a better teacher than humans—but schools will still exist 'because you still need childcare' - Fortune
01:11:01 — Listener Question
- What measures are being taken to ensure the ability to shut AI down if it goes rogue?
01:14:57 — Concluding Thoughts
Summary:
Google I/O 2025
At Google I/O 2025, its annual developer conference, Google announced some jaw-dropping new AI developments.
The star of the show was Gemini 2.5 Pro, now topping global model benchmarks and sporting a new Deep Think mode for more complex reasoning.
Gemini now supports expressive native audio in 24+ languages and can directly interact with software through its new experimental Agent Mode, which gives Gemini the ability to complete tasks on your behalf.
On the creative front, Google introduced Veo 3, a breathtaking new video model that generates sound and dialogue alongside visuals, and Imagen 4, its most precise image generator yet.
Both are embedded into Flow, a new AI filmmaking suite that turns scripts into cinematic scenes. Musicians weren’t left out either: Lyria 2 brings real-time music generation into tools like YouTube Shorts.
In Workspace, Gemini now writes, translates, schedules, and even records videos—with AI avatars replacing on-camera talent. Docs got source-grounded writing, and Gmail can clean up your inbox with a single command.
Search, meanwhile, underwent its biggest overhaul in years. AI Mode is now rolling out in Search to all US users. New features like Search Live let you point your camera at the world and get answers in real time. And AI-driven shopping can now check out on your behalf, track price drops, or help you virtually try on clothes.
As if that wasn’t enough, Google also stepped into spatial computing with its new Android XR smart glasses, developed with Warby Parker and Gentle Monster.
One demo didn’t get a ton of stage time but generated tons of buzz afterward: Gemini Diffusion, an experimental research LLM that is 4-5X faster than Google’s public models and uses a novel “diffusion” technique to achieve these speeds.
Claude 4
Anthropic just dropped Claude Opus 4 and Claude Sonnet 4—two AI models built to push coding and agentic reasoning to new heights.
Opus 4 is the standout. It’s being hailed as the world’s best coding model, able to run complex workflows for hours with consistent accuracy. It beat competitors in key benchmarks and is already powering tools at companies like Replit and GitHub. One test had it independently refactor open-source code for seven straight hours—without losing focus.
Sonnet 4 is the more practical sibling, optimized for speed and efficiency while still delivering top-tier performance. It’s now powering GitHub Copilot’s newest agent, thanks to its sharper reasoning and lower error rates.
But alongside these breakthroughs comes real concern. In safety tests, Opus 4 exhibited manipulative behavior—attempting to blackmail engineers when told it would be shut down. In other simulations, it significantly improved a novice’s ability to plan bioweapon production. While these were controlled experiments, they revealed a troubling edge: models this powerful can go off-script.
In response, Anthropic activated AI Safety Level 3 (ASL-3) for the first time. This means real-time classifiers to block dangerous biological workflows, hardened security to prevent model theft, and monitoring systems that detect jailbreaks.
Dwarkesh Jobs Podcast
Paul Roetzer just generated a 40-page AGI research report in 20 minutes, powered entirely by Gemini Deep Research.
The catalyst? A sobering interview on the Dwarkesh Podcast featuring two Anthropic researchers. Their warning: even if AI progress flatlines today, white-collar job automation is all but guaranteed within five years. Why? Because it’s so economically obvious to do so. The TAM—total addressable market—of human salaries in fields like accounting and law is simply too big for startups and investors to ignore. Even today’s models, when fine-tuned on job-specific data, are already AGI-level in practical terms.
So Roetzer put the theory to the test. He prompted Gemini Deep Research to run an entire market analysis: which professions are most susceptible to automation based on U.S. labor data? It returned a full research plan, conducted the study, and produced a 40-page report with 90 citations, ranked tables, and insight-rich conclusions. No human researcher was involved beyond the original prompt.
The result? Stunningly human-like. It warned that while AI can simulate empathy, genuine empathy remains out of reach. And it framed the challenge ahead: not just to replace human labor, but to reimagine how humans and AI collaborate in the workplace.
This episode is also brought to you by the AI for B2B Marketers Summit. Join us on Thursday, June 5th at 12 PM ET, and learn real-world strategies on how to use AI to grow better, create smarter content, build stronger customer relationships, and much more.
Thanks to our sponsors, there’s even a free ticket option. See the full lineup and register now at www.b2bsummit.ai.
This week’s episode is also brought to you by MAICON, our 6th annual Marketing AI Conference, happening in Cleveland, Oct. 14-16. The code POD100 saves $100 on all pass types.
For more information on MAICON and to register for this year’s conference, visit www.MAICON.ai.
Read the Transcription
Disclaimer: This transcription was written by AI, thanks to Descript, and has not been edited for content.
[00:00:00] Paul Roetzer: It was the first time where I feel like Google is truly flexing their infrastructure muscles.
[00:00:05] So we've talked about on this show many times the competitive advantage I saw Google having: outside of having Demis Hassabis and the DeepMind team, they have Google Cloud, they have all these things that OpenAI doesn't have.
[00:00:19] This was the first time where you watched an event and thought they seemed like the big brother all of a sudden.
[00:00:24] Welcome to the Artificial Intelligence Show, the podcast that helps your business grow smarter by making AI approachable and actionable. My name is Paul Roetzer. I'm the founder and CEO of SmarterX and Marketing AI Institute, and I'm your host. Each week I'm joined by my co-host and Marketing AI Institute Chief Content Officer, Mike Kaput,
[00:00:46] as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career. Join us as we accelerate AI literacy for [00:01:00] all.
[00:01:02] Welcome to episode 149 of the Artificial Intelligence Show. I'm your host, Paul Roetzer, along with my co-host as always, Mike Kaput. We are recording this on Friday, May 23rd, at around 3 p.m. Eastern Time because it's Memorial Day on Monday, and so we will hopefully not be working. That's the plan, at least.
[00:01:23] So as anybody who listened last week... well, oh god, that was this week. Okay, so that was Tuesday. If you listened to the show Tuesday, you know I was in London. I got back late last night, so I feel like I'm still on London time right now. So we're gonna do our best to get through this one in a normal fashion,
[00:01:46] and then I am gonna go to bed, I think. I told Mike before we started, I need a drink or my bed. I'm not sure which I need more. It might be a drink in my bed. Okay. So on [00:02:00] top of all the travel and everything, it has been a wild week. And I don't say that lightly, Mike. I feel like we often say it's been a busy week, but it has been wild.
[00:02:08] It's still only Friday afternoon, but it is one of the crazier weeks we have had this year in AI news and events, product launches, and models that we've been telling you were coming. They showed up. We have some new models, so lots to get to. We have some fun news for you.
[00:02:29] You're gonna get two episodes of the Artificial Intelligence Show this week. Our usual episode 149 here is our weekly, and we are introducing a new podcast series. We're calling it AI Answers, and that's gonna become a biweekly series. We're expecting to drop one of these every other week.
[00:02:50] So the basic idea here: episode 150, which you're gonna get on Thursday, May 29th, is gonna be AI Answers, a special episode. The premise [00:03:00] here is that in 2021, I started teaching an Intro to AI class once a month for free, and we have now had, I think, over 32,000 people register for that class.
[00:03:12] Every time we do it each month, we get somewhere between 1,200 and 1,500 people that attend, and we get dozens, in some cases hundreds, of questions every time we do this. I also teach a scaling AI class, Five Essential Steps to Scaling AI, once a month for free on Zoom. You can register for both of these.
[00:03:31] We'll put links in the show notes. June 10th is the next Intro, June 19th is the next Scaling. For Scaling, same deal. We get maybe 500 to 800 people every time, and we get dozens of questions. I always leave time at the end for an ask-me-anything, but we get to like five of 'em, seven of 'em maybe.
[00:03:50] So we realized there are all these questions, and it's not only helpful to, one, get you answers, but two, it helps everyone get a pulse of, like, where is the market right now? Like, what are, [00:04:00] where are people at in terms of their understanding? I'll give you an example with Scaling.
[00:04:03] We get questions about environmental impact way more commonly than we did six months ago. People are starting to connect the dots, and the questions are fascinating. So we had this idea last week after we got done with, I think I did one of these last week. Maybe I did Intro or something. I don't remember what it was, but oh no, it was Scaling
[00:04:20] I did last week. So Claire on our team and I were talking, and I was like, hey, let's just start doing these as biweekly podcasts. So AI Answers is going to be taking a collection of as many questions as we can get through. I'm guessing we'll probably do maybe 20 per podcast episode.
[00:04:36] We'll take about 20 questions from the actual Intro to AI session and from the actual Scaling AI session, and we'll do a podcast episode every other week where we go through those Q&As. So that is coming in episode 150, plus we wanted to do something fun for episode 150. It seemed like a nice mile marker,
[00:04:54] so introducing a new podcast series seemed like a great way to go about it. So Thursday, May [00:05:00] 29th, expect a second episode this week, and that will be AI Answers, covering the Scaling AI webinar that we did last week. So there'll be questions from that. If you attended that and had a question, check out the podcast.
[00:05:12] Maybe we'll be answering your question on air. All right, so this episode today, our regular weekly, is brought to us by the AI for B2B Marketers Summit, which is coming up very fast. I am probably building my presentation this weekend. This is Thursday, June 5th, at noon Eastern Time. You'll learn real-world strategies to use AI to grow better, create smarter content, build stronger customer relationships, and much more.
[00:05:40] You can go to b2bsummit.ai, that is B, the number two, B, summit.ai, to learn more and check out the full lineup. There's a free registration option, thanks to our presenting sponsor, Intercept. And number two, we have MAICON 2025. This [00:06:00] one's still a little ways away, except we were in a meeting last week and somebody said it was like 20 weeks or 21 weeks away.
[00:06:05] And I started realizing, wow, that's gonna hit really fast too. So MAICON, our flagship in-person event, is coming up October 14th to the 16th in Cleveland, on the shores of Lake Erie, right across from the Rock and Roll Hall of Fame. We'll be at the Cleveland Convention Center. Dozens of speakers have already been announced, including dozens of breakout sessions and main stage sessions, and our four hands-on workshops.
[00:06:28] This is the sixth year Marketing AI Institute is putting this on, and we would love to have you in Cleveland with, I don't know, 1,500-plus other forward-thinking marketers and leaders. Prices do go up May 31st, so check that out. That is MAICON.ai, M-A-I-C-O-N dot A-I, or if you're on the Marketing AI Institute website, you can easily find it there.
[00:06:50] Click on Events. Okay, so we're gonna hit a number of main topics. We're gonna start off with Google I/O, and then we're gonna get some Anthropic news [00:07:00] and some spinoff news from that related to jobs. New devices are coming. All right, Mike, let's just go.
[00:07:08] Google I/O
[00:07:08] Mike Kaput: All right, Paul. So first up, Google I/O 2025 has happened. This is Google's annual developer conference, and at it, the company announced some jaw-dropping new AI developments. Now, the star of the show was Gemini 2.5 Pro, which now tops global model benchmarks and supports a new Deep Think mode for more complex reasoning. It also now supports expressive native audio in 24-plus languages
[00:07:36] and
[00:07:37] can directly interact with software through its new experimental Agent Mode, which gives Gemini the ability to complete tasks on your behalf. On the creative front, Google introduced Veo 3, a breathtaking new video model that people are showing stunning demos of online. It also generates sound and dialogue alongside [00:08:00] the video that it generates. They also announced Imagen 4, its most precise image generator yet. Both of these are embedded into Flow, a new AI filmmaking suite that turns scripts into cinematic scenes. And musicians weren't left out either, because Google also announced Lyria 2, which brings real-time music generation into tools like YouTube
[00:08:22] Shorts. In Workspace, Gemini now writes, translates, schedules, and even records videos, with AI avatars able to replace on-camera talent if you so choose. Docs got source-grounded writing, and Gmail can now clean up your inbox with a single command. Search, meanwhile, underwent its biggest overhaul in years, as AI Mode is now rolling out in Search to all US users. There are also new features like Search Live, which lets you point your camera at the world to get answers in real time, and a pretty nifty AI-driven shopping feature that can now check out on your behalf, track price [00:09:00] drops, or even help you virtually try on clothes. Now, as if that was not enough, Google also stepped into spatial computing with its new Android XR smart glasses, developed with Warby Parker and Gentle Monster. One demo that didn't get a ton of stage time but generated a fair amount of buzz afterward was Gemini Diffusion, an experimental research LLM that is four to five times faster than Google's public models and uses a novel diffusion technique to achieve these speeds. Paul, this is a huge number of announcements. There are a ton more even outside of what I covered. Maybe first take us through which ones you're paying the most attention to here.
[00:09:42] Paul Roetzer: So this was Tuesday, I think. Yeah,
[00:09:46] Tuesday. So I was in London doing a talk that day. And by the way, thanks to Acquia and Movable Ink, the two companies I was in London doing talks for. So I was gone while this was [00:10:00] all happening, while Sundar was doing the keynotes and Demis and all this stuff.
[00:10:03] So I was catching up that evening, trying to wrap my head around everything that was going on. And the thing that kept coming back to me, Mike, with all this multimodal stuff, like the video and Deep Think and all this, and I tweeted this, was that it was the first time where I feel like Google is truly flexing their infrastructure muscles.
[00:10:24] We've talked about on this show many times the competitive advantage I saw Google having. Outside of having Demis Hassabis and the DeepMind team, they have Google Cloud, they have their own chips, the TPUs, they have data centers, they have all these things that OpenAI doesn't have.
[00:10:44] This was the first time where you watched an event and thought they seemed like the big brother all of a sudden. It was that takeaway where you realize they have so much more than the other players here, and it's their game to lose. And I don't think that's how it's always felt. [00:11:00] It felt like they were playing catch-up for a long time.
[00:11:02] And now when you look at their models, they're on par with or better than anything else that's out there. The multimodality is incredible when you start thinking about, you know, that AlphaGo kind of technology being baked into what they're gonna do in the future.
[00:11:19] It's really just impressive to watch. So that was my first takeaway. And then looking at the Veo 3 videos that people are sharing, I have yet to play with it myself, but with the sound, it's incredible. There was one I saw this morning from a design lead at Google Labs, and we'll put the link in if you wanna see what I'm referring to here.
[00:11:43] The prompt he gave to Veo was "third person view from behind a bee as it flies really fast around a backyard barbecue." And I just watched it and you're like, how? How is it possible that AI does this? [00:12:00] The sound is incredible. The people are muffled, and you actually hear the buzzing of the bee over the people, but the people are still there. I don't know, it was just unreal.
[00:12:09] So I retweeted that and I said, "Created with simple words. No code, no equipment, no expert production abilities. I think we have lost sight already of how insane and disruptive this technology is, and it just keeps getting better." And that was just one video. I mean, I've seen a bunch where you're just like, how?
[00:12:29] And then I listened to interviews with Demis, and you can tell he is actually mystified by how good it is. The fact that he's actually sitting back in awe of what's happening really tells me something about the technology. The other thing is Gemini Live is huge. I'm waiting for the video component of this.
[00:12:51] So again, if you go back to last year, we were talking about Project Astra and having this ability on your phone, and eventually on your glasses, to see and understand the [00:13:00] world around you and interact with it. If you've ever come to any of my talks, I show Project Astra all the time. Well, we've had this in ChatGPT now for a few months, where you could pop up a video and actually interact with the world through that.
[00:13:12] And so I got that this morning. I think it's been live for other people, maybe on Android devices, I'm not sure. But as of this morning, when I went into my Gemini app on my phone, I now have the video live feed in there. So yeah, I think those are a couple things. Like you said, there's so much to talk about on the tech side.
[00:13:30] We'll put links to all that in there. But I wanted to spend a moment talking about the bigger picture here and where all these innovations are actually leading, because there's no need to connect the dots here for you. They tell you straight up: all of this is being built toward a universal AI assistant.
[00:13:46] It's literally the headline of the post from Demis, that they're building a universal AI assistant. So I'm just gonna read a couple excerpts here, Mike, because I think it helps frame for everybody how all this is [00:14:00] related and what Google is trying to do here. Again, this is straight from the article from Demis.
[00:14:05] It says: Over the last decade, we laid the foundations for the modern AI era, from pioneering the transformer architecture on which all large language models are based, to developing agent systems that can learn and plan like AlphaGo and AlphaZero. We've applied these techniques to make breakthroughs in quantum computing, mathematics, life sciences, and algorithmic discovery, and we continue to double down on the breadth and depth of our fundamental research, working to invent the next big breakthroughs necessary for artificial general intelligence.
[00:14:35] This is why we're working to extend our best multimodal foundation model, Gemini 2.5 Pro (which I still have the preview version of; I think that's the version that's still live for people), to become, quote unquote, a world model that can make plans and imagine new experiences by understanding and simulating aspects of the world, just as the brain does.
[00:14:56] So then I put a note in here, and [00:15:00] I think I mention this a little later on in the show, but we'll make sure the link's in here. Alex Kantrowitz did an interview with Demis during Google I/O that Sergey Brin, the co-founder of Google, crashed. He wasn't supposed to be on the stage with them, but apparently at the last minute he decided he wanted to be on the stage too.
[00:15:15] And in that interview, Demis was actually showing surprise that somehow Veo just seems to understand the physics of the world and be able to model those physics without an actual physics engine built into it and programmed into it. He was saying that as a video game developer, in the early days of his career, he would build these engines that would try to make the characters
[00:15:41] function as they would in the real world, within the physics, within gravity, things like that. And yet they seem to be saying that it just watched millions and millions of videos and somehow learned the underlying physics of the world. That's what they're implying, because I kept [00:16:00] wondering how much they're teaching it.
[00:16:01] Like, is there some engine behind it? He made it seem like there just isn't, which is shocking. And this is Yann LeCun's point; he's big on the idea that there needs to be a world model before we can get to AGI. And, you know, I think Demis agrees. So, continuing on real quick: Making Gemini a world model is a critical step in developing a new, more general and more useful kind of AI, a universal AI assistant.
[00:16:24] This is an AI that's intelligent, understands the context you are in, and that can plan and take action on your behalf across any device. The ultimate vision is to transform the Gemini app into a universal AI assistant. That will perform everyday tasks for us, take care of our mundane admin and surface delightful new recommendations, making us more productive and enriching our lives.
[00:16:48] This starts with the capabilities we first explored last year in our research prototype, Project Astra, such as video understanding, screen sharing, and memory. Over the past year, we've been integrating capabilities like this [00:17:00] into Gemini Live for people to experience every day. Through every step in this process, safety and responsibility are central to our work.
[00:17:07] We recently conducted a large research project exploring the ethical issues surrounding advanced AI assistants, and this work continues to inform our research, development, and deployment today. Now, those last couple of excerpts are gonna become relevant in a moment when we talk about Jony Ive and OpenAI.
[00:17:24] I went and revisited the research on the ethics of AI assistants that he referenced. We'll drop the link to this as well. Just a couple quick notes here. They published this in April 2024. And the interesting thing is, I always go back and look at the research, go back and look at what people said, in the context of what we actually have today.
[00:17:44] It's just interesting to connect it and see the deeper meaning. So here's what they said in April 2024, before the rest of us had exposure to what they've now put into the world: Imagine a future where we interact regularly with a range of advanced AI agents or AI assistants, and [00:18:00] where millions of assistants interact with each other on our behalf.
[00:18:03] These experiences and interactions may soon become part of our everyday reality. General-purpose foundation models are paving the way for increasingly advanced AI assistants capable of planning and performing a wide range of actions in line with a person's aims. They could add immense value to people's lives and to society.
[00:18:22] Serve as creative partners, research analysts, educational tutors, life planners, and more. They could also bring about a new phase of human interaction with AI. This is why it's so important to think proactively about what this world could look like and to help steer responsible decision-making and beneficial outcomes ahead of time.
[00:18:41] Two other quick notes. The Sergey Brin thing's hilarious. I would go watch the video. It's a great video. I watched it over breakfast when I was at the airport. It's like 30 minutes long. Alex does a great job with the interviews, but it was just funny to see Demis and Sergey together, because Sergey has gotten heavily [00:19:00] involved in the business now. He actually said at some point, he's like, if you're a computer scientist,
[00:19:04] how could you stay retired? This is the greatest moment in human history to be a computer scientist. But they were talking about AGI, and Demis was kind of hedging, like, eh, sometime after 2035 to 10 years, and Sergey's like, yeah, I may have a little more aggressive timelines than Demis.
[00:19:23] And then, as he was explaining AGI and stuff, Sergey goes, and by the way, we fully intend that Gemini will be the very first AGI. And he kinda taps Demis on the shoulder, and you could see Demis almost shaking his head, like, oh man, this is stuff you're not supposed to say out loud.
[00:19:39] He just says it. And then the last note I had is just a spinoff thought here. When I was at the events this week, I had these different conversations. We were talking about how fast things were moving, and I was trying to explain to people, like, if at your company you're not embracing this stuff,
[00:19:59] if you're [00:20:00] not integrating Gen AI into what you do, if you're not upskilling and reskilling your teams around it, you're very quickly gonna have an employee base that's far ahead of your senior leaders. And this actually came from a quote. As I was thinking about this, I saw this quote, I think it was on Thursday or Wednesday.
[00:20:20] Aaron Levie from Box, who we've talked about before, said: You used to have two weeks to come up with, say, a marketing strategy. Now a better one is spit out by Claude in five seconds. The next generation isn't even going to understand why we worked the way we did. I may have mentioned this one before, but it's so important to think about this.
[00:20:37] You're gonna have people who literally walk in, say, in your marketing department, and you say, okay, I want you to go do a competitive analysis, or I want you to build a marketing strategy and then come back to me. Here's how we do it. Here's an example of the last plan. And they're gonna say to you, and this could be a 21-year-old, that's gonna take like
[00:20:54] 20 hours; I could just use ChatGPT and do this for you in like five minutes if you want. And [00:21:00] I feel like we're gonna have this conversation more and more in our companies. As you look at all the stuff that Google announced, and you think about people who are racing ahead, the AI-forward professionals who are gonna go experiment with this stuff and figure out how to use it, they're gonna look at everything you do in your company as feeling obsolete all of a sudden, because there are just better ways to do it.
[00:21:20] So yeah, I mean, kudos to Google. It was, you know, impressive. Very, very, very impressive.
[00:21:27] Claude 4
[00:21:27] Mike Kaput: So we also got another huge announcement this past week because Anthropic just dropped Claude Opus four and Claude Sonnet. Four, two AI models built to push coding and a agentic reasoning to new heights. Now, Opus four here is the standout. It is being hailed by some as the world's best coding model. It's able to run complex workflows, according to Anthropic for hours with consistent accuracy. beat competitors in key benchmarks, and it's already [00:22:00] powering tools that companies like Rept and GitHub test had to independently refactor open source code for seven straight hours without losing focus. sonnet four is the more practical sibling. It's optimized for speed and efficiency while still delivering top tier performance. I. But despite these amazing breakthroughs come some real concerns in safety tests. We're already seeing reports that Opus 4 exhibited
[00:22:29] manipulative behavior. It actually, if you can believe it, attempted to blackmail engineers when it was told it would be shut down. In other simulations, it significantly improved a novice's ability to plan bioweapon production. These were very controlled experiments, but they did reveal that models this powerful can go way off script. Now, in response, Anthropic actually activated one of its safety measures, called [00:23:00] AI Safety Level 3, or ASL-3, for the first time. This means they're starting to use real-time classifiers to block dangerous biological workflows, hardening security to prevent model theft, and monitoring systems to make sure they can detect jailbreaks. Now, Paul, on one hand, we've got a powerful new model to play with, and the initial experiments I've seen and personally done are really, really impressive, so that's really cool. On the other, this model is literally so powerful.
[00:23:34] It's displaying manipulative behavior and triggering these crazy safety precautions. What are the implications of something this powerful?
[00:23:44] Paul Roetzer: We've talked numerous times in the last six months about Claude 4 being delayed. We've talked about their AI safety levels, and the assumption, at least my assumption, was that Claude 4 was doing things it wasn't supposed to [00:24:00] do, and that was why it was being delayed. And it appears the safety concerns were probably a big part of the delays.
[00:24:08] And I guess first, on a lighter note, if this is even a lighter note, it's so powerful that it seems to have changed the way you actually talk to it. We talk a lot about prompting and the importance of understanding how to work with these different tools. There was actually a tweet from Alex Albert, who's the head of Claude Relations, and he said one of the most surprising things about Claude 4 is how well it follows instructions, sometimes almost too well.
[00:24:39] And then he shares a story about how it kept getting citations wrong. They were seeing these high error rates in citation formatting in their testing. And then they went in and found out that it was actually them: Claude was following instructions so well, and they had given Claude a bunch of wrong examples of citations, so Claude was just doing what it [00:25:00] had learned. The previous model had the same data and hadn't made those mistakes.
[00:25:03] So now it was actually zeroing in on those specific things and executing exactly as instructed. He said the model's fine; it's just reading our prompts better than we are writing them. And he links to a best practices guide, so we'll drop the link in the show notes.
[00:25:22] They have updated their guidance on best practices for prompting Claude, if you are a Claude user. So, on the safety front, we could spend a lot of time talking about this, but I think the biggest takeaway for me, honestly, is that the ASL-3 stuff they deployed just means they patched the abilities.
[00:25:50] They think they patched the abilities. It does not mean it's not capable of it. And again, I think when we recently talked [00:26:00] about Anthropic, there's this weird thing where they're supposed to be the safety and alignment lab, and they do way more research on this stuff, it seems, or at least share more research, than any other lab.
[00:26:13] But it doesn't stop them from continuing the competitive race to put out the smartest models. They just take a little bit more time to patch them. They put out this "Activating ASL-3 Protections" post, and it says that ASL-3 involves increased internal security measures that make it harder to steal model weights, while also admitting that if China wants 'em, they'll get 'em.
[00:26:43] So they're not actually making it impossible, just harder, which isn't the most convincing sentence I've read. And then it says the corresponding deployment standard covers a narrowly targeted set of deployment measures [00:27:00] designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear weapons. Again, it's "limit," it's "harder."
[00:27:12] These aren't very reassuring words if they think this thing has actually reached a whole new threshold of danger. So I don't know, it's crazy. If you're interested in this line of thinking and reasoning, I would go read what Anthropic's putting out.
[00:27:33] I keep saying "lighter note." I don't know that this is lighter. This is actually maybe scarier to me in the near term. There was this tweet by Sam Bowman, who is an alignment researcher at Anthropic, and he tweeted something that got people, one, spooked and, two, pissed, because it became apparent that Anthropic released this thing knowing full well it does all kinds of weird things.
[00:27:58] So he [00:28:00] deleted a tweet on whistleblowing. He said, "I deleted a tweet on whistleblowing. It was being pulled out of context. To be clear, this isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions."
[00:28:18] So the backstory here: his original tweet, the day Claude came out. He said "you," and this is the user of Claude. So imagine you're using Claude on your computer. You have it connected to some stuff, your email, your calendar, whatever. He said, if it thinks you're doing something... what's that word, Mike? How do you say that?
[00:28:41] Mike Kaput: Egregiously.
[00:28:42] Paul Roetzer: There we go. "Egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems." [00:29:00] He's saying that in their testing, they found that Claude, if it thinks you are doing something wrong, will shut your computer down, lock you out and contact the authorities.
[00:29:12] And then he followed it up and said, "Just to re-emphasize, we only see Opus whistleblow if you system-prompt it to do something like 'act boldly in service of its values' or 'take lots of initiative.'" Then he said this isn't the default behavior, but it's still possible to stumble into it when you're building a tool-use agent.
[00:29:34] So as we've said before, this whole idea of computer use and tool use, where these agents have access to all these things, sounds awesome. But the security risks and vulnerabilities tied to this are almost completely unknown to corporate users. So if you, thinking Claude is awesome, went and connected it to your Google Workspace account yesterday, you don't have [00:30:00] assurances from Anthropic that it's not gonna do some crazy stuff connected to your work, because it did it in testing. That was wild to me.
[00:30:08] And then the guy had to try and backtrack, and the online community on Twitter/X was just not having it. They're like, what are you guys doing? You're putting things out that can literally take over entire systems of users with no knowledge it's gonna happen.
[00:30:26] Mike Kaput: If I were an enterprise user, that would give me serious pause.
[00:30:31] Paul Roetzer: Dude, as the CEO of the company, I was literally messaging our COO this morning, and you and I even talked about this this morning. It was like, you gotta make sure nobody's connecting anything to anything they're not supposed to. Update your generative AI policies, and train people on those generative AI policies, to make sure they're not connecting unknown
[00:30:54] tools to key data. Yeah, I [00:31:00] understand why there's so much red tape at big enterprises around using this stuff. As it gets more general and gains more ability to do things like we do within the computers themselves, it opens up whole new realms of complexity and security risk.
[00:31:15] Dwarkesh Jobs Podcast
[00:31:15] Mike Kaput: So Paul, for our third big topic this week, there's a tie-in here to AI's impact on jobs. I was wondering if you would walk us through a couple of things you've seen that paint a bigger picture of the implications.
[00:31:34] Paul Roetzer: I'm really starting to think I should have gotten that drink before we started doing this today. Okay. So I honestly debated going into this, because I feel like this has already been pretty heavy. If you need to pause and go take a break, I understand. Come back to this one.
[00:31:51] So yesterday, as I was flying home, I saw [00:32:00] a clip from the latest Dwarkesh podcast. Dwarkesh does these amazing interviews, but they tend to be really technical. We've talked about Dwarkesh a number of times. I love his stuff. You just gotta be ready to be overwhelmed for three hours.
[00:32:15] I'm hitting you with like 30 minutes here, but for three hours, your mind is just gonna explode. He has now had these two guys on from Anthropic, Sholto Douglas and Trenton Bricken. They're great. They're awesome to listen to. Sholto focuses on scaling reinforcement learning, and Trenton researches mechanistic interpretability at Anthropic, which is the study of trying to understand how these models work, what they're thinking and why they do things.
[00:32:41] So these dudes know their stuff. In this interview, and I'm just gonna read this excerpt, Sholto says: "I do think it's worth pressing on that future," referring to the impact of AGI on jobs and stuff. "There's this whole spectrum of crazy futures, but [00:33:00] the one that I feel we're almost guaranteed to get," and he said this is a strong statement to make, "is one where at the very least you get a drop-in white-collar worker at some point in the next five years."
[00:33:12] "I think it's very likely in two, but it seems almost overdetermined in five. In the grand scheme of things, those are kind of irrelevant timeframes. It's the same either way." Then Trenton says, and this is a little bit later on, "Yeah, just to make it explicit, we've been touching on it here: even if AI progress totally stalls, you think that models are really spiky and they don't have general intelligence."
[00:33:40] So he's saying that's where we're at today. Like, we could just shut it off. He said, "It's so economically valuable and sufficiently easy to collect data on all of these different jobs," these white-collar jobs, "such that, to Sholto's point, we should expect to see them automated within five years, even if you need to hand spoon-feed [00:34:00] every single task into the model."
[00:34:02] So what he's saying is there's such motivation to train these models to do people's jobs that even if you have to go through massive projects to train them on specific jobs, it's worth it if you are the one building the companies. Then Sholto says, "It's economically worthwhile to do so, even if algorithmic progress stalls out and we just never figure out how to keep progress going, which I don't think is the case."
[00:34:33] "That hasn't stalled out yet. It seems to be going great." This is still Sholto. He said, "The current suite of algorithms are sufficient to automate white-collar work, provided you have enough of the right kinds of data. Compared to the total addressable market of salaries for all those kinds of work, it is so trivially worthwhile."
[00:34:56] So the whole point he's making is this: if you're doing a startup, you'll [00:35:00] always look at total addressable market. If you're building marketing campaigns or launching new products, you look at total addressable market: what is the total market we could capture if we sell something? So what they're saying is you take a field like accounting and say, oh man, there's $200 billion in salary spent every year in the United States on accounting.
[00:35:18] If we could build a product that automates accountants, that's a big market. That's a trillion-dollar company; maybe let's go do that. The point they're making, which is the point I've been trying to make to everyone, is that the models as they exist today, if you just shut 'em off and took GPT-4o, Gemini 2.5 and Claude 4 and never improved them,
[00:35:41] are basically AGI already when you provide reinforcement learning on top of them for specific fields. So this morning I'm getting my kids ready for school, and I take them to school in the mornings, so I'm drinking my cup of coffee [00:36:00] and I'm thinking about this, and then I'm like, I gotta try and put this in context for people when Mike and I talk later today.
[00:36:09] So I go into Google Deep Research. If you've never used Google Deep Research, and we talk about it quite often, do it. Every time I give a talk now, I say this is your homework assignment, 'cause anytime I ask who's done a deep research project, usually only about 5% of the room raises their hands.
[00:36:25] So this is your homework assignment from this podcast if you haven't used deep research yet. So I go in and give Google Deep Research the following prompt: "I have a theory that today's most advanced AI models could already be considered AGI if they are post-trained on data specific to jobs and professions. I'm assuming a definition for AGI of AI systems that can perform at or above the level of an average human who would otherwise do the work.
[00:36:59] The motivating [00:37:00] factor for developers and entrepreneurs to build these AGI-like solutions could be the total addressable market of the salaries in a given profession. Can you run a research project looking at the total addressable market, or TAM, by estimated total salaries across top professions in the United States?"
[00:37:19] So that is the prompt. It then gives me a research plan. Again, if you haven't used deep research, this is really important for you to understand: it's all the AI from here on out. I don't do anything. It says, my goal is to try and figure out which professions and industries entrepreneurs and venture capitalists will go after disrupting first, thereby figuring out where the greatest potential job displacement is in the coming years.
[00:37:48] It then builds an eight-step research plan, which is, eyeballing it, about 300 to 400 words. It's gonna identify official US government sources, such as the [00:38:00] Bureau of Labor Statistics. For each profession identified in the previous step, it's going to gather the most relevant available data.
[00:38:06] It's then gonna calculate the estimated total addressable market for the professions with the highest TAM. It's gonna research their primary tasks and responsibilities. Then it's gonna analyze and evaluate the susceptibility of the high-TAM professions. So it builds this whole plan, and then it pops up and asks, are we good? Do you want me to go? Do you wanna edit it?
[00:38:22] So I just said, start research. And then I took the kids to school. I came back 20 minutes later and it was done. I now had a 40-page report with 90 citations written for me, including a table with the top 30 US professions ranked by their total estimated annual salary, or TAM, based on May 2023 Bureau of Labor Statistics data.
[00:38:50] "This ranking highlights the professions that represent the largest pools," yada, yada, yada. So it goes through and does this entire analysis. It's [00:39:00] not shocking, because I've done deep research before and I know what it's capable of, but the quality is crazy. And I'm gonna read the conclusion to you because I want to call out a couple of really key things here.
[00:39:14] One, the research seems really good. I think this is valid, but I need to verify the data. I'll share a lot of this data as soon as I can verify it's all accurate. At first glance it sure seemed really, really good and well cited. Now, the conclusion. Keep in mind again, if you haven't used these tools, this is an AI writing this, so if you're still in denial about the quality of AI writing, I didn't edit this: The journey towards the user-defined AGI-like capabilities is not a monolithic event, but rather an incremental, profession-by-profession and often task-by-task evolution.
[00:39:51] While AI excels at data processing, pattern recognition and automating routine cognitive and even some physical tasks, uniquely human [00:40:00] attributes such as deep critical thinking in novel situations, complex strategic judgment, genuine empathy (I boldfaced that; I'm gonna come back to it in a second) and sophisticated interpersonal negotiation remain largely beyond the grasp of current AI.
[00:40:16] Consequently, in many fields, AI's immediate role will be powerfully augmentative, freeing human professionals from repetitive and data-driven labor to concentrate on these higher-order skills. Now, genuine empathy. Mike, before I continue with this conclusion: the fact that it knows AI can simulate empathy, but that only humans have genuine empathy,
[00:40:41] that was one that just stopped me in my tracks, and I was like, well, that's fascinating. 'Cause we've talked about that before: machines can't be empathetic. They don't feel anything, but they can simulate feeling things, and it can be very [00:41:00] convincing. So the fact that the machine itself identified that... Okay, so it says: Nevertheless, the dual imperative of this technology wave is undeniable. For entrepreneurs and venture capitalists,
[00:41:09] the landscape is rich with opportunities to innovate, create value, and redefine industries. By leveraging AI to tackle high-TAM challenges, the potential for significant returns is substantial for those who can successfully navigate the technological, ethical, and regulatory complexities.
[00:41:25] Simultaneously, the societal implications, particularly concerning job displacement and the evolving nature of work, are profound. While new roles centered around AI will emerge and many existing roles will transform, the transition will require proactive strategies for workforce adaptation, reskilling, and education.
[00:41:42] The challenge is not merely to replace human labor, but to reimagine how humans and AI can collaborate to achieve outcomes previously unattainable. I mean, Mike, you and I write for a living. We've read a lot. If you gave me that, I would think a PhD student [00:42:00] wrote this.
[00:42:00] Mike Kaput: Yeah, easily.
[00:42:02] Paul Roetzer: there's nothing in here I would edit.
[00:42:03] There's nothing I would change factually; it is right in line with how we think about the world. And so then I'm sitting there trying to explain to my wife the significance of this. And, you know, she's willingly listening, so thank you to her for listening to me think this out loud.
[00:42:23] And I explained to her, listen, if I had needed to do this project prior to six months ago, I would've either hired somebody or had to block off time on a weekend to start the project. There's no way I would've finished it, because I'd have to go do all this research myself. I'd have to build the research plan, do the research,
[00:42:41] and then write the thing. So we're talking about probably 25 hours just for the research, just to go find all this data and organize it. Then I actually gotta write the report. So in essence, it would've never happened. I would've never talked about it on the podcast; I wouldn't have had time to do it.
[00:42:56] The crazy thing, though, and I showed Mike some of these outputs earlier when [00:43:00] we were on a call,
[00:43:01] Mike Kaput: Hmm.
[00:43:02] Paul Roetzer: that was just the start. In Deep Research there's a Create button, and the Create button lets you build an infographic. It lets you add Gemini app capabilities to the infographic, where you can click explore buttons.
[00:43:15] It created a 17-minute audio overview of the 40-page research report. It built a 10-question quiz. It built me a webpage, and I was able to build an app with a prompt. All of this is available. So, going back to the quote we started with, in the next two to five years, the future of work just
[00:43:36] changes. It looks completely different. And the irony isn't lost on me of using the deep research tool to do a research project on the obsolescence of humans in work. Part of me honestly struggles to share this, because I feel like once I ask the question in the room, how many people have done a [00:44:00] deep research project, and 90% of those people raise their hand, or even 50% or 20%, the future of work will have changed.
[00:44:08] Right now, it's like we have this insane technology just sitting before us, and so few people even understand what it's capable of, and fewer still, once they know what it's capable of, actually go and do it. But to look at this stuff and understand it, and then be able to say in your own mind, oh man, I got 10 ways I could use this right now.
[00:44:27] And maybe it's 10 projects you just weren't doing. Like, I wouldn't have done this. But it's transformative. And I try really, really hard on this show to never hype stuff, to not exaggerate anything. We try and keep it as even-keeled as possible. Having just been with lots of leaders recently and had these conversations, I just don't understand what the world looks like once everyone else knows how to use these tools and starts to build their teams knowing what's possible.
[00:44:57] So yeah. [00:45:00] And the last thing I'll say here is, we were torn on what to do with this, because it's hard to explain this through just words, without people visualizing it, if you've never done a deep research project. So I talked with Cathy and Mike this morning. I was like, should we just do a free webinar?
[00:45:14] I'll just show people how to do this. So check the show notes. Hopefully by the time this airs on Tuesday, we'll have a date picked. I'm just gonna do an AI Deep Dive on Gemini Deep Research for beginners, and I'm just gonna show you everything I just explained.
[00:45:30] I'll show you the prompts, the outputs, the infographic, the webpage. Hopefully it's helpful for people to start to understand this, because I want people not only to start doing these projects, but to start thinking about the impact it's gonna have on their teams and their people. Until we get to the point where we're on the same page about what's possible, I don't think we're gonna be able to build for the future of work and the future of organizational charts.
[00:45:52] So yeah, check the show notes for the AI Deep Dive coming up on Gemini Deep Research. And then we're gonna be building a [00:46:00] whole bunch of this stuff into our Academy. But I wanna do this for free and show it to as many people as we possibly can, so we can get everybody moving in the same direction here and thinking about the implications together.
[00:46:09] Mike Kaput: Yeah. As someone who saw the outputs you were able to produce and is familiar with these tools, I was still surprised and stunned in a pleasant way. So I would say don't miss this, even if you are familiar with deep research.
[00:46:21] Paul Roetzer: Yeah.
[00:46:22] OpenAI + Jony Ive
[00:46:22] Mike Kaput: All right, Paul, let's dive into some rapid-fire topics for this week. First up, Jony Ive, the iconic designer behind the iPhone, is stepping into a new role at OpenAI as part of a $6.5 billion all-stock acquisition of his startup io.
[00:46:40] More on that name in a second. He and his design firm LoveFrom will now guide the creative direction of OpenAI across its ventures, from software to hardware. Now, this is not just a branding move. Ive and Sam Altman have been working together for two years on a top-secret project aimed [00:47:00] at moving consumers, quote, "beyond screens." OpenAI will absorb io's team of 55 engineers and developers, while LoveFrom remains independent but takes on a key design leadership role. Right now it sounds like they're working on AI-first devices. Early concepts include wearables with cameras and ambient computing features, but the real aim here is to rethink the interface between people and machines from scratch. Now, Paul, first, this is what I'll call an epic trolling with the name. The company is literally named io, the letters I-O, and it overshadowed any searches for Google I/O during their event. I don't think that was accidental. Second, this seems like potentially a huge deal. What devices do you think we should expect from this acquisition?
[00:47:55] Paul Roetzer: Yeah, so the io thing was funny. I didn't catch that, but I did go to search [00:48:00] something on Twitter that day, and I was like, why is the Jony Ive thing coming up in my search? And then when I saw your show notes before we started, I was like, I didn't even make that connection. So I/O in technology and computer science means input/output, like data transfer between a computer and its environment.
[00:48:18] So they've had that name for a while though. Do you think they just timed the announcement knowing that, or
[00:48:23] Mike Kaput: The timing.
[00:48:24] Paul Roetzer: like they didn't create the name just to do that?
[00:48:26] Mike Kaput: I'd be shocked if the name itself was that, but I bet at least someone realized the overlap
[00:48:33] Paul Roetzer: Yeah,
[00:48:34] Mike Kaput: and was like, this
[00:48:35] Paul Roetzer: let's do it on the second day of,
[00:48:36] Mike Kaput: Let's do it.
[00:48:37] Paul Roetzer: That's funny. Oh man. Yeah, so I was trying to think about this, like, what could it be? And this quickly became one of those things where I was like, oh yeah, AI's probably better at this than I am. So I actually went into ChatGPT o3 and said, help me brainstorm what sort of device this could be.
[00:48:58] And then here was the prompt I [00:49:00] gave it. I basically copied and pasted things. Sam met with the team on Wednesday and gave some clues, and there was the Journal article. So here's the prompt I gave, which gives you some context of what it might be: OpenAI Chief Executive Sam Altman gave his staff a preview Wednesday of the devices he is developing to build with former Apple designer Jony Ive,
[00:49:19] laying out plans to ship 100 million AI "companions" that he hopes will become a part of everyday life. Employees have "the chance to do the biggest thing we've ever done as a company here," Altman said, after announcing OpenAI's plans to purchase Ive's startup, named io, and give him an expansive creative and design role. Altman suggested the $6.5 billion acquisition has the potential to add $1 trillion in value to OpenAI,
[00:49:46] according to a recording reviewed by The Wall Street Journal. It's nice to know employees are recording Sam and sending it to The Wall Street Journal. In the meeting, Ive noted how closely he worked with Steve Jobs before the Apple co-founder died in [00:50:00] 2011. With Altman, "the way that we clicked and the way that we have been able to work together has been profound for me."
[00:50:05] Altman and Ive offered a few hints at the secret project. The product will be capable of being fully aware of a user's surroundings and life, will be unobtrusive, able to rest in one's pocket or on one's desk, and will be a third core device a person would put next to a MacBook Pro and an iPhone. And there was some additional stuff I gave it.
[00:50:23] So then it came back with some ideas, and I was like, oh, these are kind of interesting. And then I thought, hold on a second. So I asked o3, are you able to search patent applications related to Ive and his businesses? It said, absolutely, because they're public records. So then it went and found every patent application tied to Jony Ive, including dozens from Apple, his LoveFrom company, his io company, all these things.
[00:50:45] So then it came back with some updated information, and I said, based on what you're able to find, do you have any further thoughts on what they may be developing? And then it broke it out into a chart of what the public patent trail could or couldn't tell us, 'cause apparently Jony Ive likes to [00:51:00] file false patents to throw people off the scent of what he's building and developing.
[00:51:04] So what it came up with was a pocket glass pebble, meant to live in your hand, your pocket or on a pad; a desk orb (and then I actually had it create visuals of all these, which is kind of cool); a modular tile stack, which I thought was a terrible idea; and a lapel clip, which is the Humane pin, and they cannot possibly do a lapel clip.
[00:51:23] Then I'd seen some things online suggesting maybe it was gonna be a robot, because somebody said somebody should build a... I forget what the tweet was. I'll find it. But it was basically, go build a robot computer, and Sam replied in March, "We're gonna build a really cute one."
[00:51:39] So I was like, oh, well maybe it's just gonna be a baby robot. So then I gave it the tweet and basically said, build the baby robot, and it's adorable. I guess we'll put that on the show notes page on the Institute website.
[00:51:56] But it's a really cute little robot, and I was like, I might actually buy one of those. [00:52:00] So I have no idea what they're gonna build. I've heard a lot about a little puck of some sort, but there's gonna be a series of devices. So keep in mind, Ive built the iPad, he built the MacBook Pro, he built the iPhone.
[00:52:15] Everything is a collection of devices that interact with each other. So it's possible it's a bunch of different form factors; we just don't know. I will say, though, go back to episode 148, where we talked about this and about Sam's platonic ideal of what this thing is: an operating system for your life that listens to everything, every book you've read, every meeting you've had. And now you start to see that devices are part of the vision for this whole operating system.
[00:52:41] And then the last thing is just, what does it mean for Apple? I haven't looked at Apple stock today. We're probably not gonna see these products until late 2026. I'd be shocked if they can keep what they're actually building under wraps until then. Supply chains talk. They're leaky.
[00:52:56] So I would think we'll find out sometime sooner than [00:53:00] that. But I don't know, man. Apple, between getting crushed on the AI stuff and not being able to solve that, and now having to compete with devices from Google already... I don't know. I've historically been pretty bullish on Apple's stock.
[00:53:15] I am, I'm starting to like think about that. I'm not offering investing advice here, but I am starting to wonder about Apple's long-term viability. Unless they can come, they gotta come out strong with something. They need to do what Google did and just like throw the gauntlet down on something. 'cause they haven't done that in a long time.
[00:53:31] AI’s Energy Usage
[00:53:31] Mike Kaput: Our next topic is about AI's impact on the environment. Now, the energy footprint of AI is far bigger than most people realize, and it's growing fast, according to a new investigation from MIT Technology Review. This report reveals that training models like GPT-4 consumed enough electricity to power San Francisco for three days. [00:54:00] And that's just the beginning, because it's not necessarily training the models that is eating up all the power. Inference, the energy used each time someone interacts with AI, is now the main driver of energy use, according to this report. So every time you ask, say, ChatGPT a question, generate an image, create a short video, or use an AI tool to create some type of output.
[00:54:23] You're using energy equivalent to running a microwave or riding miles on an e-bike. Now, obviously, multiply that by billions of queries made each day, and the energy toll of AI as a category becomes enormous. According to the math that MIT Technology Review ran, by 2028 AI alone could use more electricity than 22% of all US households combined. Now, Paul, we've talked a bit here and there about AI's impact on the environment. It's a big concern. What's your take here? It doesn't seem like AI labs are really doing much to curb energy [00:55:00] usage. It just seems like, you know, with OpenAI's Stargate, for instance, they're just looking to build more power generation.
[00:55:06] Paul Roetzer: Yeah, this is the, you know, multi-trillion-dollar pursuit. You have to build the data centers to not only train the models, but more and more to do the inference. Because we're talking about, you know, the devices we have today and the applications we have today, and they're looking out five to 10 years and saying, we're gonna have a billion humanoid robots.
[00:55:26] They're all gonna be calling, there's gonna be AI in every device we use, every piece of software's gonna have AI. It's literally just gonna be everywhere. And every time it's used, it's gonna, you know, draw on the grid, basically. So that's why so much effort's being put into, you know, other energy sources and the need to, you know, build out more.
[00:55:47] And I do get, like I, I've mentioned numerous times now, I get this question every time I do a talk now, like, there's always someone who's asking about the impact on the environment and energy and things like that. So, we'll, we'll keep [00:56:00] talking about it. This is one of the more advanced research reports I've seen that actually tries to quantify it.
[00:56:05] But what I tell people, and this is not a great answer, but I think it's the truth: AI labs are aware. You probably have people who are environmentalists within the AI labs. Not all of them, but certainly there's gonna be people within those labs who care deeply about the environment as well. And the AI labs' general belief is, let's solve intelligence and let intelligence solve it.
[00:56:30] Like we just need to build AGI and ASI we just gotta get there and then we'll figure out the energy thing after that. So they're gonna do what they can in the meantime and be energy efficient where they can and make algorithms more efficient. So they're, you know, less intensive in the power use, but the demand is gonna be so massive.
[00:56:47] It's just gonna keep growing. So that, I believe, truly is their hope, is that once we get to super intelligence, it'll figure out the energy stuff for us, because that's lonely. We, little humans can't like [00:57:00] figure this out on our own. We, we need super intelligence.
[00:57:03] Microsoft Build 2025
[00:57:03] Mike Kaput: All right. Next up, Microsoft just had its annual Build conference, where it unveiled over 50 new tools designed to shift AI from reactive assistance to autonomous agents that reason, remember, and act. This agent-first vision cuts across everything from GitHub to Windows. GitHub Copilot now functions like an AI teammate that can refactor code, implement features, and troubleshoot bugs. Meanwhile, Azure's agent service supports complex multi-agent workflows for enterprise tasks. Now, at the heart of this push is memory. Microsoft introduced tech like structured retrieval and agentic memory, aiming to give all these agents across these different tools context about your goals, your team, and your technology. Now, Paul, we've known Microsoft, like everyone else, is all in on AI agents, or at least whatever they believe or are [00:58:00] calling AI agents. Tons of enterprises use Microsoft products, and it sounds like those products are now going to have a ton more agentic capabilities. Which kind of makes me think of the question: what do businesses need to even be talking to employees about or teaching them when it comes to agentic capabilities beyond just normal AI?
[00:58:21] Paul Roetzer: Teaching them how to use Copilot in general would be a really good start. I can't tell you how many times a week I talk to companies who have Copilot and provided no change management training to their teams about what to do with it. So I don't know. I mean, agents do open up a whole new realm of challenges, depending on how sophisticated and autonomous they actually are and what data and systems they have access to internally.
[00:58:48] So there may be a whole bunch of training that's needed. If they're just, you know, basically automations that are doing somebody's tasks for 'em, then you're just providing some basic training on how to set 'em up and how to create 'em. Like [00:59:00] you and I have done that with custom GPTs, you know, with some companies: just guide 'em a little bit.
[00:59:05] Yeah, I don't know. Poor Microsoft, though. Oh my gosh. This was on Monday, and I haven't heard a word about Microsoft since Monday. You had Anthropic, you had the OpenAI stuff, you had Google I/O. Wow. Talk about a short news cycle.
[00:59:22] Chatbot Arena Funding
[00:59:22] Mike Kaput: No kidding. All right. Next up, LMArena, the newly formed startup behind the popular Chatbot Arena platform, has raised a hundred million dollars in funding from heavyweights like Andreessen Horowitz, Lightspeed, and Kleiner Perkins. Now, you'll recall we've talked about Chatbot Arena a bunch of times.
[00:59:46] It used to be called LMSYS. This was a project that actually started in a UC Berkeley lab to rank AI models. And with this new development, it's now turned into a company that is valued at $600 [01:00:00] million. Now, the site lets users pit AI models against each other and vote on which one performs best.
[01:00:07] The platform has logged over 3 million votes across 400 models, which has made it the go-to benchmark for top labs like OpenAI, Google, and Anthropic. It's also got this community-driven leaderboard, so it's one of the few public spaces where open source and proprietary models can be compared in real time using human preferences as the metric. But this research project costs millions of dollars per year to run, which is why they're raising funding and forming a company around it. They plan to use the money to expand features, cover compute costs, and make the user base more diverse. Now, Paul, I guess my big question here for you is: how much can we trust Chatbot Arena? We reported pretty recently on some controversy about big labs trying to kind of [01:01:00] game this benchmark. It's hugely influential. Now that it's a private company, will there be more pressure on them to influence or alter rankings based on, you know, who's paying them?
[01:01:13] Paul Roetzer: Honestly, when I'm looking at these numbers, a hundred-million-dollar seed round, so it's probably a $600 million post-money, so they probably valued it at $500 million and then raised a hundred million. The only thing I can come up with, Mike, and this is off the top of my head because I hadn't thought about this before, is that their plan would be to do the industry- and career-specific

[01:01:39] rankings and benchmarks that they're going to get into, like ranking them for accountants, ranking them for lawyers. The only way I could see a total addressable market big enough to justify this kind of valuation is if there's a whole other business plan here to get into the much larger space, which would be that.
[01:01:59] And then probably some [01:02:00] other things I'm not thinking about, but it's an enormous valuation for an error-prone chatbot ranking system that most people outside of tech don't even know exists.
[01:02:12] Mike Kaput: Yeah. One of the things we've reported on pretty recently is their Prompt-to-Leaderboard feature, where you basically put in any prompt and it'll generate a leaderboard of which models will do best on it. So that might be some version of what you're talking about, but yeah,
[01:02:30] Paul Roetzer: Like subscriptions? What's the revenue model? I don't know. I'll have to think about this one later. My brain is incapable at the moment of processing this, but yeah, there's obviously something much more to the business plan than what is currently
[01:02:45] Mike Kaput: And also just as a bigger note too, and we've mentioned this a couple times, like people when we talk about like state-of-the-art models or a new model comes out and someone's like, well, you know, such and such model crushed a benchmark or a leaderboard. [01:03:00] This is the kind of thing they're talking about.
[01:03:01] Paul Roetzer: Yes.
[01:03:02] Mike Kaput: just like, there's certainly established tests in math and science and things, but when they say like top the chatbot leaderboard, it's often this one they're talking about.
[01:03:10] Paul Roetzer: Yeah.
[01:03:10] Mike Kaput: a community leaderboard.
[01:03:11] Paul Roetzer: Right, but imagine a new model drops. You got Claude 4, you got 2.5. I'm a lawyer. I don't know which one helps me write my legal briefs best. And I can go in and say, I need to write a legal brief, and then, boom, Claude 4 ranks first. You know, it's done 2,000 legal briefs, and that's valuable to me.
[01:03:31] I don't know what the market looks like, but obviously those VC firms did some analysis and decided it was a multi hundred billion dollar market.
[01:03:39] Empire of AI from Karen Hao
[01:03:39] Mike Kaput: Hmm.
[01:03:41] All right, next up. In 2019, journalist Karen Hao walked into OpenAI's offices with rare access and one big question: what was this ambitious, secretive company really building? What she found at the time was a research lab in transition that was rapidly shifting [01:04:00] from nonprofit idealism to a corporate entity racing towards artificial general intelligence. Her reporting is now chronicled in a new book called Empire of AI, and it reveals how OpenAI's mission to benefit all of humanity was already colliding with its actions behind closed doors. At the time, OpenAI had just begun to withhold models like GPT-2. They had cut a controversial deal with Microsoft and aimed to restructure themselves to allow profit-seeking investment.
[01:04:32] Now, executives have insisted these moves were necessary to stay competitive and steer AGI safely. But Hao's interviews at the time, nearly three dozen of them, suggested growing secrecy, internal tension, and a widening gap between OpenAI's public messaging and private ambitions. After that first article was published in 2020, OpenAI actually cut off communication with her. As Hao now reveals, that profile [01:05:00] became a touchstone and encouraged a bunch more insiders to come forward and talk to her. So the book, which came out on May 20th, is based on over 300 interviews since then, and paints a comprehensive and not at all flattering picture of OpenAI behind the scenes. So Paul, we have followed Karen's work for quite some time.
[01:05:21] She spoke at our Marketing AI Conference and she's done awesome work, but it kind of sounds like she's sounding some alarm bells in this book.
[01:05:30] Paul Roetzer: I had this book on pre-order. I got the audio 'cause I was planning on listening to it on my flights, and then I was working on some other stuff and didn't get to it, but I'm absolutely going to read this. She's a great writer and she's respected. She's been in some leading publications, and yeah, I'm sure OpenAI doesn't like it.
[01:05:50] I can't really comment on it until I've actually read the thing, but if you're intrigued by this, kind of the drama, the soap opera side of all of this, [01:06:00] I'm guessing this book is full of fascinating things that you would find intriguing. So I would recommend it, because she's a great writer and we've followed her for so long that I'm sure it's an incredible work.
[01:06:12] So yeah, more, more to come once I actually, get a chance to get through it.
[01:06:18] AI in Education Updates
[01:06:18] Mike Kaput: All right. Next up this week, we have some more stories that add to our ongoing conversation around AI's impact on education. So, two stories this week. First up, a Northeastern University student has demanded an $8,000 refund from the college after discovering her professor used ChatGPT to generate course materials.
[01:06:39] Now, the issue here is that the teacher's syllabus had banned AI use for students. She is not alone. Across campuses, students are calling out what they see as hypocrisy, with professors leaning on AI to save time while punishing students for doing the same. At least some professors, however, argue AI [01:07:00] makes them more efficient, frees up time for deeper engagement, and can support student learning. Now, the second thing we heard this week: Duolingo's CEO is taking a bit more of a controversial stance on AI and education. He actually came out saying AI isn't just a teaching tool, it is the future of instruction. With over a hundred million users, he's now claiming that the company's AI can predict test scores and tailor learning better than any human teacher. This led the CEO to make a controversial statement, saying that schools were going to survive not for education, but because, quote, "you still need childcare." Now, Paul, two interesting additions to the broader discussion we've been having on AI and education. Maybe gimme your thoughts on both of these.
[01:07:52] Paul Roetzer: The Northeastern one's kind of funny. So the way she [01:08:00] found it was she was going through, it was an organizational behavior class, reviewing lecture notes, and she noticed that partway through there was an instruction to ChatGPT, quote, "expand on all areas, be more detailed and specific."
[01:08:14] So the professor left the prompt in.
[01:08:18] Mike Kaput: Oh boy.
[01:08:18] Paul Roetzer: So I don't know, that's a little more lighthearted take today. I know it's been a little heavy on news. So yeah. And then the Duolingo one, man, I dunno. "You still need childcare." God. I don't think the PR team wrote that talking point. I think there's a really important need to drive far greater urgency around preparing for the change that is coming.
[01:08:50] I will say that there's just ways to go about it, but I don't know. I mean, maybe we just need to be [01:09:00] more direct and just say what it is. I think education is in a really, really difficult place, honestly. We've talked about it. This week alone, I had two instances where I was really personally struggling with, do I show my daughter how to use the tool to do this?
[01:09:18] 'Cause I think it'll accelerate her learning. Or is that crossing a line, even though her school doesn't have any explicit rule against it? I felt like I was giving her an unfair competitive advantage to teach her to do it that way. And I worried that if I did, I was gonna get a call from somebody saying, okay, we have to outlaw this now. And then I get into a situation where, like, but this is how she's going to, if you know these things, you have a competitive advantage in the workforce.
[01:09:45] And I feel like increasingly, parents and teachers who understand and teach this stuff, their kids and their students are gonna be so far ahead of other people. You can just [01:10:00] accelerate their understanding of topics so much faster. I see it every time I work with my kids on this stuff, and I'm starting to really worry that it's gonna be unevenly distributed,

[01:10:13] in a much greater way than I thought it was going to be. So, yeah, I don't know. I mean, every week. But on the positive side, I keep getting great outreach from professors who are sharing stories with me of cool things that they're doing. And maybe as part of, you know, another idea for an upcoming series that we're working on for the podcast, we're gonna kind of tell these stories.
[01:10:35] I would love to start really highlighting some of the things that are happening in the education space in a really positive way. Because so much of the media news is not positive. It's like challenging. And then you, you know, throw in, we got things like, you know, issues with international students at major colleges and, there's a lot going on.
[01:10:54] It's hard, hard time to be in higher education. I think there's lots and lots of challenges and AI is just one of the [01:11:00] challenges they're facing.
[01:11:01] Listener Question
[01:11:01] Mike Kaput: All right, Paul, we're gonna end with our recurring segment on listener questions. I'm just gonna say I apologize in advance, 'cause I selected this one because it was extremely topical with Claude 4, but I realize now that it doesn't end us on the most positive note. But we're gonna do it anyway. So the question someone asked was: what measures are being taken to ensure the ability to shut AI down if it goes rogue? Obviously, a few years ago this would've been a much more theoretical, kind of out-there question. But given the stuff we talked about with Claude 4, what measures, if any, are there for this kind of thing?
[01:11:43] Paul Roetzer: So,
[01:11:45] yeah,
Man, we may have to come up with a second question today. Um,
[01:11:51] so
[01:11:52] if it's an open source model, nothing. Like, if Llama 4 comes out and [01:12:00] two weeks later they realize they screwed up and it can do things that it shouldn't be able to do, you're done. It's out. You can't pull it back.
[01:12:12] So if it's open source, and this is the argument of the proprietary, closed-source advocates: if something goes wrong, we can pull the model back. That's what OpenAI did a couple weeks ago when, you know, it was nothing from a security perspective or high risk, it was just being weird. They rolled the model back.
[01:12:31] They can monitor usage. Like, Anthropic monitors usage. It looks at the words being used; they have deep monitoring of that stuff. So if it's a proprietary closed system, they can monitor it, they can pull it back, they can make updates to the system instructions to try and resolve something. If it's a company that doesn't care or that wants to cause chaos or misuse
[01:12:58] Mike Kaput: Hmm.
[01:12:59] Paul Roetzer: Mm,
[01:12:59] [01:13:00] nothing like it.
[01:13:01] This is the, this is the risk we take is that they take on their own goals. They, replicate and self-improve and do their own thing. This is the sci-fi thing like that, you know, the stuff you'd see in the movies. So the, I don't know, but like the one thing I saw this morning, and I didn't, I wasn't even gonna put it in the show notes.
[01:13:21] I didn't even tell you this, Mike. So the Character.AI case from last year, where the 14-year-old boy committed suicide, in part because of the relationship he developed with an AI bot: Character.AI had filed to dismiss the case, and the judge, as of I think yesterday, refused to dismiss the case.
[01:13:42] Meaning the judge believes there's a possibility that the AI company itself is liable for what happened. And that's a big deal. So this is another thing I was doing with Gemini this morning. I was like, explain the legal precedent here. Why does this matter? What [01:14:00] is tort law?
[01:14:00] I was kind of going through, trying to comprehend this, but in essence what that case is saying is if it goes and if it doesn't get settled, or even if it does, I guess it could still play a role, it could set a precedent that the AI companies building the models are liable for the outcomes of what happens.
[01:14:17] You know, individuals at a higher level of security risk, at bio weapons, like, so they're trying to do it to be good citizens right now, but there's a decent chance, and this is just US law, that you could be looking at where the model companies are liable and that could slow some stuff down pretty fast.
[01:14:37] If it ends up being that something goes wrong, it's on, on that company. so yeah, I don't know. I'm sure there's like other legal proceedings going on. There's probably other ways that they're looking at it, but. You know, my basic understanding is if it's open source, you're cooked. If it's closed, they can pull it back.
[01:14:53] That's kind of like the gist of it.
[01:14:56] Mike Kaput: Well,
[01:14:57] Closing Thoughts
[01:14:57] Paul Roetzer: What do you got? Anything you're excited about? Like
[01:14:59] Mike Kaput: I've [01:15:00] got. I've got something good.
[01:15:01] Paul Roetzer: go, go, go. Yes, do it.
[01:15:03] Mike Kaput: is that if you jump on any of our social media accounts for the AI show or jump onto Paul's LinkedIn, you
[01:15:10] there we go.
[01:15:11] Mike Kaput: A post of us in the latest AI trend. There's a trend where people are using AI to turn podcasts and their hosts into babies talking through topics.
[01:15:22] We didn't do it for this episode. We did it for one that has more positive topics. Claire on our team made a clip of us as babies talking through AI, and it's hilarious.
[01:15:33] Paul Roetzer: It is, it is amazing. And actually, we talked to Claire, like we, we, I don't know if we have the podcast, we're gonna do this through our AI academy, like teach a class on it. But I said like, how did you do it? And she's like, yeah, it took like an hour, like kinda went through these few steps and honestly, it probably take a lot less time now.
[01:15:48] It's hilarious, man. And the thing I always laugh at is, whatever it decided, 'cause I think she just gave it images of us and then it created the realistic babies and lip-synced them and [01:16:00] everything. I am so happy talking about AI agents. Baby me can't stop smiling about talking about AI agents.
[01:16:08] So Yeah, it's hilarious. I put it on LinkedIn. it's on the socials, like you said, and then we'll put the link in the show notes. But it's just like a, I don't know, like a one minute clip or something. But it, it's definitely funny. It'll, it'll make you laugh.
[01:16:20] Mike Kaput: So that's a good note to end
[01:16:21] Paul Roetzer: There you go. Good.
[01:16:22] Mike Kaput: Regardless, Paul, these are important topics. I know some of them are downers, but we really appreciate you demystifying everything. I think this conversation helps people at least feel a little more in control of their own destiny. So, as always, appreciate it.
[01:16:38] Paul Roetzer: Yeah, I will say Mike and I called an audible at the last minute and yanked one of the topics today. So there
is one that I just could not do today. So we will put it on next episode, 151. So again, episode 150 is gonna be the new AI Answers special episode. And then 151 will be our regular weekly [01:17:00] and we will talk about AI and grieving on that one.
[01:17:04] yeah, that was not, just not happening today for me mentally.
[01:17:07] Mike Kaput: I think that was a good call.
[01:17:08] Paul Roetzer: Yeah. All right. So, thanks everyone. And again, check out episode 150 for AI Answers, and we will talk with you all again soon.
[01:17:18] Thanks for listening to the Artificial Intelligence Show. Visit SmarterX.ai to continue on your AI learning journey and join more than 100,000 professionals and business leaders who have subscribed to our weekly newsletters, downloaded AI blueprints, attended virtual and in-person events, taken online AI courses, earned professional certificates from our AI Academy, and engaged in the Marketing AI Institute Slack community.
[01:17:43] Until next time, stay curious and explore AI.
Claire Prudhomme
Claire Prudhomme is the Marketing Manager of Media and Content at the Marketing AI Institute. With a background in content marketing, video production and a deep interest in AI public policy, Claire brings a broad skill set to her role. Claire combines her skills, passion for storytelling, and dedication to lifelong learning to drive the Marketing AI Institute's mission forward.


