OpenAI's Controversial Interview About Sora

OpenAI's CTO Mira Murati just did a controversial interview with the Wall Street Journal about the company's new AI video model, Sora.

The interview started strong, with plenty of discussion about Sora's remarkable (but flawed) capabilities at generating video from text. 

But things took a problematic turn when the reporter pressed Murati on where the training data for Sora came from. Murati stumbled, claiming she wasn't sure if YouTube, Instagram or Facebook videos were used. She vaguely cited "publicly available data" before shutting down further questions.

The segment lit up the AI world. If it's legal for AI companies to use copyrighted material for training, why the secrecy around data sources? The optics aren't great.

 

So was this just a PR blunder or a concerning peek behind the curtain of AI development?

I got the scoop from Marketing AI Institute founder/CEO Paul Roetzer on Episode 88 of The Artificial Intelligence Show.

This PR blunder is really a legal issue in disguise

On the surface, OpenAI CTO Mira Murati's failure to answer basic questions about Sora's training data in a Wall Street Journal interview seems like a rookie PR mistake.

As a former PR pro, Roetzer is baffled that a major tech company would let a C-suite executive walk into such a high-profile interview unprepared, especially for a question as obvious as data sourcing.

"We used to run workshops teaching people how to prep for media interviews," he says. "This would have been one of the questions I would have prepared this person for."

But the faux pas reveals a much thornier issue lurking beneath the surface:

Why the secrecy or uncertainty around AI training data?

Here's the crux of the problem, says Roetzer. If it's truly fair use for AI companies to train their models on copyrighted data, as they claim in ongoing lawsuits, then why not just say what data they used?

"If you think you're allowed to take videos from YouTube to train these models, then why don't you just say ‘We used YouTube videos’?" he asks.

The fact that OpenAI's own CTO can't or won't answer that question suggests a lack of confidence in their legal stance. There's simply no way Murati doesn't know what data Sora trained on, says Roetzer.

The rest of the interview

That said, Murati did reveal some key details on Sora in the interview. She said the model takes a few minutes to generate each video and is still significantly more expensive to run than ChatGPT or DALL-E. (However, the goal is to get costs down to DALL-E levels by launch.)

The interview also contained demos that OpenAI generated for The Wall Street Journal. It's important to note that, while these demos were impressive, they had flaws.

For instance, in a video meant to recreate the "bull in a china shop" idiom, the bull walks through dishware without breaking anything. Other inconsistencies, like a yellow taxi cab morphing into a gray sedan, cropped up in the custom videos.

This fits a pattern we've seen before, says Roetzer: we only see the slickest demos, and real-world performance often lags far behind the highlight reels.

In the end, Roetzer feels the interview shed valuable light on Sora's development, even with the crucial question of training data left conspicuously unanswered.

But that unanswered question looms large. How AI models are trained, and on whose data, cuts to the heart of the ethical and legal dilemmas that will define this technology.
