
Big Problems Discovered with AI Training

By Mike Kaput on April 25, 2023

AI companies like OpenAI are coming under fire for how AI tools are trained…

Reddit, which is often scraped to train language models, just announced it would charge for API access, in order to stop AI companies from training models on Reddit data without compensation.

Twitter recently made a similar move. And Elon Musk publicly threatened to sue Microsoft for, he says, “illegally using Twitter data” to train models.

Don’t be surprised if other companies follow suit…

An investigative report by the Washington Post recently found that large language models from Google and Meta trained on data from major websites like Wikipedia, The New York Times, and Kickstarter.

The report raises concerns that models may be using data from certain sites improperly. In one example, the Post found models had trained on an ebook piracy site, and so likely did not have permission to use the data they trained on. Not to mention, the copyright symbol appeared more than 200 million times in the data set the Post studied.

What concerns does this raise for marketing and business professionals using these tools?

I spoke to Marketing AI Institute co-founder and CEO Paul Roetzer on Episode 44 of the Marketing AI Show to learn more.

This will change the value proposition of putting data out there for free. Expect to see companies with proprietary data either train their own AI models and products (like Quora has) or charge for access to the data via API. Some might do both, says Roetzer. In the past, you gave free access to your data in exchange for valuable benefits like more users or traffic. That equation may now change, as free access means you could be training a model that replaces the need for your site or brand.
AI training will need to change. In Europe, AI companies appear to be struggling to train models in ways that don’t violate European law. Everywhere, it also appears AI companies are training models on copyrighted material. AI companies may get hit with massive penalties or legal actions, or they may dodge regulations entirely. But one thing is clear, no matter what happens. “The way they build these models is going to have to evolve,” says Roetzer.
Business leaders need to be prepared. “You have to address the fact that you may be using technology that was built illegally,” says Roetzer. That doesn’t mean you’ll get in trouble for using the technology. (It’s highly doubtful, but please check with a lawyer.) But you will likely train custom versions of models moving forward, trained largely on compliant data that you legally own. And prepare to hear about legal cases hitting big AI companies, even some you might use.

Don’t get left behind…

You can get ahead of AI-driven disruption—and fast—with our Piloting AI for Marketers course series, a series of 17 on-demand courses designed as a step-by-step learning journey for marketers and business leaders to increase productivity and performance with artificial intelligence.

The course series contains 7+ hours of learning, dozens of AI use cases and vendors, a collection of templates, course quizzes, a final exam, and a Professional Certificate upon completion.

After taking Piloting AI for Marketers, you’ll:

Understand how to advance your career and transform your business with AI.
Have 100+ use cases for AI in marketing—and learn how to identify and prioritize your own use cases.
Discover 70+ AI vendors across different marketing categories that you can begin piloting today.

Mike Kaput

As Chief Content Officer, Mike Kaput uses content marketing, marketing strategy, and marketing technology to grow and scale traffic, leads, and revenue for Marketing AI Institute. Mike is the co-author of Marketing Artificial Intelligence: AI, Marketing and the Future of Business (Matt Holt Books, 2022).
