AI is learning from stolen intellectual property. It needs to stop.

Our books are copyrighted material, not free fodder for wealthy companies to use as they see fit, without permission or compensation.



October 19, 2023 - 5:33 PM


The other day someone sent me the searchable database, published by the Atlantic magazine, of more than 191,000 e-books that have been used to train the generative AI systems being developed by Meta, Bloomberg and others. It turns out that four of my seven books are in the data set, called Books3. Whoa.

Not only did I not give permission for my books to be used to generate AI products, but I wasn’t even consulted about it. I had no idea this was happening. Neither did my publishers, Penguin Random House (for three of the books) and Macmillan (for the other one). Neither my publishers nor I were compensated for use of my intellectual property. Books3 just scraped the content away for free, with Meta et al. profiting merrily along the way. And Books3 is just one of many pirated collections being used for this purpose.

My experience is hardly unique. According to the database the Atlantic has made available, four of Michael Beschloss’s books have been crawled, and 10 or so of Michael Lewis’s books have been ingested into the AI ecosystem. “I would never have consented for Meta to train AI on any of my books, let alone five of them,” novelist Lauren Groff tweeted recently. “Hyperventilating.” There are thousands of other examples.

This is wholly unacceptable behavior. Our books are copyrighted material, not free fodder for wealthy companies to use as they see fit, without permission or compensation. Many, many hours of serious research, creative angst and plain old hard work go into writing and publishing a book, and few writers are compensated like professional athletes, Hollywood actors or Wall Street investment bankers. Stealing our intellectual property hurts.

Some of us are starting to fight back. More should. One class-action lawsuit has been filed by authors Richard Kadrey, Sarah Silverman and Christopher Golden in federal court in California against Meta (what we used to call Facebook), seeking both an injunction against continuing to use the writers’ copyrighted material and financial damages. The authors argue that Meta’s large language models, LLMs for short, which form the basis of Meta’s AI offerings, are “trained” by copying text and extracting expressive information from it. Once the material has been “copied and ingested,” the LLMs are able “to emit convincing simulations of natural written language,” according to the lawsuit. “Much of the material in Meta’s training data set, however, comes from copyrighted works, including works written by Plaintiffs, that were copied by Meta without consent, without credit, and without compensation.” They filed a similar lawsuit against OpenAI, maker of ChatGPT. Author Michael Chabon has also filed a lawsuit against Meta for the same reasons. These lawsuits are in the early stages of the judicial process.

In the meantime, the tech companies behind the big AI data crawlers are raking it in. In January, Microsoft invested $10 billion in OpenAI, bringing its value close to $29 billion. It was Microsoft’s third investment in OpenAI, and it now owns 49 percent of the company. Microsoft is now valued at $2.3 trillion, up 33 percent this year. OpenAI is reportedly now raising more money that would increase its valuation to around $90 billion, a tripling of its value in nine months.

Not to be left behind, Google introduced its AI product, Bard, in February, as did Meta, which dubbed its AI LLaMA. Roughly since these announcements were made, the market value of Alphabet (Google’s parent company) has increased 50 percent, to $1.7 trillion, while Meta’s market value has increased about 145 percent, to $784 billion. In other words, these companies’ stock-market valuations have soared this year, thanks in part to their AI announcements and products, which are largely dependent on hoovering up the hard work of others.

Scott Galloway, a New York University marketing professor, best-selling author and podcast aficionado, thinks writers of all stripes should be focusing their ire on the likes of Microsoft, Google and Meta, not Disney, Warner Bros. Discovery and Paramount Global, as they did in the recently concluded Hollywood writers strike. He said recently on the Pivot podcast that 70 percent of Nasdaq’s gains in the first half of 2023 came from seven technology companies, most of which had AI product offerings. “So the question is, if AI is literally sucking the oxygen outta the room and all the market cap, it’s like, well what is driving that value?”

The answer, obviously, is the hundreds of thousands of content creators who are taking the time, often over many months and years, to report and to write and to think up the content that Meta, Google and Microsoft are scraping up into the LLMs without asking permission or paying proper compensation. Writers need to band together into a powerful lobbying force (perhaps led by billionaire Barry Diller, who has publicly taken up a cudgel) against ChatGPT, Google, Microsoft and Meta, and their ilk, and fight for proper compensation for their work and a share of the hundreds of billions in value that has been created by the very mention of AI. The AI companies should pay authors a fair price to option their books for the right to consume their contents, just as Hollywood does when embarking on a film, documentary or television series. (Apple reportedly paid Michael Lewis $5 million for the movie rights to his new book about Sam Bankman-Fried.) The companies should then also agree to pay authors royalties, if there are any to be had.

Right now, a few authors joining together to sue the likes of Mark Zuckerberg and Meta is a bit of a David-vs.-Goliath situation. Book publishers need to join this fight. Magazine publishers need to join this fight. Newspaper publishers and their billionaire owners, such as Jeff Bezos (who owns The Post), John Henry and Patrick Soon-Shiong, must join this fight.

To get companies with a combined market value in the trillions of dollars to stop stealing intellectual capital from writers might even require congressional action. The sooner the better.

About the author: 

William D. Cohan is a best-selling author and a founding partner of Puck News.