July 15, 2024
He quit a GenAI leader in protest. Now he wants to create fairer systems for artists

Ed Newton-Rex had reached a breaking point. As the vice president of audio at Stability AI, the 36-year-old was at the vanguard of a revolution in computational creativity. But he had grown uneasy about the strategy driving it.

Stability had become an emerging powerhouse in generative AI. The London-based startup owns Stable Diffusion, one of the world's most popular image generators. It also recently expanded into music generation with the September launch of Stable Audio, a tool developed by Newton-Rex himself. But these two systems were taking conflicting paths.

Stable Audio was trained on licensed music. The model was fed a dataset of over 800,000 files from the stock music library AudioSparx. Any copyrighted materials had been provided with permission.

Stable Diffusion had gone in a different direction. The system was trained on billions of images scraped from the web without the consent of creators. Many were copyrighted materials. All were taken without payment.

These images had taught the model well. Stable Diffusion's outputs helped push Stability to a valuation of $1bn in a $101mn funding round last year. But the system was attracting opposition from artists, including Newton-Rex.

GenAI’s ethical dilemma

A pianist and composer as well as a GenAI pioneer, Newton-Rex was at odds with the unsanctioned scraping.

“I’ve always really wanted to make sure that these tools are built with the consent of the creators behind the training data,” he tells TNW on a video call from his home in Silicon Valley.

Stability was far from the only exponent of this method. The image generators Midjourney and DALL-E apply the same approach, as do OpenAI's ChatGPT text generator and GitHub's Copilot coding assistant. Visual arts, written works, music, and even code are now constantly being reworked without consent.

In response, creators and copyright holders have launched numerous lawsuits. They’re angry that their work is being taken, adapted, and monetised without permission or remuneration. They’re also worried that their livelihoods are at stake.


Artists say that generative AI is stealing their work. The companies behind the systems disagree. In a recent submission to the US Copyright Office, Stability argued that the training was "fair use" because the results are "transformative" and "socially beneficial."

Consequently, the company asserted, there was no copyright infringement. The practice could therefore continue without permission or payments. It was a claim that had become common in GenAI, but one that Newton-Rex disputed.

"It really showed where the industry as a whole stands right now, and it's not a place I'm happy with," he says.

Newton-Rex considers the practice a form of exploitation. Last week, he resigned from Stability in protest.

The departure doesn’t mean that Newton-Rex has quit generative AI. On the contrary, he plans to continue working in the field, but following a fairer model. It’s not the impossible mission that the GenAI giants might depict. In fact, it’s already been accomplished by a range of companies.

Alternatives are available

Newton-Rex has a long history in computational creativity. After studying music at Cambridge University, he founded Jukedeck, a pioneering AI composer. The app used machine learning to compose original music on demand. In 2019, it was acquired by TikTok owner ByteDance.

Newton-Rex then had spells as a product director at TikTok and as chief product officer at Voisey, a music collaboration app that was acquired by Snap, before joining Stability AI last year. He was tasked with leading the startup's audio efforts.

“I wanted to build a product in music generation that showed what can be done with actual licensed data — where you agree with the rights holders,” he says.

That objective put him at odds with many industry leaders. GenAI was edging into the mainstream and companies were rushing to ship new systems as quickly as possible. Scraping content from the web was an attractive shortcut.

It was also demonstrably effective. At that time, there were still doubts that licensed datasets were large enough to train state-of-the-art models. Questions were also raised about the quality of the data. But both of those doubts are now being dispelled.


Stable Audio provided one source of counter-evidence. The system’s underlying model was trained on licensed music in partnership with the rights holders. The resulting outputs have earned applause. Last month, Time named Stable Audio one of the best inventions of 2023.

"For a couple of months, it was the state of the art in music generation, and it was trained on music that we'd licensed," Newton-Rex says. "To me, that showed that it can be done."

Indeed, there’s now a growing list of companies showing that it can be done. One is Adobe, which recently released a generative machine-learning model called Firefly. The system is trained on images from Creative Commons, Wikimedia, and Flickr Commons, as well as 300 million pictures and videos in Adobe Stock and the public domain.

Because this data is provided with permission, Adobe says Firefly is safe for commercial use. The company also stressed that creators whose work is used will qualify for payments.

A collage of images generated by Adobe Firefly