Cyberspace agonies: The internet is about to get much worse

Artists are deleting their work from X after the company said it would be using data from its platform to train its AI. Hollywood writers are on strike partly because they want to ensure their work is not fed into AI systems that companies could try to replace them with.

Update: 2023-09-28 05:30 GMT


By JULIA ANGWIN

NEW YORK: Greg Marston, a British voice actor, recently came across “Connor” online — an AI-generated clone of his voice, trained on a recording Marston had made in 2003. It was his voice uttering things he had never said.

Back then, he had recorded a session for IBM and later signed a release form allowing the recording to be used in many ways. Of course, at that time, Marston couldn’t envision that IBM would use anything more than the exact utterances he had recorded. Thanks to artificial intelligence, however, IBM was able to sell Marston’s decades-old sample to websites that are using it to build a synthetic voice that could say anything. Marston recently discovered his voice emanating from the Wimbledon website during the tennis tournament. (IBM said it is aware of Marston’s concern and is discussing it with him directly.)

His plight illustrates why many of our economy’s best-known creators are up in arms. We are in a time of eroding trust, as people realize that their contributions to a public space may be taken, monetized and potentially used to compete with them. When that erosion is complete, I worry that our digital public spaces might become even more polluted with untrustworthy content.

Already, artists are deleting their work from X, formerly known as Twitter, after the company said it would be using data from its platform to train its AI. Hollywood writers and actors are on strike partly because they want to ensure their work is not fed into AI systems that companies could try to replace them with. News outlets including The New York Times and CNN have added files to their websites to help prevent AI chatbots from scraping their content.
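The files the outlets added are entries in robots.txt, the long-standing file that tells crawlers which parts of a site they may visit. As a rough illustration (the user-agent names below are the ones the crawler operators have publicly documented, such as OpenAI's GPTBot; compliance is voluntary on the crawler's part):

```
# robots.txt — sketch of AI-crawler blocking rules
# Block OpenAI's web crawler from the entire site
User-agent: GPTBot
Disallow: /

# Block Common Crawl's bot, whose archives are widely used as AI training data
User-agent: CCBot
Disallow: /
```

A rule like this only works against crawlers that choose to honor the Robots Exclusion Protocol, which is part of why publishers see it as a stopgap rather than real protection.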

Authors are suing AI outfits, alleging that their books are included in the sites’ training data. OpenAI has argued, in a separate proceeding, that the use of copyrighted data for training AI systems is legal under the “fair use” provision of copyright law.

While creators of quality content are contesting how their work is being used, dubious AI-generated content is stampeding into the public sphere. NewsGuard has identified 475 AI-generated news and information websites in 14 languages. AI-generated music is flooding streaming websites and generating royalties for scammers. AI-generated books — including a mushroom foraging guide that could lead to mistakes in identifying highly poisonous fungi — are so prevalent on Amazon that the company is asking authors who self-publish on its Kindle platform to declare whether they are using AI.

This is a classic case of tragedy of the commons, where a common resource is harmed by the profit interests of individuals. The traditional example of this is a public field that cattle can graze upon. Without any limits, individual cattle owners have an incentive to overgraze the land, destroying its value to everybody.

We have commons on the internet, too. Despite all of its toxic corners, it is still full of vibrant portions that serve the public good — places like Wikipedia and Reddit forums, where volunteers often share knowledge in good faith and work hard to keep bad actors at bay. But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humour, anecdotes and advice they find in these places into their for-profit AI systems.

Consider, for instance, that the volunteers who build and maintain Wikipedia trusted that their work would be used according to the terms of their site, which requires attribution. Now some Wikipedians are apparently debating whether they have any legal recourse against chatbots that use their content without citing the source.

Regulators are trying to figure it out, too. The European Union is considering the first set of global restrictions on AI, which would require some transparency from generative AI systems, including summaries of the copyrighted data used to train them. That would be a good step forward, since many AI systems do not fully disclose the data they were trained on. It has primarily been journalists who have dug up the murky data that lies beneath the glossy surface of the chatbots. A recent investigation detailed in The Atlantic revealed that more than 170,000 pirated books are included in the training data for Meta's AI chatbot, Llama. A Washington Post investigation revealed that OpenAI's ChatGPT relies on data scraped without consent from hundreds of thousands of websites.

But transparency is hardly enough to rebalance the power between those whose data is being exploited and the companies poised to cash in on the exploitation.

Tim Friedlander, founder and president of the National Association of Voice Actors, has called for AI companies to adopt ethical standards. He says that actors need three Cs: consent, control and compensation. In fact, all of us need the three Cs. Whether we are professional actors or we just post pictures on social media, everyone should have the right to meaningful consent on whether we want our online lives fed into the giant AI machines.

And consent should not mean having to locate a bunch of hard-to-find opt-out buttons to click — which is where the industry is heading. Compensation is harder to figure out, especially since most of the AI bots are primarily free services at the moment. But make no mistake: the AI industry plans to, and will, make money from these systems, and when it does, there will be a reckoning with those whose works fuelled the profits.

For people like Marston, their livelihoods are at stake. He estimates that his AI clone has already lost him jobs and will cut into his future earnings significantly. He is working with a lawyer to seek compensation. “I never agreed or consented to having my voice cloned, to see/hear it released to the public, thus competing against myself,” he told me.

But even those of us whose jobs are not directly threatened by AI dream of writing that novel, composing a song, recording a TikTok or making a joke on social media. If we don't have any protections from the AI data overgrazers, I worry that it will feel pointless to even try to create in public. And that would be a real tragedy.
