Who Owns the Content AI Creates?
AI products like GitHub Copilot and ChatGPT, which ingest human content to make new material, raise novel legal and ethical issues.
(Bloomberg Businessweek) -- In November a lawyer and computer programmer named Matthew Butterick sued the tech companies GitHub, Microsoft and OpenAI, saying a tool called GitHub Copilot that automatically generates computer code is essentially plagiarizing the work of human software developers in a way that violates their licenses. The wronged parties in the case, in Butterick’s eyes, are the developers who worked on open source coding projects without explicitly giving permission for their code to be used to help artificial intelligence learn to program on its own.
This is an early skirmish in the battle about how such AI tools scramble the ideas of ownership, copyright and authenticity online. These tools had a banner year in 2022, and one likely result is that conflicts such as this will begin playing out in earnest in 2023.
Silicon Valley’s current buzzword for Copilot and other tools is “generative AI.” This technology ingests large amounts of existing digital content to train itself to make similar stuff on its own. In addition to computer code, generative AI is writing essays and making videos and images. Technologists have been predicting for years that these tools were the future, and OpenAI’s releases last year of the latest versions of its image-making tool (DALL-E 2) and its text-generation tool (ChatGPT) made it seem as if the future was suddenly here. The content these tools produce isn’t always convincing (DALL-E’s images of people, for instance, often include distorted faces and extra fingers), but they’re far better than their predecessors.
Copilot allows programmers to work faster by suggesting snippets of code as they type. It’s based on a subset of the technology that OpenAI used to make DALL-E 2 and ChatGPT. (Microsoft Corp. owns GitHub and is the primary investor in OpenAI.) Everything Copilot knows about programming comes from its analysis of code that was initially written by humans, and the lawsuit contends that it’s violated the licenses of open source software, whose code is publicly available for examination and use, by using it in this manner. Some developers have complained publicly that Copilot’s code suggestions are at times lifted directly from their own programs. GitHub has acknowledged that the product can, in rare cases, copy code directly. It says it’s begun installing filters to prevent this action.
The conflict puts a new twist on long-running questions about what constitutes fair use when people rely on creative works as the source material for their own art, such as in music sampling and a wide range of visual art. It’s similar to when Vanilla Ice sampled David Bowie and Queen’s “Under Pressure,” an act that did lead to a lawsuit and settlement, but if Vanilla Ice were a robot.
Zahr Said, a law professor at the University of Washington, says the new technology will test the existing legal frameworks. “There’ll be plenty of folks who say, in general, when you’re using copyrightable or copyrighted work to train AI, you’re probably within fair use, right?” she says. “But in each case, nothing’s a guarantee.”
Oege de Moor, vice president of GitHub Next, which incubated Copilot, says human developers have always examined other people’s code to inform their own work. “These models are no different,” he says. “They read a lot of source code and make new source code themselves, so we think this is a correct and worthy cause.” In response to a request for comment on the lawsuit, GitHub said it was “committed to innovating responsibly.” Microsoft and OpenAI declined to comment on the suit.
Margaret Mitchell, an AI ethicist, says AI companies have a responsibility to consider whether they’re building their tools in appropriate ways, not only legally defensible ones. “I’m sure legal cases can be made, and I’m sure Microsoft and other tech companies employ lawyers to work on the legal scholarship to say this kind of stuff is legal,” she says. “There’s still a question of ‘What is the spirit of this law?’ ”
Outside the courtroom, differing opinions are emerging about who, if anyone, should be seen as the creator of AI-generated products. Visual media supplier Getty Images has said its site won’t host any AI-generated content, and earlier this year the US Copyright Office rejected a request by an artist to copyright an image on behalf of the algorithm that created it, saying the image lacked human authorship.
It’s not yet clear if typing words into DALL-E counts as human effort, but artists are already incorporating artificial intelligence into their work. In 2019 one artist couple, Holly Herndon and Mat Dryhurst, released an album called Proto that features AI-generated voices. They also created an AI “voice instrument” called Holly+ that allows users to upload an audio file and hear it sung in Herndon’s voice.
Concerned about the ethics of using someone’s voice without their consent, they trained the model with their own voices and persuaded hundreds of others to join them in “large training ceremonies in Berlin.”
Herndon and Dryhurst also made a tool called Have I Been Trained? for those who want to see whether their work has been used to train Stable Diffusion, another AI-powered image-generation tool. It also lets artists indicate whether they want to opt in to having their works in AI datasets. Herndon and Dryhurst then collect the answers and send them to Stability.ai, which runs Stable Diffusion.
Dryhurst said in December that Stability.ai had agreed to start honoring such requests in the next version of the technology and that about 75% of respondents ask to opt out. OpenAI says it’s also working closely with artists to develop “practical and scalable solutions” to their needs. But Dryhurst’s tool doesn’t work for OpenAI’s models, because the company doesn’t disclose what data it’s used to train them.
©2023 Bloomberg L.P.