Generative AI Faces Legal Challenges Over Copyrighted Content

In a recent filing to the U.K. Parliament, OpenAI, the creator of ChatGPT, admitted that leading AI models rely heavily on access to copyrighted books and articles. The AI industry, worth billions of dollars, has long argued that its generative AI technology learns from these texts rather than copying them, and that this use therefore falls under the fair-use doctrine. However, two recent lawsuits filed by the Universal Music Group and The New York Times have challenged this claim.

Large language models, like those powering ChatGPT, can memorize portions of their training text and reproduce them verbatim. Verbatim reproduction looks like copying rather than learning, which raises concerns about copyright infringement and undermines the fair-use argument. The implications are significant: if AI companies must compensate the authors whose work they rely on, the cost could hamper the technology and its development.

The major venture-capital firm Andreessen Horowitz, which has invested significantly in generative AI, acknowledges that such an obligation could “kill or significantly hamper” the entire technology. To address the issue, models may have to be rebuilt using open or properly licensed sources, which would incur substantial costs and could result in less fluent models. Despite these setbacks, a responsible rebuild could mend the relationship between generative AI and the content creators whose work has been used without permission.

This is not the first time generative AI has faced legal battles. Authors including John Grisham and Sarah Silverman have previously filed class-action lawsuits against AI companies, arguing that training models using their books constitutes illegal copying. The fair-use argument has been a pillar for Silicon Valley’s tech industry in the past, enabling innovation and new technologies.

While companies like OpenAI have claimed that the training of language models is a non-expressive use, recent lawsuits have challenged this notion. Examples of models generating long passages from copyrighted material without attribution have emerged, highlighting the need for clearer boundaries.

As the legal challenges mount, AI companies aim to address memorization in order to limit their liability. OpenAI, for instance, describes memorization as a rare bug and says it is working to eliminate it. However, researchers have shown that memorization occurs in every large language model, which calls into question how easily the industry can reduce its reliance on copyrighted content.

This legal reckoning will shape the future of generative AI, forcing companies to reevaluate their practices and find a balance between innovation and respecting intellectual property rights.

Source: the blog papodemusica.com
