Key Takeaways
- AI models “learn” from vast datasets, including copyrighted material.
- AI outputs are technically original but can reflect patterns, styles, or motifs from the input material.
- Copyright concerns can arise from indirect influence, even when the AI doesn’t directly copy someone else’s creation.
- Understanding the difference between inspiration and copying is crucial for legal and ethical use of AI outputs.
Do you worry that using AI to generate content might land you in hot water, perhaps even with a lawsuit from an artist or a company claiming you’ve used their creation? If so, your concerns are well founded, as cases like this have already happened, and not just once.
One such case involves music publishers, including Universal, Concord, and ABKCO, which sued Anthropic, claiming that the company’s AI model was trained on copyrighted songs without permission and even reproduced lyrics from those songs in its outputs.
Given cases like this, it’s hardly surprising that copyright infringement involving AI has become one of the most hotly debated issues today. What’s often overlooked, though, is that many of the AI-related legal disputes actually stem from indirect copyright problems.
In practice, this occurs mostly because AI systems are trained on vast amounts of data scraped from the internet, including copyrighted content, and usually without the author’s consent. In the process, metadata, attribution, and other details that identify the copyright owner are stripped away. That material or fragments of it can then resurface in AI outputs. When this happens, the result may amount to a derivative work of the original, copyrighted content.
Next, we’ll explain how AI can produce outputs that draw on copyrighted works, even without directly copying them. For that, let’s clarify what we actually mean by input and output in the context of AI.
How AI Learns and Creates: Input vs. Output
To understand how copyright law applies to AI-generated content, it helps to look at three things: what goes into an AI model (the input), what comes out of it (the output), and the grey areas in between.
- Input (the data AI is trained on): AI models are trained on large datasets that often include copyrighted works, such as books, songs, or visual art. Through this process, the AI absorbs patterns, styles, and creative choices from the existing material. Even if the AI isn’t designed to store the original content for later reproduction, it uses what it has learned as the basis for its future creations.
Example: An AI trained on Beatles songs, Van Gogh’s paintings, or passages from Hemingway’s books analyses recurring features, like chord progressions, colour contrasts, or sentence structures. It generally doesn’t memorise entire works but learns the underlying patterns, which it can later draw on to generate new content with a comparable style.
- Output (the content AI generates based on the patterns it learned from inputs): When an AI generates a new story, song, or image, it typically doesn’t replicate the original work note by note, word by word, or pixel by pixel. Instead, it combines and reinterprets the elements it has learned, like musical motifs, sentence structures, colour schemes, or composition patterns, to create something new. While technically original, the output may still reflect the patterns it studied in the training data.
Example: An AI generates an image of a moonlit landscape with swirling skies and vibrant yellows reminiscent of Van Gogh’s Starry Night. Or it could produce a short story that mirrors Hemingway’s distinctive sentence style, or a song that echoes The Beatles’ harmonic patterns. In each of these cases, the outputs are not direct copies of the original creations, but they may reflect the stylistic choices of copyrighted material.
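For readers who like to see the idea in code, here is a minimal sketch of the input and output steps. It is a toy word-pair model, not how real generative AI systems are built, and the training text and function names are made up for this illustration. It shows the basic point: the generated sentence may be new as a whole, yet every word-to-word step in it comes from the input.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Input step: record which word follows which in the training text."""
    words = text.lower().split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Output step: build a word sequence from the learned transitions."""
    rng = random.Random(seed)  # fixed seed so repeated runs behave the same
    word = start
    output = [word]
    for _ in range(length - 1):
        options = followers.get(word)
        if not options:  # no known follower, so stop early
            break
        word = rng.choice(options)
        output.append(word)
    return output


# Hypothetical training text, written for this example.
TRAINING_TEXT = (
    "the sky swirls above the village and the stars burn above the hills "
    "and the village sleeps under the stars"
)

model = train_bigrams(TRAINING_TEXT)
generated = generate(model, start="the", length=8)
print(" ".join(generated))
```

Real models learn far richer patterns than adjacent word pairs, but the relationship between training data and output is similar in spirit: the output is assembled from what the input made available.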
Nuanced Copyright Grey Areas in AI-Generated Content
This is where the complexities of copyright come into play. An AI’s output may incorporate the styles, motifs, or creative essence of several copyrighted works at once. Because the output is not a direct reproduction of any one of them, the resemblance may be subtle, yet someone familiar with one of the original artists’ works can still identify it. This raises important questions about authorship and authenticity.
Example: An AI-generated painting might blend the swirling skies of Van Gogh’s Starry Night with the vivid colour contrasts of Monet’s Water Lilies. Or, the AI could produce a short story that mixes Hemingway’s concise sentence style with Tolkien’s rich descriptive imagery, or a song that subtly echoes The Beatles’ chord progressions while blending in modern harmonies. To most people, these outputs might seem original, but someone familiar with the style of any of these artists may recognise the distinctive patterns, stylistic choices, or creative signatures. This indirect borrowing ties the AI’s output back to copyrighted material, even without direct copying.
Why Copyright Challenges Arise with AI
The copyright issues surrounding AI aren’t random. They arise from the very way these systems are designed and trained. To generate anything meaningful, AI needs to “learn” how to identify patterns, styles, and structures. The only way an AI model can do this is to analyse large amounts of existing content, which presents different patterns, styles, and structures. Because this learning is based on existing human creations, even if AI doesn’t copy content directly, its outputs inevitably reflect the work of the original creators.
The legal and ethical friction comes from this indirect “borrowing”. Even when the AI is not reproducing a copyrighted work, it’s using the essence of someone else’s creative effort to produce something new. AI has to learn from human creativity, and in doing so it sometimes straddles a fine line between inspiration and infringement.
A Simple Analogy
Copyright and its limits can feel abstract, so let’s use a couple of examples to make things clearer.
Think about a classic fairytale like Cinderella. The general story (a poor girl treated unfairly, a magical transformation, and a lost slipper) comes from a public-domain tale, and general plot ideas can’t be copyrighted in any case. That’s why Disney can make a Cinderella movie, and another studio can create a different version, like A Cinderella Story with Hilary Duff. What’s copyrighted is Disney’s specific film: its unique dialogue, musical score, and animated character designs. Anyone can create a new story inspired by the Cinderella plot, but they can’t copy Disney’s exact character designs, dialogue, or songs.
The same goes for art. No painter owns the colours yellow and blue, or the idea of swirling skies. What copyright protects is a specific painting: the brushstrokes, the composition, the execution. Van Gogh’s own works are long out of copyright, but the principle is the same for any artist whose work is still protected. Anyone can use those colours or paint a night sky, but reproducing a protected painting, or sections of it, stroke for stroke would be infringement. This distinction between inspiration and copying a protected expression is at the heart of AI’s copyright challenges.
So what does this mean for you? If you’re using AI tools to create something, it’s worth remembering that while the technology can spark creativity, it also comes with risks. Outputs that feel fresh may still echo copyrighted works, and that could have legal consequences if you use them commercially. The safer approach is to treat AI as a starting point for new ideas rather than using AI outputs as finished products you can publish without review. By doing so, you can enjoy the benefits of AI while reducing your exposure to legal grey areas.
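If you’re comfortable with a little code, one small part of that review can be automated. The sketch below flags word-for-word overlap between an AI output and a reference text you already have on hand. It is only an illustration under stated assumptions: the function names and sample sentences are invented for this example, the five-word window is an arbitrary choice, and the check catches literal reuse only. It cannot detect stylistic similarity, and it is not a legal test of infringement.

```python
def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(candidate, reference, n=5):
    """Return the n-word runs that appear verbatim in both texts."""
    return word_ngrams(candidate, n) & word_ngrams(reference, n)


# Hypothetical texts, written for this example.
reference = "the lamp flickered twice before the old house went completely dark"
copied = "she wrote that the lamp flickered twice before the old house fell silent"
inspired = "a candle guttered in the empty hall as night settled over the farm"

print(len(shared_ngrams(copied, reference)))    # several shared five-word runs
print(len(shared_ngrams(inspired, reference)))  # no shared runs
```

The second sentence shares a mood with the reference but no wording, which is the inspiration side of the line. The first lifts a whole phrase, which is the kind of result worth a closer look before publishing.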
Extra Sources and Further Reading
- Generative AI Has an Intellectual Property Problem – Harvard Business Review
https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
This article argues that generative AI faces serious intellectual property challenges, since these systems often produce outputs derived from copyrighted works in ways that blur the line between innovation and infringement.
- Copyright in Works Created with Generative AI – Congress.gov
https://www.congress.gov/crs-product/LSB10922
This document examines the legal complexities surrounding generative AI and copyright law, focusing on authorship requirements and the implications for AI-generated works.
- Copyright Issues in Artificial Intelligence: A Comprehensive Examination from the Perspectives of Subject and Object – ResearchGate
https://www.researchgate.net/publication/375767869_Copyright_Issues_in_Artificial_Intelligence_A_Comprehensive_Examination_from_the_Perspectives_of_Subject_and_Object
This article explores the legal complexities of AI-generated content, focusing on authorship, ownership, and the implications for copyright law.

