Introduction
Generative AI is disrupting the creative process of intellectual works on an unparalleled scale. More and more AI systems offer services that push users’ capacity to produce new literary and artistic works beyond previously unimagined limits. Algorithmic tools are gradually colonizing every creative sector, from generating text (e.g., ChatGPT, Smodin) and composing music (e.g., AIVA, Beatoven, Soundful) to drawing images (e.g., DALL-E, Midjourney, DreamStudio) and producing films (e.g., Deepbrain AI, Veed.io). Apart from revolutionizing the creative markets, the ability to obtain new artworks with an ever more marginal human contribution has inevitably tested the fitness of copyright legislation around the world to deal with so-called “artificial intelligence” (‘AI’).
In a nutshell, generative AI raises two main copyright issues that branch off into further sub-problems, which in turn intersect with (if not collide with) several fundamental rights, especially freedom of artistic expression, freedom of art and science and the right to science and culture (Arts. 11 and 13 EUCFR, Art. 19 UDHR, Art. 27.1 UDHR, Art. 15.1(a) and (b) ICESCR) and the right to the protection of the moral and material interests of creators (Art. 17.2 EUCFR, Art. 27.2 UDHR, and Art. 15.1(c) ICESCR).
On the input side, it is questionable whether training AI through the extraction and mining of copyrighted works constitutes copyright infringement or falls within a regime of exceptions and limitations, which varies between Europe, the United States and other parts of the world (Japan, in particular, has an interesting copyright limitation that might apply). Indeed, human creators seek compensation for this novel use of their intellectual efforts, while AI firms aim to maximize the free harvesting of data (including copyright-protected materials) for training their algorithms. On the output side, it is hotly debated whether content produced by generative AI satisfies the protectability requirements of copyright law needed to trigger exclusive protection.
Courts are already dealing with the first question, as some content creators and licensees have filed copyright infringement lawsuits against providers of generative AI services (namely OpenAI, Meta, Stability AI, and Midjourney). These lawsuits may have prompted the European legislator to deal with the issue in the proposal for a regulation laying down harmonized rules on Artificial Intelligence (‘Artificial Intelligence Act’ or ‘AIA’), which recently introduced a provision addressing transparency with regard to the works used in the machine learning process. Meanwhile, IP offices and courts have started to rule on the copyrightability of AI-generated outputs.
In 2019, the U.S. Copyright Office (USCO) denied copyright protection for a painting titled “A Recent Entrance to Paradise”, allegedly created by the AI system “Creativity Machine”, because the work lacked human authorship. The decision was confirmed by the Review Board of the Copyright Office in February 2022, as well as by the U.S. District Court for the District of Columbia in its recent decision of 18 August 2023, No 22-1564, which specified that ‘human authorship is a bedrock requirement of copyright’. On 21 February 2023, the USCO reviewed the registration of the comic book “Zarya of the Dawn” (Registration No. VAu001480196), excluding copyright protection for the images produced by the AI system Midjourney on the grounds that the changes made by the alleged author were ‘too minor and imperceptible to supply the necessary creativity for copyright protection’.
Moreover, the Italian Supreme Court, in decision no. 1107 of 16 January 2023, acknowledged copyright protection for a digital flower created with the aid of software because the human contribution of the author remained identifiable. As a software-aided creation it was not in the public domain, and the company wishing to exploit the work had to clear the reproduction right.
In China, the Beijing Internet Court denied copyright protection for an AI-generated work because of the lack of human involvement in the creative process. However, in a judgment of 24 December 2019, the Nanshan District Court of Shenzhen awarded copyright protection to an AI-generated text, since it met the formal requirements of a written work.
In sum, despite the different constitutional frameworks and copyright laws in force in various regions of the world, there seems to be a common trend of rejecting algorithmic authorship, based on the historically anthropocentric approach of copyright law. It is very likely that many more cases will be brought before courts in the near future.
This two-part post focuses on the input side of the challenges raised by generative AI. Drawing on a previous paper (see Geiger) and in line with some recent proposals advanced in the IP literature (see below), it suggests exploring the introduction of a statutory license for machine learning purposes as a compromise solution to ensure an attractive environment for artificial creativity without marginalizing the role played by human authors. This remuneration proposal is rooted in a fundamental rights analysis that balances the competing interests at stake. Part 1 of the post discusses legislative proposals in this field, and Part 2 will explore the potential statutory license solution.
Overreaching legislative proposals under discussion at EU and national level
In its original version, the AIA did not address copyright aspects. It aimed at striking a balance between enhancing innovation and safeguarding fundamental rights by adopting a risk-based approach that was (quite surprisingly) entirely agnostic to intellectual property rights. However, the tensions outlined above between providers of generative AI and copyright holders led the European Parliament to include some limited considerations regarding the copyright aspects of machine learning.
Firstly, Amendment 399 to Art. 28b offers a definition of generative AI before national legislators engage in their own definitional attempts. Indeed, the inherently cross-border applications of this technology make a fragmented approach highly undesirable. The European Parliament proposed to define generative AI as the service provided through ‘foundation models used in AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video’. The illustrative list of intellectual works is to be welcomed because it enhances the adaptability of the provision to the growing production capacities of generative AI, which in the future may cover any kind of creative segment.
Secondly, paragraph 4(c) of that amendment imposes on providers of foundation models the obligation to train, design and develop the model in compliance with Union and national legislation on copyright before making their service available on the market. For this purpose, providers should publish a sufficiently detailed summary of the use of training data protected under copyright law.
The transparency provision seems to require providers of foundation models to disclose a comprehensive list of the copyrighted content used to train their algorithm(s), accompanied by a precise identification of rightsholders. It can be presumed that these transparency rules were introduced to allow rightsholders to exercise more effectively the opt-out right from the text and data mining exception established by Art. 4.3 of the CDSM Directive. It could also be a first step towards establishing an obligation to obtain a license for the machine learning (ML) uses in question, should these uses be considered to fall within the exclusive right (this seems to be the purpose of several lawsuits against AI system producers in the US, claiming that these uses do not qualify as fair use under US copyright law). Strong pressure from rightsholders can surely be expected in this regard on this side of the Atlantic as well. A good example of a maximalist approach is the draft bill introduced in France on September 12, 2023, which proposes to submit the ML process to the exclusive control of the rightsholders whose works are used, and to attribute authorship of AI-generated works to the authors of the works used in the machine learning process. It further obliges the creator to label the generated output as an “AI generated work” and to list the names of all authors whose works have been used in the training process. Such an overreaching solution, however well intended, would be detrimental to the development of AI systems and would make any jurisdiction adopting it very unattractive for these innovative sectors.
Furthermore, it has rightly been stressed that fulfilling the transparency obligation with regard to the works used appears rather unfeasible because of the low and still heterogeneous threshold of originality, the fragmentation of copyright across various jurisdictions and its multiple ownership, the absence of a mandatory registration process, and the generally inadequate state of ownership metadata (see Quintais). The technical feasibility also needs to be confirmed, as algorithms can be trained on an immense variety of sources and it might not always be easy to determine precisely which sources have been used.
It therefore becomes crucial to elucidate the specific content of this proposal during the AIA trilogue negotiations. The rationale behind the latest amendment to the AIA is quite clear: ensuring collaboration between providers of generative AI services and copyright holders with regard to this new form of exploitation of creative works. The great divergence of interests and the high transaction costs of a potential licensing solution make it unlikely, however, that agreed solutions can be elaborated without future legislative intervention. Nor is it probably desirable that this crucial question for the future of creativity in the digital environment be left solely to the self-regulation of the various market players.
Moreover, the amendment seems to provide an effective enforcement mechanism for the opt-out right set forth by Art. 4.3 of the CDSM Directive. In the absence of a report listing all the copyrighted works mined and extracted for machine learning purposes, it would be nearly impossible for rightsholders to discover that their work has been fed into the software, except in blatant cases where the initial work is recognizable in the AI output or where there are other clear indications, such as the image produced by Stability AI that showed two football players with a watermark very similar to that of Getty Images.
However, the provision may produce an unintended – or at least undesirable – consequence: a sharp reduction in the datasets available for algorithmic training resulting from a massive exercise of the opt-out right. This would in turn affect the quality of AI-produced outputs, according to the old information-systems adage “garbage in, garbage out”. The literature on bias in AI is rich with examples of the nexus between flawed inputs and flawed outputs, such as the stereotyped representation of nurses as female and doctors as male.
It is a delicate balancing exercise, because imposing excessive (administrative and/or financial) burdens on AI providers may shrink the input datasets, with consequences for the advancement of AI systems. Indeed, the value of generative AI services in supporting creative activities should not be underestimated. Nor should it be forgotten that generative AI can also be used for scientific purposes, which might call for differentiated approaches depending on the purpose of the ML in question (see Love), as the fundamental right to research calls for a privileged treatment of research over copyright claims (see Geiger & Jütte). The main challenge is to lay down a legal framework in which AI-based tools remain instrumental to human creativity rather than becoming a substitute for it. In addition, strong doubts remain as to how AI companies will operationalize the reporting obligation under the belatedly added copyright provision.
Part 2 of this post will explore potential solutions to these challenges associated with generative AI.
A version of this contribution was posted first on October 4 on The Digital Constitutionalist (https://digi-con.org/). It summarizes the main findings of the article: “The Forgotten Creator: Towards a Statutory Remuneration Right for Machine Learning of Generative AI”, forthcoming in the Computer Law and Security Review, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4594873