Development of AI applications, including those based on machine learning, requires feeding AI algorithms with large amounts of training data. In many cases, this data consists of copyright-protected content, such as text, video or music files, extracts from databases, and visual materials. Under Australian copyright law, such use of copyright material is unlikely to be covered by any of the existing copyright exceptions.
The five fair dealing provisions (ss 40–43 of the Copyright Act 1968), covering use for research or study, news reporting, criticism or review, parody or satire, and judicial proceedings or professional advice, are unlikely to apply for various reasons. For instance, fair dealing for research or study is unlikely to apply when large numbers of entire works are reproduced for machine learning, since it generally permits the use of only parts of works (see, e.g., the position taken in Australian Law Reform Commission Report 122, para 11.65). Similarly, developing AI for use in judicial proceedings or for providing legal advice is unlikely to be covered by the relevant exception when the system is developed by a third party rather than by judges, lawyers or other qualified persons.
Certain specific exceptions might be of some use but would not eliminate the infringement risk entirely. For instance, the temporary copying exceptions in ss 43A and 43B of the Copyright Act 1968 may cover some of the temporary reproductions made during the machine learning process. However, they are unlikely to cover copies made when creating a data set, since these copies are neither temporary nor incidental.
As a result, Australian AI developers who need to use copyright-protected content for machine learning generally have to obtain licences from the copyright owners. In some cases this may be quite feasible. For instance, Facebook’s broad Terms of Service probably allow Facebook to use any content uploaded by its users for AI training, as long as it is used to ‘make [the] service better’. In many other cases, however, where large amounts of content belonging to many different right holders (known and unknown) are needed, licensing is not a viable option.
Even if licensing solutions were available, Australian AI industries would be disadvantaged compared with those in other developed countries, where the use of content in machine learning is covered by copyright exceptions. For instance, in the US such uses are likely to be covered by fair use, while in the EU the new text and data mining exception (arts 3–4 of the Digital Single Market Directive) was essentially designed to allow both non-commercial research and commercial AI projects. Text and data mining does not require authorisation in Japan either.
It is thus time for the Australian government to take action and consider possible solutions to this problem. Although the issue has been discussed, inter alia, in a report by the Australian Law Reform Commission (ALRC), the government has not yet proposed any legislation to address it.
There are generally two options the Australian government could consider: first, facilitating the licensing of content for machine learning or text and data mining (TDM) purposes, for example by introducing compulsory or extended collective licensing; and second, introducing new copyright exceptions (or amending existing ones) to cover machine learning. The paper cited below assesses these options and concludes that a specific exception for text and data mining, similar to that in the EU, is best able to strike a balance between the interests of AI industries and those of the right holders whose content is used to train AI systems.
This blog post is an extended summary of the following article (under review):
Matulionyte, Rita, Australian copyright law impedes the development of Artificial Intelligence: what can be done? (May 8, 2020). Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3595797