As the discussion on AI regulation intensifies around the globe, the Australian Government’s Department of Industry, Science and Resources has recently announced Safe and responsible AI in Australia: Proposals paper for introducing mandatory guardrails for AI in high-risk settings. The Proposals Paper is strongly influenced by the EU AI Act, which it cites on numerous occasions, and aims to set binding horizontal standards for all high-risk AI technologies. The document proposes 10 Guardrails for AI in high-risk settings, with Guardrails 3 and 6 being of particular relevance in the copyright context.
Guardrail 3: disclosing information about training datasets
Guardrail 3 in the Proposals Paper requires that organisations developing or deploying high-risk AI systems “[p]rotect AI systems, and implement data governance measures to manage data quality and provenance”. In this regard, the Guardrail suggests that “[d]ata must (…) be legally obtained. Datasets used to train AI systems or GPAI models must not contain illegal and harmful material such as child sexual abuse material or non-consensual intimate imagery. Data sources must be disclosed.” In addition, “[c]onsistent with ISO/IEC 42001 and the EU AI Act, this guardrail will cover: the origin and legality of the dataset and collection processes; documentation of data provenance” – i.e., how the data was collected and what its sources are.
The proposed Guardrail should be generally welcomed, as it promotes transparency and provenance around training data, which is essential for right holders wishing to license their content for AI training or to enforce their rights in it. Given that Australia has neither fair use nor AI-specific exceptions, such as a text and data mining (TDM) exception, these transparency obligations are especially important. At the same time, many questions will need to be clarified before this Guardrail is finalised. First, what does ‘legally obtained’ mean in a copyright context, and under which law is this requirement to be judged? If a US company scraped data from online sources without authorisation from right holders, can it argue that it obtained the data legally because it relied on the US fair use doctrine (assuming that fair use applies to the scenario in question, which is currently contested in the US courts)? Or will ‘legality’ be assessed under Australian copyright law?
Second, what does ‘legal obtaining’ of data mean under Australian law? Given the lack of AI-specific exceptions, will AI developers or deployers have to demonstrate that they have authorisations/licences from Australian (or all) right holders? What happens if they do not have such authorisations? Is requiring authorisation for any use of works (and other subject matter) in an AI training context fully justified, especially as there are currently no viable mechanisms for obtaining licences to the large volumes of content that such processes require? Or should Australia introduce certain AI-specific exceptions, such as the TDM exception that currently exists in the UK, EU and Japan – even if in a more limited form?
Guardrail 6: disclosing the use of AI in generating synthetic content
Another provision relevant from a copyright perspective is the proposed Guardrail 6. This requires “[i]nform[ing] end-users regarding AI-enabled decisions, interactions with AI and AI-generated content.” According to part 3 of this Guardrail, “[o]rganisations must apply best efforts to ensure AI-generated outputs, including synthetic text, image, audio or video content, can be detected as artificially generated or manipulated.”
Again, while the Australian Government’s attempt to ensure more transparency around AI use, including in generating synthetic content, should be welcomed, the Guardrail will need to be further elaborated before it is finalised. First, the Guardrail as currently proposed seems to suggest that AI use needs to be disclosed in all cases. It is, however, questionable whether disclosure is needed when AI is used in a limited or minimal context (e.g., for editing purposes, or when the AI-generated material forms only a relatively small part of a complex work, such as a movie) or when members of the public have no interest in whether AI was used in generating specific content (e.g., the use of AI in creating a marketing message or poster). Our public survey has shown that members of the public have lower expectations of disclosure when AI use is minimal. Relatedly, does this disclosure duty apply to content that was generated by AI but then significantly modified by a human (and what would count as ‘significant’ modification)? Second, what should the expected format of disclosure be, and how will this duty differ depending on the type of content (image, video, text)? Third, how can it be ensured that watermarks and other identifiers are not ‘lost’ or intentionally removed as content is transferred and modified by subsequent users? Finally, who should be responsible for ensuring that AI use is disclosed? Apart from AI developers and/or deployers, policy makers could also consider certain duties or prohibitions for content users (e.g., a prohibition on removing labels or other AI identifiers).
Overall, while the Australian Government is on the right track in developing binding legal standards for AI, further discussion is needed to refine and finalise the proposed standards before they are implemented in practice.