Open Source AI Between Enablement, Transparency and Reproducibility
As part of Deep Dive: AI, the Open Source Initiative (OSI) is gathering a diverse collection of leaders to collaborate in drafting a definition of Open Source AI. Speakers from law, academia, NGOs, enterprise and the OSS community will present webinars addressing pressing issues and potential solutions in our development and use of AI systems. Drawing on their project experience, ZOOOM team members successfully responded to the OSI call for proposals, which asked contributors to describe precise problem areas in AI and offer clear suggestions for solutions.
On Thursday, 5 October 2023, you are welcome to join the Deep Dive: AI Webinar Series (2023) for a talk by our colleagues Ivo Emanuilov, IP lawyer and PhD researcher at the KU Leuven Centre for IT & IP Law, and Jutta Suksi, Senior Specialist, Legal and Design in Data Economy, at VTT Technical Research Centre of Finland Ltd., on Open source AI between enablement, transparency and reproducibility!
- 5 October 2023, 13:00 US Eastern time (UTC-4)
- Join us: https://osi.gl.rna1.blindsidenetworks.com/nic-82j-yj7-42f
We are honoured that our talk was selected as one of the few in this new series. What are Ivo and Jutta going to talk about?
Open source AI is a misnomer. AI, notably in the form of machine learning (ML), is not programmed to perform a task but to learn a task on the basis of available data. The learned model is simply a new algorithm trained to perform a specific task, but it is not a computer program proper and does not fit squarely into the protectable subject matter scope of most open source software licences. Making available the training script or the model’s ‘source code’ (eg, neural weights), therefore, does not guarantee compliance with the OSI definition of open source as it stands because AI is a collection of data artefacts spread across the ML pipeline.
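To make that distinction concrete, here is a minimal, purely illustrative Python sketch of our own (not from the talk): the training script below is a human-authored computer program, while the weights it writes out are a learned data artefact produced by the training process.

```python
# Illustrative only: a toy training script (human-authored source code)
# whose output -- the learned weights -- is a data artefact, not a
# computer program proper.
import json
import random

def train(data, lr=0.05, epochs=200, seed=42):
    """Fit y = w*x + b to (x, y) pairs by stochastic gradient descent."""
    random.seed(seed)
    w, b = random.random(), random.random()
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error for this sample
            w -= lr * err * x       # gradient step on the weight
            b -= lr * err           # gradient step on the bias
    return w, b

if __name__ == "__main__":
    data = [(x, 2 * x + 1) for x in range(5)]  # samples of y = 2x + 1
    w, b = train(data)
    # The released 'model' is just this blob of learned numbers, not code.
    with open("weights.json", "w") as f:
        json.dump({"w": w, "b": b}, f)
    print(f"learned: w={w:.3f}, b={b:.3f}")
```

An open source licence over the script says nothing, by itself, about the status of the resulting weights.json.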
The ML pipeline is formed by processes and artefacts that reflect the extraction of patterns, trends and correlations from billions of data points. Unlike conventional software, where the emphasis is on the unfettered downstream availability of source code, in ML it is transparency about the mechanics of this pipeline that takes centre stage. Transparency is instrumental to promoting maximal use and mitigating the risk of closure, two fundamental tenets of the OSS definition.

Instead of focusing on single computational artefacts (eg, the training and testing data sets, or the machine learning model), a definition of open source AI should zoom in on the 'recipe', ie the process of making a reproducible model. Open source AI should be less interested in the specific implementations protected by the underlying copyright in source code and much more engaged with promoting public disclosure of details about the process of 'AI-making'. The definition of open source software has been difficult to apply to other subject matter, so it is not surprising that AI, as a fundamentally different form of software, may require a definition of its own.

In our view, any definition of open source AI should therefore focus not solely on releasing neural network weights, training script source code, or training data, important as they may be, but on the functioning of the whole pipeline, such that the process becomes reproducible. To this end, we propose a definition of open source AI inspired by the written description and enablement requirement in patent law. Under that definition, to qualify as open source AI, the public release should disclose details about the process of making AI that are sufficiently clear and complete for it to be carried out by a person skilled in machine learning. This definition is, of course, subject to further development and refinement in light of the features of the process that may have to be released (eg, model architecture, optimisation procedure, training data, etc.). Some of these artefacts may be covered by exclusive IP rights (notably, copyright); others may not. This creates a fundamental challenge for licensing AI as a single package.
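As a thought experiment, the sketch below shows what a machine-readable 'enablement record' might look like. The field names and example values are our own hypothetical illustration, not a proposed standard: the point is that the disclosure covers the process (architecture, optimisation procedure, data provenance, seeds, evaluation), not merely the weights.

```python
# Hypothetical sketch of an 'enablement record'; the fields are our own
# illustration of what a sufficiently clear and complete disclosure of
# the AI-making process might cover.
from dataclasses import dataclass, asdict
import json

@dataclass
class EnablementRecord:
    model_architecture: str        # eg "2-layer MLP, 128 hidden units, ReLU"
    optimisation_procedure: str    # eg "SGD, lr=0.01, batch size 32, 10 epochs"
    training_data: str             # provenance/identifier of the data set
    preprocessing: list[str]       # ordered preprocessing steps
    random_seed: int               # fixes stochastic choices for reproducibility
    evaluation_protocol: str       # how the model was validated

record = EnablementRecord(
    model_architecture="2-layer MLP, 128 hidden units, ReLU",
    optimisation_procedure="SGD, lr=0.01, batch size 32, 10 epochs",
    training_data="MNIST train split (documented source and version)",
    preprocessing=["scale pixels to [0, 1]", "shuffle with fixed seed"],
    random_seed=42,
    evaluation_protocol="accuracy on held-out MNIST test split",
)

# Published alongside (or instead of) the weights, this is the 'recipe'
# a person skilled in machine learning would need to re-run the process.
print(json.dumps(asdict(record), indent=2))
```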
One way to deal with this conundrum is to apply the unitary approach known from European case law on video games (eg, the ECJ Nintendo case): if we can identify one expressive element that attracts copyright protection (originality), that element allows us to extend protection to the work as a whole. Alternatively, we can adopt the more pragmatic and technically accurate approach of treating AI as a process embedding a heterogeneous collection of artefacts. In this case, any release on open source terms that ensures enablement, reproducibility and downstream availability would have to take the form of a hybrid licence which cumulatively grants enabling rights over code, data and documentation. In this session, we discuss these different approaches and how the way we define open source AI, and the objectives pursued with this definition, may predetermine which licensing approach should apply.
Check out more from the Open Source Initiative on their YouTube channel!