What do you know about (open) data?
ZOOOM seeks to create a better understanding of the opportunities offered by existing open data and the potential of data generated internally, and of how open data strategies affect SMEs’ business models. This includes negotiating favourable license terms for access to high-quality data sets, the obligations to make corresponding modifications available when redistributing copyleft-licensed derivative works of products that incorporate both software and data, and more!
Learn with ZOOOM and explore open data possibilities on your own:
What is data?
When we talk about data, we mean a wide range of material: raw data, quantitative data, qualitative data, aggregated data, data records, datasets, databases, data products and data services – to name some. When we talk about intellectual property, we also mean a wide range of creations with intellectual input: literature, paintings, images, designs, software, and inventions – to name some. And when we talk about intellectual property rights, we talk about a wide range of forms of legal protection granted for intellectual property: copyrights, patents, trademarks, and database rights – to name some. The distinction between these three elements is crucial for understanding how the licensing of open data operates in real-life cases.
The term data originates in Latin. Its singular form, datum, meant a fact given or granted, (thing) given, a fact given as the basis for calculation in mathematical problems, or numerical facts collected for future reference.[1]
According to the Merriam-Webster Dictionary, data means: “1. factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation; 2. information in digital form that can be transmitted or processed; 3. information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful.”[2] This definition makes clear that data is meant to be processed further in order to refine and develop it.
The Cambridge Dictionary defines a dataset as “a collection of separate sets of information that is treated as a single unit by a computer”[3] and a database as “a large amount of information stored in a computer system in such a way that it can be easily looked at or changed”[4]. These definitions highlight the digital nature of data and its processing.
One of the more recent and specific definitions of data comes from the Montreal Data License, which defines data as the information being made available. The format and layout of such information, however it is organised, is referred to as a database or dataset. Where applicable, the data may be separated into underlying data and metadata, the latter in the form of data tags and other structural information. As Misha Benjamin et al. (2019) point out, such data may be collected and harnessed from different sources or made available from a single source. Data can be basic collated information (e.g., a range of measurements such as temperature, location) or be formed of more complex information (e.g., pictures, maps).
Finally, there are concepts like data product and data-as-a-service which emphasise the end-user aspects relating to data, and the need to build products or provide services that are based on data.
[1] ‘Datum’, https://www.etymonline.com/word/datum
[2] ‘Definition of Data’, https://www.merriam-webster.com/dictionary/data
[3] ‘Dataset’, https://dictionary.cambridge.org/dictionary/english/dataset
[4] ‘Database’, https://dictionary.cambridge.org/dictionary/english/database
What is open data?
The roots of opening data for reuse lie in public sector data, open government data, and open science. In Europe, the regulatory basis for the reuse of public sector information dates from 2003. The Open Data Directive on open data and the re-use of public sector information builds on the (repealed) Public Sector Information (PSI) Directive, which addressed the increasing demand for re-use of public sector data.
When looking into the licensing of open data, the key is to understand how it differs from the licensing of open source software and open hardware. The three differ clearly in the IP rights on which they are based. Data as such is not protected by copyright; additional categories of IP rights apply, for instance to databases. In addition, technological advancements affect the ways data is used, and there is a clear trend towards building products and services upon data. All these aspects affect the way open data is or should be licensed.
A good starting point for the definition of open data is the Open Knowledge Foundation (OKFN) Open Definition. The OKFN Open Definition states that open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).
The goals of the OKFN Open Definition include promoting a robust commons in which anyone may participate, compatibility and interoperability are maximised, and quality of data is ensured. The OKFN Open Definition defines openness with regard to two categories: data and content. While open data is useful on its own, its true value is revealed when it becomes open knowledge. Even though the OKFN Open Definition covers both categories, it is important to distinguish the two, as their treatment under intellectual property law is different.
The OKFN Open Definition is aligned with the Open Source Initiative (OSI)’s Open Source Software definition regarding software and also the Open Source Hardware Association (OSHWA)’s definition regarding hardware. The OKFN Open Definition states that for a work to be ‘open’ it must meet the following requirements relating to its distribution:
Requirement | Meaning |
Open license or status | The work must be in the public domain or provided under an open license (as defined in Section 2). Any additional terms accompanying the work (such as a terms of use, or patents held by the licensor) must not contradict the work’s public domain status or terms of the license. |
Access | The work must be provided as a whole and at no more than a reasonable one-time reproduction cost, and should be downloadable via the Internet without charge. Any additional information necessary for license compliance (such as names of contributors required for compliance with attribution requirements) must also accompany the work. |
Machine readability | The work must be provided in a form readily processable by a computer and where the individual elements of the work can be easily accessed and modified. |
Open format | The work must be provided in an open format. An open format is one which places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool. |
In order to qualify as an open license in accordance with the OKFN Open Definition, the license needs to satisfy the following required permissions:
Required permissions | Content |
Use | The license must allow free use of the licensed work. |
Redistribution | The license must allow redistribution of the licensed work, including sale, whether on its own or as part of a collection made from works from different sources. |
Modification | The license must allow the creation of derivatives of the licensed work and allow the distribution of such derivatives under the same terms of the original licensed work. |
Separation | The license must allow any part of the work to be freely used, distributed, or modified separately from any other part of the work or from any collection of works in which it was originally distributed. All parties who receive any distribution of any part of a work within the terms of the original license should have the same rights as those that are granted in conjunction with the original work. |
Compilation | The license must allow the licensed work to be distributed along with other distinct works without placing restrictions on these other works. |
Non-discrimination | The license must not discriminate against any person or group. |
Propagation | The rights attached to the work must apply to all to whom it is redistributed without the need to agree to any additional legal terms. |
Application for any purpose | The license must allow use, redistribution, modification, and compilation for any purpose. The license must not restrict anyone from making use of the work in a specific field of endeavour. |
No charge | The license must not impose any fee arrangement, royalty, or other compensation or monetary remuneration as part of its conditions. |
…and the following acceptable conditions:
Acceptable conditions | Content |
Attribution | The license may require distributions of the work to include attribution of contributors, rights holders, sponsors, and creators as long as any such prescriptions are not onerous. |
Integrity | The license may require that modified versions of a licensed work carry a different name or version number from the original work or otherwise indicate what changes have been made. |
Share-alike | The license may require distributions of the work to remain under the same license or a similar license. |
Notice | The license may require retention of copyright notices and identification of the license. |
Source | The license may require that anyone distributing the work provide recipients with access to the preferred form for making modifications. |
Technical restriction prohibition | The license may require that distributions of the work remain free of any technical measures that would restrict the exercise of otherwise allowed rights. |
Non-aggression | The license may require modifiers to grant the public additional permissions (for example, patent licenses) as required for exercise of the rights allowed by the license. The license may also condition permissions on not aggressing against licensees with respect to exercising any allowed right (again, for example, patent litigation). |
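The two tables above can be read as a simple checklist, and the sketch below shows one way to encode it. It is a minimal illustration only: the identifiers, the `LicenseProfile` class, the `check_open` helper and the two example licenses are all hypothetical and not part of the Open Definition. The sketch assumes a human reviewer has already decided which permissions a license grants and which conditions it imposes.

```python
from dataclasses import dataclass

# Identifiers mirror the rows of the two tables above. The names are
# illustrative assumptions, not terms defined by the Open Definition.
REQUIRED_PERMISSIONS = frozenset({
    "use", "redistribution", "modification", "separation", "compilation",
    "non_discrimination", "propagation", "any_purpose", "no_charge",
})
ACCEPTABLE_CONDITIONS = frozenset({
    "attribution", "integrity", "share_alike", "notice", "source",
    "technical_restriction_prohibition", "non_aggression",
})


@dataclass(frozen=True)
class LicenseProfile:
    """Hypothetical summary of a license, filled in by a human reviewer."""
    name: str
    permissions: frozenset = frozenset()
    conditions: frozenset = frozenset()


def check_open(profile):
    """Return (is_open, missing_permissions, unacceptable_conditions)."""
    # Every required permission must be granted ...
    missing = REQUIRED_PERMISSIONS - profile.permissions
    # ... and every imposed condition must be on the acceptable list.
    unacceptable = profile.conditions - ACCEPTABLE_CONDITIONS
    return (not missing and not unacceptable, sorted(missing), sorted(unacceptable))


# Two made-up licenses: one attribution-style, one with a non-commercial clause.
attribution_only = LicenseProfile(
    "Example-BY", REQUIRED_PERMISSIONS, frozenset({"attribution", "notice"})
)
non_commercial = LicenseProfile(
    "Example-NC",
    REQUIRED_PERMISSIONS - {"any_purpose"},
    frozenset({"attribution", "non_commercial"}),
)

print(check_open(attribution_only))  # (True, [], [])
print(check_open(non_commercial))    # (False, ['any_purpose'], ['non_commercial'])
```

In this sketch, a non-commercial clause fails the check twice: it removes the "application for any purpose" permission and adds a condition that is not in the acceptable list. Mapping a real license text onto these labels remains a matter of legal reading, which the code does not attempt.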
What are the current trends in open data?
One current trend in digital business is the move towards providing services based on subscription fees instead of single purchases. Examples include cloud services (such as OneDrive, iCloud, Google Drive) and streaming of video and audio content (such as Netflix, Spotify). This trend also directly affects how data is processed and consumed in business, as the wide use of services like Azure and AWS shows.
The trend of providing data as a service has a profound impact on how data is shared and licensed between entities. Data is no longer seen as a strange cousin of intellectual property, but instead, from the perspective of the end-user, as something that adds value to the end-user’s activities. Data forms part of the service; it is not a single licensable asset. This trend can be seen in the latest context-specific open data licensing terms, in contrast with some more traditional forms of licensing stemming from licensing data embedded under specific legal forms of intellectual property rights.
The servitisation trend opens up new possibilities for building business upon data and highlights the need for novel business design and business models. It also emphasises the need for data sharing between entities and governance mechanisms for such data sharing, for instance through data ecosystems. The concept of open data ecosystems, focusing purely on open data, is still emerging. The provision of data as a service and the emergence of data ecosystems show data to be a non-rival resource and focus attention on how data can flow between entities.
Regulatory trends in open data
Among the forerunners in opening data are actors in the fields of public sector data, governmental data and open science. European regulation covering these fields has for some time paved the way towards offering data as a service. For example, the Open Data Directive (2019) emphasises dynamic data sharing and issues like data access and provision of application programming interfaces (APIs). APIs are required for access to high value data sets like geospatial data, environmental data and mobility data. Typically, these datasets contain data that is not covered by copyrights or database rights. This is another indication of the growing trend of servitisation of data as opposed to relying on mere licensing of the underlying intellectual property rights. Similar types of regulations can also be found in regulated sectors like banking, energy and the automotive industry.
From a regulatory perspective, data is currently in the spotlight. The so-called Big Five regulations (Digital Markets Act, Digital Services Act, Data Governance Act, Data Act and AI Act) are in different phases of adoption or implementation. The themes pushed forward with these acts include, among others, issues of fair competition in the digital markets, obligations for platforms and gatekeepers, obligations for service providers and data intermediaries, re-use of public sector data, data altruism, IoT data, data spaces and transparency and risk assessment of AI technology. These will profoundly affect the boundaries on how to conduct data-based business and give incentives for novel types of business in the European data economy.
The Big Five regulations will also give a boost to open data by affecting the business environment in which data flows along the data spectrum, from closed to open, with different forms of data sharing in between. At the same time, the systemic challenges embedded in data sharing and licensing between entities, and on a more general level within ecosystems and communities, come to the surface. These challenges include issues like interoperability, APIs, access control, metadata, value generation, regulatory aspects, contractual issues, licensing, ecosystem governance and data governance. Due to the multidisciplinary nature of these challenges, they can be tackled only by addressing them concurrently from different perspectives, e.g., technological, business, legal, operational and political.