Recommendations on licenses for research data published

FAIR and open data require appropriate measures for the publication and re-use of research data. The Science Data Center working group on legal aspects therefore discussed the status in this domain and put together a set of recommendations. As practice differs among the scientific communities involved, the recommendations don't necessarily apply to all domains. Nevertheless, the results are applicable to the BioDATEN community. The results of the discussion were published in the o|bib article "Offene Lizenzen für Forschungsdaten - Rechtliche Bewertung und Praxistauglichkeit verbreiteter Lizenzmodelle". As this open access online journal focuses on libraries, it is important to spread the word across domains.

Licensing research data, that is, granting rights to others, determines whether self-generated data can be used by third parties and whether third-party data can be used in one's own research. Admittedly, not all research data can be published right away. Practical or legal circumstances can lead either to delayed publication, e.g. due to embargoes, or to no publication at all. The EU Commission has aptly summarized this with the motto "as open as possible, as closed as necessary".

Licenses for research data should consider the interests and requirements of infrastructure providers, which are indispensable especially as operators of data repositories, as well as those of the data creators and of other stakeholders such as research and funding institutions, whose requirements may be captured in recommendations, policies or funding conditions. The legal safeguarding of open data spaces must therefore be geared to balancing both sides. To aid in this regard, the published paper analyses the licensing recommendations of different stakeholders and examines the suitability of various licensing models for the provision of research data, before concluding with a differentiated recommendation of its own.

After thorough discussions and considerations, the paper's analysis comes to the following conclusion:

  1. In principle, the CC BY 4.0 license or the CC0 1.0 label is to be recommended for the provision of open research data.
  2. CC0 1.0 is preferable if it can be assumed that research data are not protected by copyright. This applies to metadata, but also to a large part of natural science and quantitative social science research data.

Nevertheless, in individual cases there may be good reasons to resort to more restrictive licenses or to refrain from publication. Choosing the “right” license must balance the needs of all stakeholders in research data management, such as researchers, infrastructure providers, research institutions and funding agencies.

(DvS, HG)

Two nf-core events supported by BioDATEN

BioDATEN supports two nf-core events held and organized by QBiC:

  1. nf-core bytesize talks: talks organized by the nf-core community on how to write Nextflow pipelines with the help of the nf-core tools. This series will also include information on currently available analysis pipelines as part of the nf-core project and how to use them. The talks from the previous series are available in our YouTube channel:
  2. nf-core hackathon: a developer event for contributing to the nf-core Nextflow pipelines. This year we will focus on converting existing pipelines to Nextflow DSL2.


Science Data Center infrastructure: BioDATEN implements InvenioRDM for data publication

Jonathan Bauer from the University of Freiburg gave a presentation about the joint effort to establish a publication repository for the BioDATEN community. The Invenio framework for data publication is approaching its first release. BioDATEN is entering into a partnership with the developers to push forward the production version. Authentication is handled through Keycloak, and an object storage backend is provided by bwSFS. Currently a standalone instance is deployed, but a Kubernetes-based orchestration solution is being evaluated to cope with future usage load. Users log in via ELIXIR ID, authenticating against the identity provider of their home institution. Keycloak provides role-based access control. A strategy for reproducible deployment, resilient operation and disaster recovery is currently being developed.

Access to a record's data and metadata can be controlled separately. An embargo or holding period can be applied to a record, which is then not publicly available until the embargo has ended. Special viewing and editing links can be generated and shared. A range of persistent identifiers is supported, primarily DataCite DOIs. It is possible to specify existing DOIs or to mint new ones. Further identifiers such as ARK, arXiv or Handle can be specified in addition.

Integration with third-party services is exposed through a REST API. This offers significant integration potential with other services: profiles could be updated automatically with new publications, the BioDATEN Science Gateway could be integrated for discovery or for publishing templates, and webhooks on GitLab instances could be used to publish code releases.
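To make the combination of license, embargo and REST API more concrete, the following Python snippet is a minimal sketch of how a client might assemble the JSON body for a new draft record. It is not taken from the presentation. The field names (`access`, `embargo`, `rights`, `resource_type`) follow the record layout as we understand it from the public InvenioRDM documentation and should be checked against the deployed version; the helper names `build_draft_payload` and `files_are_public` are hypothetical, and treating an embargo as lifted the day after its `until` date is an assumption of this sketch.

```python
import json
from datetime import date


def build_draft_payload(title, family_name, given_name, publication_date,
                        embargo_until=None, license_id="cc-by-4.0"):
    """Assemble a JSON-serializable draft record (illustrative helper only)."""
    access = {"record": "public", "files": "public"}
    if embargo_until is not None:
        # Assumption: during an embargo the metadata stays visible
        # while the files are restricted.
        access["files"] = "restricted"
        access["embargo"] = {
            "active": True,
            "until": embargo_until.isoformat(),
            "reason": "Embargoed until the related article is published",
        }
    return {
        "access": access,
        "files": {"enabled": True},
        "metadata": {
            "title": title,
            "publication_date": publication_date.isoformat(),
            "resource_type": {"id": "dataset"},
            "creators": [{
                "person_or_org": {
                    "type": "personal",
                    "family_name": family_name,
                    "given_name": given_name,
                }
            }],
            # License identifier, e.g. "cc-by-4.0" or "cc0-1.0".
            "rights": [{"id": license_id}],
        },
    }


def files_are_public(payload, today):
    """Return True if the files would be publicly visible on the given day."""
    access = payload["access"]
    embargo = access.get("embargo")
    if embargo and embargo["active"]:
        # Assumption of this sketch: the embargo is over once 'until' has passed.
        return today > date.fromisoformat(embargo["until"])
    return access["files"] == "public"


if __name__ == "__main__":
    draft = build_draft_payload(
        "Example dataset", "Doe", "Jane", date(2024, 1, 15),
        embargo_until=date(2030, 1, 1), license_id="cc0-1.0",
    )
    print(json.dumps(draft, indent=2))
```

A client would send such a body to the repository's record endpoint together with an access token; the exact URL and authentication setup depend on the instance and are deliberately left out here.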


(DvS, JK)

Sponsored by: