Discussion on research software started in BioDATEN and an SDC working group

Data generated in research projects must be preserved and replicable over defined periods of time, e.g. those stated in institutional policies or funding conditions. However, data alone is meaningless without the appropriate context and environment, such as workflows, tools and software. These dependencies must be considered when discussing adherence to the FAIR principles. This is becoming increasingly relevant, as also reflected in the DFG's "Guidelines for Safeguarding Good Research Practice".

Researchers want and need to work with instruments and software tools that match or go beyond the state of the art. As a consequence, they work with tools that may disappear due to later market interventions (product or company policy, built-in obsolescence, expiry of hardware and software support), even though the tools are still usable and in use. Later re-use becomes challenging if these tools require an environment that ages quickly and is no longer supported by the vendors. For example, various laboratories still run Windows XP machines to control measuring devices that remain usable, although support for Windows XP ended in 2014. This entails growing risks in daily operation, or increases the tendency to rely on tools whose sustainability is simply assumed from a company's market dominance.

Several points should be discussed in the wider context of the Science Data Centers (SDCs):

  • Should access to certain types of data be made part of the procurement process for software and measuring devices, and thus be ensured?
  • How should we deal with the "black box" phenomenon of (measuring) instruments (microscopes, NMR, ...) and software, where it is unclear what exactly happens internally and how and why output is generated from input? Changes to many proprietary software modules are often not transparent, which endangers the reproducibility of results.
  • Additionally, "ethical-philosophical problems" might arise: should the reproducibility of a result be allowed to depend on an expensive license?
  • Is there a right to reproducibility by third parties, in the sense that no (expensive) license should be necessary because nothing new is achieved?
  • How can this be legally organized (considering the freedom of research and teaching in many countries)?

A solution to some of the issues raised above might be an appropriate escrow service, i.e. the deposition of software at a designated place (e.g. a national library) with usage restrictions that allow only the reproduction of existing results and exclude conducting new research.

Access to and control over research software is becoming a building block of the digital sovereignty of a university, or of a field of science in general. However, there is a long-term problem: it is unrealistic to keep software directly available for 10 years or longer, or to anticipate the manufacturer's product policy.

Given the increasing use of the cloud for applications and data storage, the associated challenges should also be considered.

Ultimately, recommendations should move much further toward openness of research software. Here, researchers or funding agencies could make a demand similar to the one proposed for the procurement process: software may be used at all only if it is generally disclosed after, say, 10 years, for research data management reasons.

In general, cooperation could help steer the process in the desired direction: joint influence on manufacturers may be more likely to bring about changes in product policy. Establishing corresponding "best practices" (e.g. as part of funders' requirements) could also increase the pressure on companies to disclose software. These ideas and measures could lead to some kind of certification attesting that software and devices are fit for use in scientific workflows, including compliance with the FAIR principles.

(DvS, HG)

BioDATEN present at the 7th bwHPC Symposium

BioDATEN and all other SDCs in Baden-Württemberg (SDC4Lit, BERD, MoMaF) present their integration into the bwHPC project at the 7th bwHPC Symposium. For BioDATEN, the main focus is the connection to the cloud infrastructure BinAC. The basic idea is that researchers work on BinAC and receive a notification as soon as their computing jobs are completed. A set of technical metadata is created automatically, which researchers can then enrich by adding scientific and descriptive metadata. With this approach, BioDATEN uses a direct connection to bwHPC / BinAC and can thus begin metadata annotation as early as possible.


Recommendations on licenses for research data published

FAIR and open data require appropriate measures for the publication and re-use of research data. The Science Data Center working group on legal aspects therefore discussed the current state of this field and put together a set of recommendations. As practices in the involved scientific communities differ, the recommendations do not necessarily apply to all domains. Nevertheless, the results are applicable to the BioDATEN community. They have been published in the o|bib article "Offene Lizenzen für Forschungsdaten - Rechtliche Bewertung und Praxistauglichkeit verbreiteter Lizenzmodelle" ("Open Licenses for Research Data: Legal Assessment and Practical Suitability of Common License Models"). As this open access online journal focuses on libraries, it is important to spread the word across domains.

The licensing of research data, i.e. granting rights to others, determines whether self-generated data can be used by third parties and whether third-party data can be used in one's own research. Admittedly, not all research data can be published right away. Practical or legal circumstances can lead either to delayed publication, e.g. due to embargoes, or to no publication at all. The European Commission has aptly summarized this with the motto "as open as possible, as closed as necessary".

Licenses for research data should consider the interests and requirements of infrastructure providers, which are indispensable especially as operators of data repositories, as well as those of the data creators and of other stakeholders such as research and funding institutions, whose requirements may be captured in recommendations, policies or funding conditions. The legal safeguarding of open data spaces must therefore aim to balance both sides. To help in this regard, the published paper analyses the licensing recommendations of different stakeholders and examines the suitability of various licensing models for the provision of research data, before concluding with a differentiated recommendation of its own.

After thorough discussion and consideration, the paper's analysis reaches the following conclusions:

  1. In principle, the CC BY 4.0 license or the CC0 1.0 public domain dedication is recommended for the provision of open research data.
  2. CC0 1.0 is preferable if it can be assumed that research data are not protected by copyright. This applies to metadata, but also to a large part of natural science and quantitative social science research data.

Nevertheless, in individual cases there may be good reasons to resort to more restrictive licenses or to refrain from publication. Choosing the "right" license must balance the needs of all stakeholders in research data management, such as researchers, infrastructure providers, research institutions and funding agencies.

(DvS, HG)
