de.NBI certification successful

Some research areas within the scope of the BioDATEN Science Data Center may require the handling of sensitive data, such as data of human origin. "Management systems for technical data security" have therefore become necessary for the providers of computational and storage resources. Building on developments in the "de.NBI Cloud" project and in collaboration with bioinformaticians, the computing centers of the Universities of Freiburg and Tübingen sought certification of their compute and storage infrastructures and of the underlying basic IT infrastructures. The certification process is complex and tedious, and it requires the commitment of various stakeholders. Fortunately, the de.NBI project provided a two-year grant to pursue an ISO 27001 certification. An ISO certificate also carries additional weight when stakeholders within the university advocate for the infrastructure.

The process itself consisted of several steps: preparatory activities, the creation of policy documents, the documentation of the infrastructure and a two-stage certification session involving all relevant departments of the computing centers. The final certificate will be awarded in January 2022. Certification is a long-term commitment, as the process has to be repeated at regular intervals. A certified infrastructure will also help researchers in domains besides BioDATEN to obtain funding, since funders increasingly require such certification, especially in areas that deal with sensitive data such as patents, patient data or data with personal reference.

(DvS, HG)

Discussion on research software started in BioDATEN and an SDC working group

Data generated in research projects must be preserved and remain replicable over defined periods of time, which are stated, for example, in institutional policies or funding conditions. But data is rendered meaningless without its appropriate context and environment, such as workflows, tools and software. These dependencies must be considered when discussing adherence to the FAIR principles. This is becoming increasingly relevant, as stated in the DFG's "Guidelines for Safeguarding Good Research Practice".

Researchers want and need to work with instruments and software tools that match or go beyond the state of the art. This means that they work with tools that may later disappear because of market decisions (product or company policy, built-in obsolescence, expiry of hardware and software support) although they are still usable and in use. Later re-use becomes challenging if these tools need an environment that ages quickly and is no longer supported by the vendors. For example, various laboratories still run Windows XP machines to control measuring devices that remain usable, although Windows XP support ended in 2014. This increases the risks in daily operation, or it increases the tendency to rely on tools whose sustainability is assumed simply because of a company's market dominance.

There are several points that should be discussed in the extended context of the Science Data Centers:

  • Should access to certain types of data be made part of the procurement process for software and measuring devices, and thereby be guaranteed?
  • How should the "black box" phenomenon of (measuring) instruments (microscopes, NMR, ...) and software be dealt with, where it is unclear what exactly happens internally and how output is generated from input? Changes to proprietary software modules are often not transparent, which endangers the reproducibility of results.
  • Additionally, "ethical-philosophical problems" might arise: should the reproducibility of a result be allowed to depend on an expensive license?
  • Is there a right to reproducibility by third parties, in the sense that no (expensive) license is necessary because nothing new is achieved?
  • How can this be legally organized (considering the freedom of research and teaching in many countries)?

A solution to some of the issues raised above might be the use of an appropriate escrow service: the software is deposited at a designated place (e.g. a national library) with usage restrictions that allow only the reproduction of existing results and exclude conducting new research.

Access to and control over research software is becoming a building block of the digital sovereignty of a university, or of a field of science in general. But there is a long-term problem: it is unrealistic to keep software directly available for 10 years or longer, or to anticipate the product policy of the manufacturer.

Given the growing use of cloud services for applications and data storage, the challenges associated with them should be considered as well.

In the end, recommendations should go much further in the direction of openness for research software. Here, researchers or funding agencies could make a demand similar to the one for the procurement process: to be allowed in research at all, software must be generally disclosed after, say, 10 years for reasons of research data management.

In general, cooperation could help to steer the process in the desired direction: joint influence on manufacturers may be more likely to bring about a change in product policies. Creating corresponding "best practices" (e.g. as part of funders' requirements) could also increase the pressure on companies to disclose software. These ideas and measures could result in some kind of certification of software and devices for their fitness to be used in scientific workflows, including compliance with the FAIR criteria.

(DvS, HG)

BioDATEN present at the 7th bwHPC Symposium

BioDATEN and all other SDCs in Baden-Württemberg (SDC4Lit, BERD, MoMaF) present their integration into the bwHPC project at the 7th bwHPC Symposium. For BioDATEN, the main focus is the connection to the cloud infrastructure BinAC. The basic idea is that researchers work on BinAC and receive a notification as soon as their computing jobs are completed. A set of (technical) metadata is created automatically, and researchers can annotate it by adding scientific and descriptive metadata. With this approach, BioDATEN makes use of a direct connection to bwHPC / BinAC, so that metadata annotation can start as early as possible.
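The annotation idea described above can be illustrated with a minimal Python sketch. It is not the BioDATEN or BinAC implementation: the class names, field names, notification text and example values are all hypothetical and chosen only to show the separation between automatically created technical metadata and the scientific and descriptive metadata added later by the researcher.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class JobRecord:
    """Hypothetical summary of a finished compute job (not a BinAC API)."""
    job_id: str
    user: str
    finished_at: datetime
    output_files: list[str]


@dataclass
class MetadataRecord:
    """Technical metadata plus the annotations a researcher adds later."""
    technical: dict[str, object]
    scientific: dict[str, str] = field(default_factory=dict)
    descriptive: dict[str, str] = field(default_factory=dict)


def on_job_completed(job: JobRecord) -> MetadataRecord:
    """Create the technical metadata automatically once a job has finished."""
    technical = {
        "job_id": job.job_id,
        "user": job.user,
        "finished_at": job.finished_at.isoformat(),
        "output_files": sorted(job.output_files),
    }
    return MetadataRecord(technical=technical)


def notification_text(job: JobRecord) -> str:
    """Build the (assumed) message that tells the researcher to annotate."""
    return f"Job {job.job_id} has finished; its metadata can now be annotated."


def annotate(
    record: MetadataRecord,
    *,
    scientific: dict[str, str] | None = None,
    descriptive: dict[str, str] | None = None,
) -> MetadataRecord:
    """Add researcher-supplied annotations; technical metadata stays untouched."""
    record.scientific.update(scientific or {})
    record.descriptive.update(descriptive or {})
    return record


# Example with made-up values.
job = JobRecord(
    job_id="12345",
    user="alice",
    finished_at=datetime(2021, 11, 8, 12, 0, tzinfo=timezone.utc),
    output_files=["b.bam", "a.vcf"],
)
record = annotate(
    on_job_completed(job),
    scientific={"organism": "Arabidopsis thaliana"},
    descriptive={"title": "Variant calling run"},
)
print(notification_text(job))
```

Keeping the technical part in its own dictionary reflects the idea that it is generated by the system, while the two annotation dictionaries are filled in by the researcher after the notification.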

