Organizational challenges - discussing a code of conduct for best practices in RDM

The long-term nature of research data poses new challenges that need to be addressed at the various institutional levels. The BioDATEN project supports the debate on, and the creation of, suitable policies within the partners' research institutions. The stakeholders involved in research and data management change workplaces frequently over the course of their careers. They may switch organizations several times before obtaining a permanent post or moving on. The people who created, collected, and processed data may have long since left the research institution, and projects may have officially ended a long time ago. Many projects also face questions about copyright, the handling of personal data, and appropriate licenses for research data and related proprietary software.

Responsibility for the data (and software) must therefore be exercised in a regulated manner by appropriate entities at every point in the data life cycle. During ongoing research, compliant storage and processing in the context of the respective project need to be considered. Compliance must also be ensured later in the specific repositories of the BioDATEN communities, university publication services, or dark archives. The potentially indefinite storage of data after project completion entails responsibilities whose assignment must be clarified. Part of the handling can be derived from data management plans; other parts could be defined in the research data policies of the respective scientific institution. All considerations here are based on the DFG's "Guidelines for Safeguarding Good Scientific Practice". In coordination with the Science Data Center steering group and the research data management working group, BioDATEN created guidelines for the responsible handling of research data (Leitfaden). This document has been approved by the ALWR-BW and will also be discussed in the library directors' working group.

In a strongly networked and highly collaborative scientific field such as bioinformatics, the goal is to jointly use and federate services for data management. For data publications to be properly acknowledged, a persistent link between research objects and persons needs to be established and maintained. Persistent identifiers for researchers are of central importance for this objective. Because of the high turnover within this group of individuals, agreement among all stakeholders in the science enterprise on a uniform, internationally recognized, and cross-institutional system would be a considerable advantage, since moving between institutions would no longer require changes in the database. Such an internationally recognized identifier should be stable and unique for each individual. In a further step, BioDATEN has urged the SDC and Baden-Wuerttemberg RDM working groups to join the Memorandum of Understanding (MoU) of DINI and has made a recommendation for ORCID. In this context, it is advised to identify persons in research information systems, repositories, and research data management via their ORCID iD, and to identify repositories via re3data (see the article at "Bausteine FDM"). The ORCID iD already has a high degree of visibility and acceptance in the bioinformatics community. The recommendation was favorably received by both the ALWR-BW and the Working Group of Library Directors in the state.
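To illustrate what identifying persons via ORCID can look like on the technical side, the following minimal sketch checks whether a string is a well-formed ORCID iD before it is stored in a repository or research information system. It relies on the publicly documented structure of the identifier (16 characters in four hyphenated groups, with a final check character computed according to ISO 7064 MOD 11-2). The function names are illustrative assumptions and are not part of any BioDATEN service.

```python
import re

# Hyphenated 16-character form, e.g. 0000-0002-1825-0097.
# The last character may be "X", which stands for the check value 10.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID iD layout and a matching check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]
```

A check of this kind only confirms that an identifier is syntactically valid. Whether it belongs to the intended person still has to be established separately, for example by letting researchers authenticate with ORCID themselves.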

BioDATEN is one of four science data centres (SDCs) sponsored by the Ministry of Science, Research and Arts. Although each project addresses a different scientific community, there are cross-cutting topics that every project must handle. These include legal aspects of research data, the definition of metadata for research data, education on research data management, and business models for research infrastructure. As all SDCs work in these areas and benefit from each other's experiences, the coordination project bw2fdm organizes and hosts various working groups to tackle these issues. BioDATEN has been an active member of all working groups so far and is happy to share experiences and to network with the other SDCs. BioDATEN is also part of the spokesperson group of two working groups.

BioDATEN is actively pursuing the development of a working business model required for sustainable operation. During an SDC working group meeting, BioDATEN presented a short overview of the current state that put together the insights from its partners, previous projects, and. The basis for sustainable operation is cooperation and agreement among the partners. This entails the possibility of exchanging services and resources. From a certain point on, this needs to be laid down in formal agreements signed by the partners in a consortium and by its users. One challenge is the creation and definition of a proper legal framework. Ideally, not every consortium would need to create its own framework and agreements but could instead join larger structures. Those structures still need to be established. This is being clarified in ongoing talks with the Ministry of Science, Research and Arts.


In the course of another SDC working group meeting, BioDATEN presented the decision process it had followed over the last year towards a suitable repository service for data publication. It ultimately chose InvenioRDM after analyzing existing solutions such as Dataverse, Fedora/DSpace, and iRODS. During the decision process, BioDATEN was in exchange with experts from university libraries and computer centres within the state of Baden-Württemberg, and with experts from various NFDI consortia such as DataPLANT. These activities identified the shortcomings of traditional archive and library solutions regarding Big Data, as required, for example, by bwSFS and the BioDATEN communities. InvenioRDM features a modern, extensible architecture and already has a significant user base within the Zenodo community. Thus, a high probability of sustainable, long-term service can be expected. InvenioRDM is a modular, Python-based open source data repository software initially developed by CERN, and it provides a modern REST API. The highly modular architecture is microservice-oriented, scalable, and easily extensible. It provides the necessary broad support for storage backends such as S3 object storage, filesystem, or WebDAV. Indexing and search are based on an Elasticsearch backend. InvenioRDM allows for the simple integration of external identity providers via OAuth 2.0/OpenID Connect and exposes an OAI-PMH interface for metadata harvesting and data exchange with other repository services.
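As an illustration of the metadata harvesting mentioned above, the following sketch builds an OAI-PMH `ListRecords` request and extracts the record identifiers from a response. The verb, the `metadataPrefix` parameter, the `oai_dc` format, and the XML namespace are defined by the OAI-PMH 2.0 specification. The host name `repo.example.org`, the `/oai2d` endpoint path, the sample response, and the function names are assumptions made for this example and do not describe an actual BioDATEN deployment.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url: str,
                           metadata_prefix: str = "oai_dc",
                           set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_identifiers(xml_text: str) -> list:
    """Return the OAI identifiers of all record headers in a response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical, heavily shortened response used for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:abc12-def34</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:ghi56-jkl78</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

# Assumed endpoint; a real harvester would fetch this URL over HTTP.
url = build_list_records_url("https://repo.example.org/oai2d")
identifiers = extract_identifiers(SAMPLE_RESPONSE)
```

A real harvester would additionally follow the `resumptionToken` that OAI-PMH uses for paging through large result sets, which is omitted here for brevity.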


