BioDATEN participates in the state-wide RDM WG meeting (12 May)

Representatives of the Science Data Center BioDATEN took part in the online meeting of the RDM Working Group (AK FDM) on 12 May 2020 and contributed to the discussion on both quality assurance and governance of research data.
The long-term character of research data and the many parties involved in handling it create new challenges for research organisations with high staff turnover. The researchers who created, collected and processed the data may have left the institution, and the projects may have been officially completed long ago. Many projects are also confronted with questions regarding copyright, the handling of personal data and appropriate licenses for research data. Responsibility for the data must therefore be assigned to appropriate entities and exercised in a regulated manner at every point in the data lifecycle. While service providers store and process the data during the project period, other entities, such as subject-specific repositories or university publication services like dark archives, can become involved after the project ends. The potentially perpetual storage of data after project completion entails responsibilities whose assignment should be clarified. Some of these responsibilities can be derived from data management plans; others could be defined in the research data policies of the scientific institution.

Research data management also affects aspects of good scientific practice. Achieving the greatest possible transparency and openness while handling sensitive data prudently requires clear regulations and agreements. For example, questions of copyright and the handling of personal data should be clarified at the start of a research project, or of any project in general. The same applies to granting suitable licenses for research data, whether generated by researchers or taken over by storage facilities such as repositories or dark archives.
As a science data center, BioDATEN should therefore help its community by suggesting guidelines and promoting their implementation. These should address data release and regulations comparable to publication contracts with the University Library. Increasingly, an (institutionalised) RDM is becoming a prerequisite for the approval of grants for new research projects. To ensure that this does not remain a one-sided process, both individual researchers and the research institutions and their entities, such as research groups, institutes or CRCs, must be included in the governance considerations for RDM.

As a next step, a document with guidelines on practical aspects of RDM will be compiled for individual professorships and working groups as well as for larger collaborative projects. These guidelines should contain clear regulations on costs and responsibilities, including the assignment of rights to the storage facility for cases such as data migration and deletion. Filling out data management plans facilitates early planning and makes it possible to plan the costs and expenses of the infrastructure providers involved.


BioDATEN joined three working groups for cross-cutting topics

The Science Data Center BioDATEN is actively involved in the cross-cutting activities of the SDCs in Baden-Württemberg, which are coordinated by the bw2FDM project. On Wednesday, 2 April, three online meetings were held, dedicated to legal support, business models and metadata. BioDATEN was present both as a contributor and as a consumer through various participants from Tübingen and Freiburg. In the discussion, the group outlined possible objectives of the legal working group, ranging from the creation of a discussion platform and a network for recommendations for action to the preparation of workshops, trainings, FAQs, checklists, handouts, knowledge bases and model contracts. Furthermore, activities regarding legal issues in RDM in other states will be monitored. Of course, giving general advice on legal issues in specific individual cases is problematic; providing such advice is primarily the responsibility of the individual research institutions. The working group reiterated the importance of cooperation between the technical and the legal sides. The next meeting will primarily deal with the legal implications of data sharing and reuse.

Business and operation models are clearly a relevant cross-cutting topic, as many state-sponsored projects and state-wide services share the same set of challenges, including sustainable financing and staffing the required roles in the absence of permanent positions.
The BioDATEN project is aware of this from the precursor ViCE projects, which also addressed aspects of operation and business models.
These aspects have already been elaborated but not yet solved. The cooperation frameworks that already exist under current regulations might provide a certain foundation (suitable legal entities and cooperation frameworks still need to be checked). Another option is tight cooperation and coordination with ongoing NFDI activities, such as DataPLANT, the NFDI consortium for fundamental plant research, which is relevant to parts of the BioDATEN communities. Qualified staff plays a key role in sustainable operation, as does the provisioning of storage systems for the data.

The third online meeting addressed the cross-cutting topic of metadata. Metadata are important throughout the entire lifecycle of research data. While there are basic, descriptive metadata that are very likely to be shared across communities, such as author, DOI and affiliation, there are also highly domain- and research-specific metadata, such as species, treatment and experimental design. Descriptive metadata are best handled using established standards like DataCite. Research-specific metadata are probably best handled using existing ontologies, from which researchers can then pick the labels that best describe their data. Additionally, there are technical and process-related metadata. To a large extent, both can be collected automatically during data processing on HPC resources or by tools such as FITS. The well-established PREMIS framework is a good solution for capturing technical metadata.
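As an illustration of how technical metadata can be collected automatically, the following Python sketch computes the size and SHA-256 checksum of a file's content and records them in a minimal PREMIS 3 object description. It is a sketch under stated assumptions, not a BioDATEN tool: the helper names (`premis_object`, `premis_object_for_file`), the "local" identifier type and the caller-supplied format name are illustrative choices; in practice, a characterisation tool such as FITS would determine the format.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def _add(parent, tag, text=None):
    """Append a PREMIS-namespaced child element, optionally with text content."""
    child = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def premis_object(identifier: str, data: bytes, format_name: str) -> ET.Element:
    """Build a minimal PREMIS 3 <object> describing one file's content.

    Size and SHA-256 fixity are derived from the bytes; the format name is
    passed in by the caller (hypothetically supplied by a tool such as FITS).
    """
    obj = ET.Element(f"{{{PREMIS_NS}}}object", {f"{{{XSI_NS}}}type": "premis:file"})
    ident = _add(obj, "objectIdentifier")
    _add(ident, "objectIdentifierType", "local")  # illustrative identifier type
    _add(ident, "objectIdentifierValue", identifier)
    chars = _add(obj, "objectCharacteristics")
    _add(chars, "compositionLevel", "0")
    fixity = _add(chars, "fixity")
    _add(fixity, "messageDigestAlgorithm", "SHA-256")
    _add(fixity, "messageDigest", hashlib.sha256(data).hexdigest())
    _add(chars, "size", str(len(data)))
    designation = _add(_add(chars, "format"), "formatDesignation")
    _add(designation, "formatName", format_name)
    return obj


def premis_object_for_file(path: str, format_name: str) -> ET.Element:
    """Convenience wrapper: read a file and describe it with premis_object()."""
    p = Path(path)
    return premis_object(p.name, p.read_bytes(), format_name)


# Usage with hypothetical in-memory content standing in for a data file.
print(ET.tostring(premis_object("counts.csv", b"abc", "text/csv"), encoding="unicode"))
```

For very large files on HPC storage, reading the file in chunks and feeding each chunk to a `hashlib.sha256()` object's `update()` method would avoid loading the whole file into memory.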
These four kinds of metadata (descriptive, research-specific, technical and process-related) must be combined in a single container file. The METS framework offers such a container and is also well established.
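A minimal sketch of such a container in Python could look as follows, assuming METS 1.x with a DataCite record embedded as descriptive metadata and a PREMIS object embedded as technical metadata. The section IDs (`DMD1`, `TECH1`, `FILE1`), the placeholder DOI, the title and the file path are hypothetical, and the output is not validated against the METS or DataCite schemas (a real DataCite record needs all mandatory properties). Research-specific and process-related metadata could be attached in the same way, e.g. as further dmdSec entries or as digiprovMD sections in the amdSec.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DATACITE_NS = "http://datacite.org/schema/kernel-4"
PREMIS_NS = "http://www.loc.gov/premis/v3"
for prefix, uri in [("mets", METS_NS), ("xlink", XLINK_NS),
                    ("datacite", DATACITE_NS), ("premis", PREMIS_NS)]:
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Return an ElementTree qualified name in Clark notation: {uri}tag."""
    return f"{{{ns}}}{tag}"


def add_md_section(parent, section_tag, section_id, mdtype, payload, other_mdtype=None):
    """Append a METS metadata section (dmdSec, techMD, ...) that embeds `payload`."""
    section = ET.SubElement(parent, q(METS_NS, section_tag), ID=section_id)
    md_wrap = ET.SubElement(section, q(METS_NS, "mdWrap"), MDTYPE=mdtype)
    if other_mdtype is not None:
        md_wrap.set("OTHERMDTYPE", other_mdtype)
    ET.SubElement(md_wrap, q(METS_NS, "xmlData")).append(payload)
    return section


def build_mets(descriptive: ET.Element, technical: ET.Element, file_href: str) -> ET.Element:
    """Package one descriptive record and one PREMIS object with the file they describe."""
    mets = ET.Element(q(METS_NS, "mets"))
    # Descriptive metadata: DataCite is not a predefined METS MDTYPE, so use OTHER.
    add_md_section(mets, "dmdSec", "DMD1", "OTHER", descriptive, other_mdtype="DataCite")
    # Technical metadata lives in the administrative metadata section.
    amd = ET.SubElement(mets, q(METS_NS, "amdSec"))
    add_md_section(amd, "techMD", "TECH1", "PREMIS:OBJECT", technical)
    # The data file itself, linked to its technical metadata via ADMID.
    file_grp = ET.SubElement(ET.SubElement(mets, q(METS_NS, "fileSec")), q(METS_NS, "fileGrp"))
    file_el = ET.SubElement(file_grp, q(METS_NS, "file"), ID="FILE1", ADMID="TECH1")
    ET.SubElement(file_el, q(METS_NS, "FLocat"), {"LOCTYPE": "URL", q(XLINK_NS, "href"): file_href})
    # structMap is the one mandatory METS section; a div ties the record to the file.
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"), DMDID="DMD1")
    ET.SubElement(div, q(METS_NS, "fptr"), FILEID="FILE1")
    return mets


# Hypothetical example payloads (placeholder DOI, title and file path).
record = ET.Element(q(DATACITE_NS, "resource"))
ET.SubElement(record, q(DATACITE_NS, "identifier"), identifierType="DOI").text = "10.1234/example"
titles = ET.SubElement(record, q(DATACITE_NS, "titles"))
ET.SubElement(titles, q(DATACITE_NS, "title")).text = "Example dataset"
premis_obj = ET.Element(q(PREMIS_NS, "object"))

mets_doc = build_mets(record, premis_obj, "data/counts.csv")
print(ET.tostring(mets_doc, encoding="unicode"))
```

Keeping each metadata kind in its own METS section means the standard-specific records (DataCite, PREMIS) stay intact and can be extracted or replaced independently.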
ORCID plays a special role as a method for referencing authors, contributors, etc. Unlike institution-based identity management, ORCID is person-based, and every researcher can update their own affiliation and contact information. This comes in handy as soon as researchers leave their home institution and are no longer reachable at the institution's email address.
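Because every ORCID iD ends in a check character, a repository or metadata tool can catch mistyped iDs before they end up in descriptive metadata. The Python sketch below implements the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers; the function and constant names are our own, and a valid checksum only shows that an iD is well-formed, not that it exists or belongs to the intended person.

```python
import re

# Four blocks of four characters; only the final check character may be "X".
ORCID_PATTERN = re.compile(r"([0-9]{4})-([0-9]{4})-([0-9]{4})-([0-9]{3}[0-9X])")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Validate format and checksum; accepts a bare iD or an https://orcid.org/ URL."""
    value = value.strip().removeprefix("https://orcid.org/")  # Python 3.9+
    match = ORCID_PATTERN.fullmatch(value)
    if match is None:
        return False
    digits = "".join(match.groups())
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True (example iD used in ORCID documentation)
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```

A lookup against the ORCID registry would still be needed to confirm that a well-formed iD actually belongs to the right researcher.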


New workshop on research data management

The Science Data Center BioDATEN is becoming increasingly involved in outreach and training activities in research data management. Since (junior) scientists are usually actively involved in research projects, corresponding courses should take place at the beginning of the research project whenever possible, so that they cover almost the entire lifecycle of the research data. For this reason, the computing center of the University of Freiburg is planning a full-day workshop in Freiburg on 25 April for CIBBS, a research cluster funded under the German excellence initiative, and for the biochemistry research training group. The teaching and qualification measures are aimed at junior researchers at the beginning of their careers, with the goal of a well-thought-out, structured preparation of data and workflows. The workshop aims to avoid time-consuming and error-prone data management at the end of a project by advising researchers at an early stage of the data lifecycle. From the outset, data sets should be curated with later publication in mind, enriched with proper metadata and converted into sustainable file formats. Junior scientists increasingly need qualified knowledge to access the various advanced research infrastructures and to handle the associated data management properly.

The workshop is planned as a combined information and training event that introduces domain-specific research data management, focusing on the cluster of excellence CIBBS at the University of Freiburg and further groups from the bioinformatics and biochemistry domains. It offers concrete instructions for day-to-day activities. On the one hand, researchers are introduced to the methodological, organizational, technical and legal challenges of research data management; on the other hand, specific requests of the working group are addressed. Further topics will include: presentation of the respective research approaches and the associated RDM; learning from current procedures and workflows as examples (best practices) for one's own work; available and planned infrastructures and future developments in BioDATEN; creation and use of ORCID iDs; and tools and methods for data management (hands-on, e.g. use of restic and rucio).

Further (outreach) events target groups such as the Bioinformatics Club of senior PIs at the University of Freiburg. The planned presentation will give an overview of the ongoing activities and developments in BioDATEN and of the training and summer school activities of Galaxy and ELIXIR/EOSC.
Besides practical aspects of data management, the outreach event will introduce the planned bwSFS (Storage-for-Science) system as a future basis for data storage, long-term archiving and publication. It will advocate concepts such as the ORCID iD to improve descriptive metadata for data publication. The qualification of PIs is particularly important: on the one hand, they generate a lot of data and are responsible for its reusability; on the other hand, as supervisors of students and doctoral researchers, these senior researchers serve as role models and should therefore adopt a sustainable approach to research data, for which BioDATEN will provide the necessary framework.
