New workshop on research data management

The Science Data Center BioDATEN is becoming increasingly involved in outreach and training activities in research data management. Since (junior) scientists are usually actively involved in research projects, the corresponding courses should take place at the beginning of a research project if possible, so that the life cycle of the research data can be covered almost completely. The computing center of the University of Freiburg is therefore planning a full-day workshop on 25th April in Freiburg for CIBBS, a research cluster funded under the German universities' excellence initiative, and for the biochemistry research training group. The teaching and qualification measures are geared towards junior researchers at the beginning of their careers, with the aim of establishing well thought-out and structured data and workflow preparation. The workshop follows the idea of avoiding time-consuming and error-prone end-of-project data management by advising researchers at an early stage in the data life cycle. From the beginning, data sets should be curated with later publication in mind, enriched with proper metadata and converted into sustainable file formats. Junior scientists need an increasing amount of specialized knowledge to access the various advanced research infrastructures and to handle the associated data management properly.

The planned workshop will be a combined information and training event that introduces domain-specific research data management, focusing on the cluster of excellence at the University of Freiburg (CIBBS) and further groups from the bioinformatics and biochemistry domains. It offers concrete guidance for day-to-day activities. On the one hand, researchers are introduced to the methodological, organizational, technical and legal challenges of research data management; on the other hand, specific requests of the working group are addressed. Further topics will include: presentation of the respective research approaches and the associated RDM; learning from current procedures and workflows as examples (best practices) for one's own work; learning about available and planned infrastructures and future developments in BioDATEN; creation and use of ORCID IDs; and tools and methods for data management (hands-on; e.g. use of restic, rucio).

Further (outreach) events focus on groups such as the Bioinformatics Club of senior PIs at the University of Freiburg. The planned presentation will give an overview of the ongoing activities and developments in BioDATEN and of the training and summer school activities of Galaxy and ELIXIR/EOSC.
Besides practical aspects of data management, the outreach event will introduce the planned bwSFS (Storage-for-Science) system as a future base for data storage, long-term archiving and publication. It will advocate concepts like the ORCID ID to improve descriptive metadata for data publication. The qualification of PIs is to be regarded as particularly important: on the one hand, they generate a lot of data and are responsible for its reusability. On the other hand, in their role as supervisors of students and doctoral students, these senior researchers serve as role models and should therefore adopt a sustainable approach to research data, for which BioDATEN will provide the necessary framework.

BioDATEN at 3rd NFDI Community Workshop at LMU Munich

BioDATEN was present at the 3rd NFDI Community Workshop, "Services for Research Data Management in Neuroscience", hosted at the Ludwig Maximilians University in Munich on 10th February. It contributed the perspective of a base-level service provider and its expertise in the NFDI formation process. Further service providers presenting at the workshop included FIZ Karlsruhe and infrastructure providers in the context of the Human Brain Project. These talks were followed by lightning talks on a range of services, initiatives and data providers such as Fenix-RI, GIN, Cellular-resolution brain map, Zeiss research platform, NFDI4BMP, BRIDGE4NFDI, Helmholtz Metadata Services, RDM Services at Marburg University, LMU Munich University Library, Leibniz Rechenzentrum Munich, Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen, Max Planck Computing and Data Facility on the European Open Science Cloud.

The BioDATEN science data center elaborated on the provider's perspective in the NFDI formation process. Providers of research IT infrastructures are faced with significant technological changes, driven especially by resource virtualization. Many modern services and workflows are operated in an increasingly cloud-like fashion, where data and computing move from independent, personal workstations to centralized, aggregated resources. Such shared resources allow new projects to be hosted faster. The necessary excess capacity is much easier to maintain and to justify in centralized resources because the shared overhead is typically much smaller than in independent systems. Grant providers are starting to understand the changed technological landscape and to adapt their funding schemes, so that buying into existing resources is preferred over establishing separate ones for each new project. Users are faced with a difficult sizing challenge, as it is often impossible for them to define the “right” configuration for a required resource. These challenges are answered by the IT industry as well as by science-driven cloud and aggregated HPC offerings. Aggregating resources into larger infrastructures allows the increasing effort in market analysis, system selection, proper procurement and operation of (large-scale) IT infrastructures to be concentrated in the hands of a few experts. Furthermore, such a strategy would eliminate the mismatch between typical project run times on the one hand and the (significant) delay in equipment provisioning and the usual depreciation periods of that equipment on the other.

The massive changes in the IT landscape and in user expectations increase the pressure on university (scientific) compute centers to re-orient themselves. Cooperation in scientific cloud infrastructure is a chance for many compute centers to significantly widen their scope of IT services. It helps them to keep up with the demand from the scientific communities and to offer a fairly complete service portfolio. Organizationally, it allows for specialization and community focus. When defining future strategies and operation models, compute centers might find a new standing in supporting research data management by providing efficient infrastructure and consultation to the various scientific communities. Furthermore, it offers them the opportunity to participate in infrastructure calls. These developments enable researchers to offload non-domain-specific tasks and services onto infrastructure providers. Suitable governance structures are to be implemented to ensure the continued relevance of compute centers through user participation and feedback loops in the future. Close cooperation and consultation (as already practiced in Freiburg for the bwForCluster NEMO and for the storage infrastructure bwSFS) help all stakeholders to have suitable and up-to-date infrastructures tailored to their needs. Such structures are in their infancy for the NFDI, but future NFDI-wide coordination should advance this topic.

The financing of IT infrastructures for the various scientific communities is often grant-driven and inherently not suitable for providing sustainable long-term services and research data management. The future should see a change in the flow of funding, from a simple project-driven and organization-centered practice to demand-driven streams to different infrastructure and service providers. Large infrastructure initiatives like de.NBI or the NFDI need not only to resolve the question of personnel employment (permanent vs. project-based) but also to define suitable business and operation models compatible with the VAT regulations and the federal and state requirements for cash flows in mixed consortia.

nestor Access Workshop in Fulda

As a member institution of nestor, the University of Freiburg participated on behalf of BioDATEN in the nestor-internal two-day "Workshop on Access". The workshop aimed to foster the qualification of partners and to clarify questions on digital preservation infrastructures and access systems. Use cases and significant aspects of the user perspective were discussed along the following topics: architecture and conceptual design, the use of "virtual reading rooms", support and consultation needs for historical digital materials, and next-generation use such as big data and data mining scenarios. Furthermore, the OAIS offers a conceptual framework for receiving and storing objects differently from how they are presented, e.g. driven by retention periods. Standardization and legal aspects need to be taken into account as well.

Presentations to kick-start discussions and exchange were given by nestor partners such as the Leibniz Institute for German Language, the German Central Library for Medicine and the State Archive of North Rhine-Westphalia. Invited talks were delivered by Thomas Ledoux from the Bibliothèque nationale de France: "An experienced practitioner’s view on Access (library perspective)" and by Nicola Wissbrock and Sarra Hamdi, The National Archives UK: "An experienced practitioner’s view on Access (archival perspective)". The second day covered topics like "Access Rights Information in the SLUB digital long-term archive", presented by the Saxon State and University Library Dresden (SLUB), which gave several recommendations on how to deal pragmatically with copyright, usage and access statements for digital objects. The following talks covered "What is new? Changes in OAIS relating to Access" by Fernuniversität Hagen and "Chances and risks of dark archives" by the Technische Informationsbibliothek. The day was concluded by a member of the CiTAR and BioDATEN long-term access team from the University of Freiburg with a talk on "Emulation as a Service". A long-term access module will be included in the BioDATEN science gateway.
