BioDATEN joined three working groups on cross-cutting topics

The Science Data Center BioDATEN is actively involved in the cross-cutting activities of the Science Data Centers (SDCs) in Baden-Württemberg, which are coordinated by the bw2FDM project. On Wednesday, 2nd April, three online meetings were held, dedicated to the topics of legal support, business models, and metadata. BioDATEN was present both as contributor and as consumer through various participants from Tübingen and Freiburg. In the group's discussion, possible objectives of the legal working group were outlined, ranging from the creation of a discussion platform and a network for recommendations for action to the preparation of workshops, trainings, FAQs, checklists, handouts, knowledge bases, and model contracts. Furthermore, activities in other federal states with regard to legal issues in RDM will be monitored. General advice on specific individual legal cases is, of course, problematic; providing it remains primarily the responsibility of the individual research institutions. The working group reiterated the importance of cooperation between the technical and the legal sides. The next meeting will primarily deal with the legal implications of data sharing and reuse.

Business and operation models are clearly a relevant cross-cutting topic, as many state-sponsored projects and state-wide services share the same set of challenges, including sustainable financing and the staffing of required roles in the absence of permanent positions.
The BioDATEN project is aware of this, as the precursor project ViCE also addressed aspects of operation and business models.
These aspects have already been elaborated but not resolved. A certain basis might be provided by the cooperation frameworks already available under today's regulations (checking for suitable legal entities and cooperation frameworks). Another option is tight cooperation and coordination with ongoing NFDI activities, such as the DataPLANT NFDI consortium for fundamental plant research, for parts of the BioDATEN communities. For sustainable operation, qualified staff plays a key role, as does the provisioning of storage systems for the data.

The third online meeting addressed the cross-cutting topic of metadata. Metadata are important throughout the complete lifecycle of research data. While there are basic, descriptive metadata that are very likely to be shared across communities, such as author, DOI, and affiliation, there are also highly domain- and research-specific metadata, such as species, treatment, and experimental design. Descriptive metadata are best handled using established standards like DataCite. Research-specific metadata are probably best covered by existing ontologies: researchers can then pick the label that describes their data best. Additionally, there are technical and process-related metadata. To a large extent, both can be collected automatically during data processing on HPC resources or by tools like FITS. The well-established PREMIS framework is a good solution for capturing technical metadata.
These four kinds of metadata must be combined into one container file. The METS framework offers such a container and is also well established.
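How the four kinds of metadata fit into a METS container can be sketched as follows. This is an illustrative Python fragment only, not schema-valid METS: the element names use the official METS namespace, but the wrapped DataCite and PREMIS records are placeholders standing in for real metadata.

```python
# Illustrative sketch of a METS container for the four metadata kinds
# discussed above. Not schema-validated; embedded records are placeholders.
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)

mets = ET.Element(f"{{{METS}}}mets")

# Descriptive metadata (e.g. a DataCite record) goes into a dmdSec;
# domain-specific ontology terms could go into a second dmdSec.
dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", {"ID": "DMD1"})
ET.SubElement(dmd, f"{{{METS}}}mdWrap",
              {"MDTYPE": "OTHER", "OTHERMDTYPE": "DataCite"})

# Technical and process (digital provenance) metadata live in an amdSec.
amd = ET.SubElement(mets, f"{{{METS}}}amdSec", {"ID": "AMD1"})
ET.SubElement(amd, f"{{{METS}}}techMD", {"ID": "TECH1"})      # e.g. a PREMIS object record
ET.SubElement(amd, f"{{{METS}}}digiprovMD", {"ID": "PROV1"})  # e.g. processing history

xml_bytes = ET.tostring(mets, encoding="utf-8")
print(xml_bytes.decode())
```

In a production setting, the `mdWrap` elements would embed or reference the actual DataCite and PREMIS XML, and the result would be validated against the METS schema.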
The use of ORCID as a method to reference authors, contributors, etc. plays a special role. Unlike institution-based identity management, ORCID is person-based, and every researcher can update their affiliation and contact information. This comes in handy as soon as researchers leave their home institution and are no longer reachable under the institution’s email address.
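A practical detail worth noting: the final character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm, so mistyped iDs can be rejected locally before any lookup. A minimal sketch (the sample iD below is a commonly used example iD, not a real researcher's):

```python
# Validate the ISO 7064 MOD 11-2 check character of an ORCID iD.
def orcid_checksum_ok(orcid: str) -> bool:
    """Return True if the 16-character ORCID iD has a valid check character."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:          # first 15 characters must be digits
        if not ch.isdigit():
            return False
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    check = "X" if result == 10 else str(result)
    return digits[-1] == check

print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
```

Such a check only catches transcription errors; whether the iD actually belongs to the intended person still requires a lookup against the ORCID registry.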


New workshop on research data management

The Science Data Center BioDATEN is getting more and more involved in outreach and training activities in research data management. Since (junior) scientists are usually actively involved in research projects, corresponding courses should take place at the beginning of a research project if possible, so that the life cycle of the research data can be covered almost completely. Thus, the computing center of the University of Freiburg is planning a full-day workshop on 25th April in Freiburg for the CIBBS research cluster, which was granted in the German universities' excellence initiative, and for the biochemistry research training group. The teaching and qualification measures are geared towards junior researchers at the beginning of their careers, with the aim of working towards well-thought-out and structured data and workflow preparation. The workshop follows the idea of avoiding time-consuming and error-prone end-of-project data management by advising researchers at an early stage in the life cycle of their data. Data sets should be curated with a view to later publication from the beginning, enriched with proper metadata, and converted into sustainable file formats. Junior scientists need an increasing amount of qualified knowledge to access the various advanced research infrastructures and to properly handle the associated data management.

The workshop as planned will be a combined information and training event giving an introduction to domain-specific research data management, focusing on the cluster of excellence at the University of Freiburg (CIBBS) and further groups from the bioinformatics and biochemistry domains. It offers concrete instructions for day-to-day activities. Researchers are introduced to the methodological, organizational, technical, and legal challenges of research data management on the one hand, while specific requests of the working groups are dealt with on the other. Further topics covered will encompass: presentation of the respective research approaches and the associated RDM; learning from current procedures and workflows as examples (best practices) for one's own work; learning about available and planned infrastructures and future developments in BioDATEN; creation and use of ORCID iDs; and tools and methods for data management (hands-on; e.g. use of restic and Rucio).

Further (outreach) events focus on groups like the Bioinformatics Club of senior PIs at the University of Freiburg. The planned presentation will give an overview of the ongoing activities and developments in BioDATEN as well as of the training and summer school activities of Galaxy and ELIXIR/EOSC.
Besides practical aspects of data management, the outreach event will give an introduction to the planned bwSFS (Storage-for-Science) system as a future basis for data storage, long-term archiving, and publication. It will advocate concepts like the ORCID iD to improve descriptive metadata for data publication. The qualification of PIs is regarded as particularly important since, on the one hand, they generate a lot of data and are responsible for its reusability. On the other hand, these senior researchers, in their role as supervisors of students and doctoral candidates, serve as an example and should therefore adopt a sustainable approach to research data, for which BioDATEN will provide the necessary framework.

BioDATEN at 3rd NFDI Community Workshop at LMU Munich

BioDATEN was present at the 3rd NFDI Community Workshop - Services for Research Data Management in Neuroscience - hosted at the Ludwig Maximilians University in Munich on 10th February. It brought in the perspective of a base-level service provider and its expertise in the NFDI forming process. Further service providers presented at the workshop, such as FIZ Karlsruhe and infrastructure providers in the context of the Human Brain Project. These talks were followed by lightning talks on a range of services, initiatives, and data providers, including Fenix-RI, GIN, a cellular-resolution brain map, the Zeiss research platform, NFDI4BMP, BRIDGE4NFDI, the Helmholtz Metadata Services, RDM services at Marburg University, the LMU Munich University Library, the Leibniz Rechenzentrum Munich, the Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen, and the Max Planck Computing and Data Facility on the European Open Science Cloud.

The provider's perspective in the NFDI forming process was elaborated by the BioDATEN science data center. Providers of research IT infrastructures are faced with significant technological changes, fostered especially by resource virtualization. Many modern services and workflows are operated in an increasingly cloud-like fashion, where data and computing move from independent, personal workstations to centralized, aggregated resources. Such shared resources allow new projects to be hosted faster. The necessary excess capacity is much easier to maintain and to justify in centralized resources, because the shared overhead is typically much smaller than in independent systems. Funding bodies are starting to understand the changed technological landscape and to adapt their funding schemes, allowing buy-in into existing resources in preference to establishing separate ones for each new project. Users face a difficult sizing challenge, as it is often impossible for them to define the “right” configuration for a required resource. These challenges are answered by the IT industry as well as by science-driven cloud and aggregated HPC offerings. Aggregating resources into larger infrastructures allows the increasing effort of market analysis, system selection, proper procurement, and operation of (large-scale) IT infrastructures to be concentrated among a few experts. Furthermore, such a strategy would eliminate the contradiction between typical project run times and the (significant) delay for equipment provisioning as well as the usual write-off periods of that equipment.

The massive changes in the IT landscape and in user expectations increase the pressure on university (scientific) compute centers to reorient themselves. Cooperation in scientific cloud infrastructure is a chance for many compute centers to significantly widen the scope of their IT services. It helps them to keep up with the demand of the scientific communities and to offer a fairly complete service portfolio. Organizationally, it allows for specialization and community focus. When defining future strategies and operation models, compute centers might find a new standing in supporting research data management by providing efficient infrastructure and consultation to the various scientific communities. Furthermore, it offers them the opportunity to participate in infrastructure calls. These developments enable researchers to offload non-domain-specific tasks and services onto infrastructure providers. Suitable governance structures are to be implemented to ensure the persistent relevance of compute centers through user participation and feedback loops in the future. Close cooperation and consultation (as already practiced in Freiburg for the bwForCluster NEMO and for the storage infrastructure bwSFS) help all stakeholders to maintain suitable and up-to-date infrastructures tailored to their needs. Such structures are in their infancy for the NFDI, but future NFDI-wide coordination should advance this topic.

The financing of IT infrastructures for the various scientific communities is often grant-driven and inherently not suited to providing sustainable long-term services and research data management. The future should see a changed flow of funding, from a simple project-driven and organization-centered practice to demand-driven streams to different infrastructure and service providers. Large infrastructure initiatives like de.NBI or the NFDI not only need to resolve the question of personnel employment (permanent vs. project-based) but also need to define suitable business and operation models compatible with VAT regulations and with the federal and state requirements for cash flows in mixed consortia.

Sponsored by: