News

BioDATEN in Strasbourg for EUCOR networking

The Science Data Center BioDATEN took the opportunity to network within the European Campus (EUCOR). The session, held in Strasbourg on 27 November, served to exchange views on the current state of research data management (RDM) and on possible future joint activities of the member universities. BioDATEN and its RDM concept, as well as the activities of other members, were presented in the first half of the session. Local research data management groups or facilities support researchers in different ways, ranging from information portals to personal advice on grant applications. The concept of data stewards, specialized staff who bridge the gap between researchers and RDM practice, is becoming increasingly common. Ideas were developed to set up dedicated training courses and study tracks within EUCOR to qualify students in this field.

The RDM landscape is diverse and at different levels of maturity.
Many of the member universities started with education programmes at a general level or, more specifically, on data management plans. A major focus of the EUCOR activities is on training, as this is where the coordination overhead is lowest. As demand is considerable, especially at the PhD level, the idea arose to coordinate domain-specific courses within EUCOR. That way, not every institution has to provide expertise in all scientific domains; each can specialize instead. BioDATEN's education and qualification activities could fit in well here and support the bioinformatics community more widely, fully in line with the SDC's objectives. Frameworks already exist within EUCOR that allow members of the universities to attend courses at other locations.

France takes a more centralized approach to its universities and is therefore promoting ORCID as a national standard for persistent person identifiers (a subject evaluated by BioDATEN in work package 1.2.4). Further exchange is planned on technical systems for storage, suitable repository software, data visualization, governance, data stewardship and policies.

First qualification course on Research Data Management Plans

The qualification activities of the BioDATEN SDC started with a course on the concept and creation of Data Management Plans (DMPs): "Introduction to Research Data Management – Data Management Plans". The course was held at the Computer Center of the University of Freiburg. It gave a brief overview of handling research data and of the various aspects of creating data management plans that cover the complete life cycle of the data. The course presented concepts and strategies for creating DMPs in research projects to ensure the long-term reusability and accessibility of electronic research data, including functional issues, documentation and appropriate enrichment with metadata. Furthermore, archiving strategies, the handling of sensitive data and, last but not least, cost and refinancing models were discussed. Recommendations were given on the structure of DMPs and their necessary elements, ranging from standard project metadata to considerations on the amount of data, the file types and the software involved. Further relevant topics were data and software licensing; special attention was given to data citation, which may become part of how research is credited in the future.

The course was jointly organized by the Science Data Center BioDATEN and the Freiburg Research Data Management Group, represented by the eScience groups of the Computer Center and the university library. Due to the high demand, another workshop is planned. The range of training courses will also be broadened to cover subject-specific questions, which increasingly arise from the requirements of the various research groups in the bioinformatics community.

BioDATEN at the Research Data Management Working Group meeting at the Stuttgart University Library

The Science Data Center BioDATEN attended the meeting of the Research Data Management Working Group (AK FDM) at the Stuttgart University Library to discuss the challenges of long-term identifiers. Research data management (RDM) has to deal with objects that outlive both the typical duration of a project and the involvement of the people working on a research question. The identification of people must take into account the dynamics of research, the fluctuation of researchers and ongoing technological change. These challenges need to be considered in the context of data repositories and the Science Data Centers (SDCs). Person identifiers are required for discovering research and scholarly information and for crediting scientists for the results of their research, ideally for both published articles and published data.

Referencing scholarly publications and sharing data sets depend on the long-term durability of identifiers. A person's often short-term affiliation with a university or research institution works against this if identity management relies on local systems or email addresses. Thus, stable and unambiguous references to people are needed. Objects like papers, research data, lab notebooks and source code should be linked to one or more researchers in a proper and stable way. Ideally, these references are machine-actionable. Names are problematic: they are not unique, they may be misspelled, and bibliographic references to them vary. Researchers move around, graduate, start a PhD at a different institution or change their names. Additional attributes such as email address, institution name and department are not necessarily stable either. Institutional identity management systems do disambiguate, but they focus on active members of the institution, and identities are disabled or deleted when a person retires or moves on.

The BioDATEN project addresses this topic in work package "1.2.4 Persistent Identifier". As the challenge has been around for a while and is faced by a wide range of institutions and publishers, several options are available. The "Gemeinsame Normdatei" (GND, Integrated Authority File) is one such approach for German-speaking countries: a unique identifier that facilitates the collaborative use and administration of authority data. The GND represents and describes entities, i.e. persons, corporate bodies, conferences and events, geographic entities, topics, and works relating to cultural and academic collections. The GND is primarily used by libraries to catalogue publications, but archives, museums and other cultural and academic institutions use it as well. Publishers, for example, employ ResearcherID, which is part of the "Web of Science". Its identifiers, the Web of Science ResearcherIDs, are used by institutions and funders as persistent identifiers to track research output and to update publication records in Web of Science, ensuring correct author attribution and disambiguation. Another commercial approach is the Scopus Author Identifier, a proprietary solution provided by Elsevier. In identity federations, edu-ID plays a role. In Germany, activities concerning edu-ID are coordinated by a ZKI working group. The edu-ID concept aims at a lifelong, user-centric digital identity for research and education. Implementations differ between federations: in Sweden, SUNET provides eduID.se for application and matriculation processes in higher education and for creating university user IDs. The Swiss edu-ID is the common digital identity for the Swiss academic sector.

ORCID, a novel approach to user-centric global person identifiers, was started in 2010/11 with a Mellon grant and support from several other sponsors. It transcends disciplinary, geographic, national and institutional boundaries. Participation in ORCID is open to any organization that has an interest in scholarly communication. Access to ORCID services is based on transparent and non-discriminatory terms. Institutions can become members for an annual service fee; the University of Freiburg, for example, joined ORCID in February 2018. ORCID is governed by representatives from a broad cross-section of stakeholders, the majority of which are not-for-profit. All software developed by ORCID will be released under a license approved by the Open Source Initiative. Researchers can create, edit and maintain an ORCID iD and profile free of charge, and they control the privacy settings of their own ORCID record. All profile data contributed to ORCID by researchers, or claimed by them, will be available for free download in standard formats, subject to the researchers' own privacy settings. ORCID identifiers and profile data are made available via free APIs and services. ORCID supports linking accounts from identity federations, e.g. Shibboleth-based federations; this requires the respective institution to admit ORCID as a service provider. ORCID itself can already act as an attribute provider based on OAuth 2.0. Using this in the institution's own services requires application integration, which will be discussed and evaluated for the envisioned BioDATEN services.
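As a hedged illustration of the application integration mentioned above, the sketch below builds the two requests of an OAuth 2.0 authorization-code flow against ORCID using only the Python standard library. It assumes ORCID's public endpoints `https://orcid.org/oauth/authorize` and `https://orcid.org/oauth/token` (or their `https://sandbox.orcid.org` equivalents for testing) and the `/authenticate` scope, which is intended for obtaining a user's authenticated ORCID iD. The client ID, client secret and redirect URI are hypothetical placeholders that an institution would receive when registering its service with ORCID, and the function names are illustrative. The sketch sends no network requests.

```python
from urllib.parse import urlencode

# Assumed production base URL; use "https://sandbox.orcid.org" for testing.
ORCID_BASE = "https://orcid.org"


def build_authorize_url(client_id: str, redirect_uri: str,
                        base: str = ORCID_BASE) -> str:
    """Step 1: URL to which the institutional service redirects the user's browser."""
    params = {
        "client_id": client_id,
        "response_type": "code",   # authorization-code grant
        "scope": "/authenticate",  # only asks for the user's authenticated iD
        "redirect_uri": redirect_uri,
    }
    return f"{base}/oauth/authorize?{urlencode(params)}"


def build_token_request(client_id: str, client_secret: str, code: str,
                        redirect_uri: str, base: str = ORCID_BASE) -> tuple:
    """Step 2: URL and form body for exchanging the returned code for a token.

    The body is meant to be POSTed as application/x-www-form-urlencoded;
    this sketch only builds it and does not send anything.
    """
    form = {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }
    return f"{base}/oauth/token", urlencode(form).encode("ascii")


if __name__ == "__main__":
    # Hypothetical values; a real service uses its own registered credentials.
    client_id = "APP-XXXXXXXXXXXXXXXX"
    redirect = "https://biodaten.example.org/orcid/callback"
    print(build_authorize_url(client_id, redirect, base="https://sandbox.orcid.org"))
```

In the full flow, ORCID redirects the user back to the redirect URI with a `code` query parameter; the service then POSTs the token request, and the JSON response is expected to carry the user's ORCID iD alongside the access token. Keeping the request construction separate from the HTTP client makes it easy to plug into whatever framework the institutional service already uses.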

ORCID is already “GDPR certified”. This certification is the result of a long process initiated by the German chapter. This certification, together with ORCID's openness and independence, makes it a prime candidate for the long-term identification of persons in BioDATEN research data sets. Some caveats remain. Most users are self-registered and may not be using their institution's email address. There is no standardized endorsement process: an iD can be linked to institutional frameworks, but this is not required. Institutional accounts are difficult to distinguish from private ones. Unifying organisation and institution identifiers globally is still a challenge. Finally, ORCID users should be aware of some implications: using the ORCID API requires customising software components, and a persistent lifelong identifier allows users to be tracked forever. ORCID is a US-based not-for-profit organisation and as such is not bound by the Safe Harbor framework. Nevertheless, it is currently the most open and viable solution available, and ORCID appears likely to become a worldwide standard. BioDATEN will strive for a coordinated solution within the State of Baden-Württemberg and opt for ORCID as the most widespread and versatile person identifier. This could involve adding ORCID iDs to IDM systems, providing coordinated endorsements and exchanging ORCID iDs as an attribute via the bwIDM federation.
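Before ORCID iDs are added to IDM systems or passed on as a bwIDM attribute, a service will probably want to reject mistyped identifiers. An ORCID iD consists of 16 characters in four hyphen-separated groups; the last character is a check character computed with the ISO 7064 MOD 11-2 algorithm and is either a digit or `X`. The minimal Python sketch below validates that format and accepts both the bare form and the `https://orcid.org/` URI form; the function names are illustrative, and how a real IDM system would store the result is left open. A correct checksum only shows that the string is well formed, not that the iD is registered or belongs to the person presenting it.

```python
import re
from typing import Optional

# Bare form: four groups of four characters; only the very last may be "X".
ORCID_PATTERN = re.compile(r"^(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])$")
URI_PREFIX = "https://orcid.org/"


def check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(value: str) -> Optional[str]:
    """Return the bare iD (e.g. for an IDM attribute) if well formed, else None."""
    value = value.strip()
    if value.startswith(URI_PREFIX):
        value = value[len(URI_PREFIX):]
    match = ORCID_PATTERN.match(value)
    if match is None:
        return None
    digits = "".join(match.groups())  # the 16 characters without hyphens
    if check_character(digits[:15]) != digits[15]:
        return None
    return value


if __name__ == "__main__":
    print(normalize_orcid("https://orcid.org/0000-0002-1825-0097"))  # 0000-0002-1825-0097
    print(normalize_orcid("0000-0002-1825-0098"))  # None (wrong check character)
```

The example iD `0000-0002-1825-0097` is the fictitious record ORCID uses in its own documentation. Normalizing to the bare form is one possible convention; a federation could equally agree to exchange the full URI.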
