Participant Perspectives | European Molecular Biology Laboratory (EMBL)

DOI 10.7269/C16P4Q

Perspective by EMBL-EBI

At the European Molecular Biology Laboratory (EMBL), we see a clear alignment between our metadata initiatives and the goals of COMET (the Collaborative Metadata Enrichment Taskforce). COMET calls on institutions, repositories, and service providers to work together to enrich metadata in ways that are open, trusted, and scalable. Our internal initiative, ENRICH (Enhancing Open Infrastructure with Research Information and Curated High-Quality Metadata), aligns directly with this call. What follows is a closer look at the motivations behind our work and at how EMBL’s effort supports and extends the COMET model.

By bringing together EMBL’s Office for Scientific Information Management (OSIM) and Europe PMC, EMBL’s internal ENRICH initiative is a practical demonstration of how institutions and infrastructures can collaborate to expose, enrich, and reuse accurate metadata in the open. How this collaboration works:  

●      OSIM is an internal EMBL team that supports researchers in embedding open science practices into their work. OSIM monitors EMBL’s adoption of open science and reports on research outputs, such as publications, datasets, and software.

●      Europe PMC, a literature database hosted at EMBL’s European Bioinformatics Institute (EBI), enriches publications with linked data, reviews, preprints, grant information, and other key resources. As a long-standing leader in surfacing research outputs through text mining, Europe PMC enhances the visibility and discoverability of critical research components. Its close integration with EMBL-EBI’s data resources creates further opportunities to build an interconnected ecosystem of research outputs. EMBL also uses Europe PMC as its institutional repository, where persistent identifiers (PIDs) – including ORCID IDs and ROR IDs – alongside high-quality metadata, play a vital role in accurately managing and tracking EMBL’s scholarly output.

This collaboration between the institution (EMBL’s OSIM) and the infrastructure provider (Europe PMC) has revealed key areas of opportunity:

●      Using PIDs to enhance open science monitoring

●      Improving interoperability between CRIS systems and Europe PMC

●      Making human-curated metadata accessible

ENRICH aims to advance the integration of highly accurate, librarian-curated data from EMBL’s Current Research Information System (CRIS) into Europe PMC and ORCID. This initiative will help establish Europe PMC as a trusted, openly accessible source of metadata on EMBL’s research output.

Why does a research organization like EMBL need this?

ENRICH is EMBL’s response to the broader opportunity outlined by COMET: that institutions can and should take a more active role in enriching the metadata that describes their research. Our experience shows how institutional expertise and stewardship can directly contribute to the open infrastructure ecosystem COMET envisions.

As with other organizations in our community, the need for our ENRICH initiative emerged from OSIM’s experience managing several disconnected workflows for publication reporting and open science monitoring. OSIM currently uses a commercial CRIS system that sources publications from Europe PMC. We receive additional metadata, such as associated preprints, datasets, and software, directly from EMBL authors when they publish their papers. The EMBL publication list is then generated and analyzed: a machine learning algorithm identifies collaborator affiliations, and additional metadata are retrieved through open scholarly APIs, including Open Policy Finder and Unpaywall. However, the final analyzed data are stored exclusively in cloud-based systems. This results in metadata duplication and variability across systems (open and closed), and we still lack a single source of truth. Information that is not also available in Europe PMC is less discoverable by the public and has limited potential for reuse.
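As an illustration of the kind of API-based enrichment step described above, the following Python sketch retrieves open-access status for a single DOI from the Unpaywall v2 REST API. It is not OSIM’s actual pipeline: the contact email is a placeholder (Unpaywall asks callers to identify themselves with one), and only a few fields of the documented response (`doi`, `is_oa`, `oa_status`, `best_oa_location`) are kept.

```python
# Minimal sketch (not OSIM's pipeline): look up open-access status for one DOI
# via the Unpaywall v2 REST API. The contact email is a placeholder assumption.
import json
import urllib.parse
import urllib.request

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build the request URL; Unpaywall expects an email parameter on every call."""
    path = urllib.parse.quote(doi, safe="/")  # keep the DOI's slash, escape the rest
    return UNPAYWALL_BASE + path + "?" + urllib.parse.urlencode({"email": email})


def summarize(record: dict) -> dict:
    """Keep only a few fields from an Unpaywall response record."""
    best = record.get("best_oa_location") or {}  # null when no OA copy is known
    return {
        "doi": record.get("doi"),
        "is_oa": bool(record.get("is_oa")),
        "oa_status": record.get("oa_status"),
        "oa_url": best.get("url"),
    }


def fetch_oa_summary(doi: str, email: str) -> dict:
    """Perform the HTTP request (requires network access) and summarize the result."""
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=30) as resp:
        return summarize(json.load(resp))
```

In a reporting workflow, `fetch_oa_summary` would typically be called once per DOI in the publication list, with results stored alongside the record. Where those results end up being stored is exactly the single-source-of-truth question raised above.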

Discussions with other research organizations have reinforced the importance of interoperability and information flow between institutional CRIS systems and repositories. Commercial CRIS systems are often licensed together with a database from the same provider and can represent a significant financial cost. Because we chose Europe PMC, an open infrastructure, as our institutional repository, we need to overcome the interoperability issues between it and our CRIS system.

Proof-of-Principle trial to enrich publications with curated ROR IDs

Collecting a comprehensive and accurate list of EMBL publications and verifying EMBL’s contribution has long been challenging. Simple affiliation-based searches often produce false positives in most databases. As a result, EMBL librarians maintain a trusted publication list, curated with a high degree of accuracy to reflect the actual contributions of EMBL and its researchers. However, this curated metadata resides only within the CRIS system and spreadsheets.

To make this accurate publication list publicly accessible through Europe PMC, we initiated a pilot project to transfer curated EMBL ROR IDs from the CRIS system into Europe PMC. This pilot reflects a key COMET principle: metadata enrichment is most effective when trusted, institutionally curated information is made openly available through shared infrastructure. By contributing accurate affiliation data using persistent identifiers like ROR, EMBL is helping to demonstrate how institutions can improve metadata quality at scale in ways that benefit the broader ecosystem. Our process involved evaluating existing sources of ROR IDs within Europe PMC, generating structured exports of the curated data, and developing a semi-automated upload mechanism. The pilot enabled us to reliably retrieve EMBL publications within Europe PMC using the ROR IDs of EMBL’s sites (EMBL-Heidelberg, EMBL-Hamburg, EMBL-EBI, EMBL-Rome, EMBL-Grenoble, EMBL-Barcelona) and to display the curated records accurately, enriching the information publicly available through Europe PMC’s open APIs. It also demonstrates the potential to expand metadata coverage to ORCID IDs, linked preprints, datasets, data management plans, software, and other research outputs, enhancing the visibility of EMBL’s open science contributions and providing open metadata for reuse by others.
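To illustrate what retrieval by ROR ID through Europe PMC’s open APIs might look like, here is a minimal Python sketch against the Europe PMC REST search endpoint. The endpoint and its `query`, `format`, `resultType`, `pageSize`, and `cursorMark` parameters belong to the public REST API. However, the search field name `ROR_ID` and the form of the ROR ID it matches are assumptions made for illustration (Europe PMC’s search syntax reference should be checked for the field that actually indexes ROR IDs). The ROR IDs used in testing are placeholders, not EMBL’s real identifiers.

```python
# Minimal sketch: query the Europe PMC REST search endpoint for records linked to
# a set of ROR IDs. ROR_FIELD is a hypothetical field name, not confirmed API syntax.
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
ROR_FIELD = "ROR_ID"  # hypothetical: check Europe PMC's search syntax reference


def build_query(ror_ids: list[str]) -> str:
    """OR together one clause per ROR ID so a hit at any listed site matches."""
    return " OR ".join(f'{ROR_FIELD}:"{ror}"' for ror in ror_ids)


def search_url(ror_ids: list[str], page_size: int = 100, cursor: str = "*") -> str:
    """Build one page request; cursor '*' requests the first page."""
    params = {
        "query": build_query(ror_ids),
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
        "cursorMark": cursor,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_page(payload: dict):
    """Extract (source, id, title) tuples and the next cursor from one JSON page."""
    hits = payload.get("resultList", {}).get("result", [])
    records = [(h.get("source"), h.get("id"), h.get("title")) for h in hits]
    return records, payload.get("nextCursorMark")


def fetch_all(ror_ids: list[str], page_size: int = 100) -> list:
    """Follow cursorMark paging (requires network access) until no new page is returned."""
    cursor, collected = "*", []
    while True:
        with urllib.request.urlopen(search_url(ror_ids, page_size, cursor), timeout=30) as resp:
            records, next_cursor = parse_page(json.load(resp))
        collected.extend(records)
        if not records or not next_cursor or next_cursor == cursor:
            return collected
        cursor = next_cursor
```

Combining one clause per site with `OR` mirrors the pilot’s goal of retrieving publications from any EMBL site in a single query. Cursor-based paging is used because an institution’s full output can exceed a single page of results.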

What impact would enriched PID metadata have on the research ecosystem?

COMET has the potential to reshape the governance of research metadata and empower a broader range of stakeholders across the research community to contribute to its enrichment. This will improve the trustworthiness of metadata and ensure it is hosted on more sustainable and resilient infrastructures. Expanding metadata coverage to research outputs beyond traditional publications is essential for advancing open science. It will enhance our ability to monitor open science practices and support broader, more responsible research assessment initiatives such as DORA and CoARA.

Initiatives like ENRICH illustrate how institutional contributions can operationalize COMET’s vision, bridging internal curation practices with community-owned infrastructure; EMBL’s experience shows what it looks like when COMET’s ideals are put into practice. Making progress together will significantly reduce the time and effort required of EMBL’s research information specialists and librarians, allowing us to support researchers more efficiently.

Why should institutional stakeholders consider contributing to COMET?

Research organizations that, like EMBL, face challenges in making their metadata discoverable and reliably stored on open infrastructure should strongly consider engaging with COMET. As we continue to evaluate our research information sources and adapt to evolving needs around open science monitoring, exploring open, community-driven, and sustainable models like COMET is increasingly valuable. Many institutions already curate high-quality metadata that could be shared with the broader ecosystem, improving accuracy, discoverability, and usability.
