Crossref's Perspective on Open Metadata Enrichment

Nov 7

As one of the largest open scholarly infrastructures and metadata registries, Crossref offers a unique perspective on collaborative metadata enrichment and on how initiatives like COMET align with its development roadmap, which is driven by the goals of its diverse community. By Ginny Hendricks, Chief Program Officer at Crossref.

Why Metadata Matters

At Crossref, we've been stewarding metadata for 25 years, growing to 175 million records from 23,500 organisations across 163 countries. This scale gives us a front-row seat to both the incredible potential and the ongoing challenges of enriching metadata.

We believe that metadata is communication: it can tell a story about research and paint a picture for others to respond to and learn from, across the world and for generations to come. Our work is guided by the Research Nexus vision, a rich, reusable, open network connecting research organisations, people, things, and actions. This vision goes beyond persistent identifiers to focus on relationships and context within the research ecosystem.

We try to strike a balance between persistence and accessibility by maintaining low barriers to membership while encouraging metadata best practices. We run many global and collaborative engagement activities, such as frequent online "metadata health checks" using Participation Reports, and we have supported multiple open metadata initiatives over the years. This approach enables broad participation, but it also leaves metadata gaps, which we work hard to fill and which initiatives like COMET also aim to address.

The Promise and Challenges of Collaborative Enrichment

We observe collaborative enrichment happening in real time within our system: Crossref metadata is updated several times more frequently than new records are added, largely due to community assertions and feedback. This demonstrates the research community's commitment to improving the discoverability and completeness of the scholarly record.

Over the years, we've also expanded the ways in which we ourselves enrich the metadata in our system: for example, by inserting Open Funder Registry IDs inferred from often very messy text strings, or by adding reciprocal relationships, so that when one member asserts “finances” in grant metadata, we add “isFinancedBy” to the accompanying publication metadata. The same applies to relationships between preprints and articles, between versions, and for peer reviews and data citations. Enrichment therefore happens frequently within the Crossref ecosystem, both manually and programmatically.
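The reciprocal-relationship step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Crossref's implementation: the `Relation` type, the `add_reciprocals` function, and the inverse-pair table are hypothetical, the pairs are only a small illustrative subset of relationship types, and the DOIs are placeholders.

```python
from dataclasses import dataclass

# Hypothetical, illustrative subset of inverse relationship pairs.
# This is NOT Crossref's complete relationship vocabulary or its actual code.
INVERSE_RELATIONS = {
    "finances": "isFinancedBy",
    "hasPreprint": "isPreprintOf",
    "hasVersion": "isVersionOf",
    "hasReview": "isReviewOf",
}
# Make the table symmetric so an assertion in either direction can be inverted.
INVERSE_RELATIONS.update({v: k for k, v in list(INVERSE_RELATIONS.items())})


@dataclass(frozen=True)
class Relation:
    subject: str   # DOI of the record making the assertion
    rel_type: str  # relationship type asserted by the subject
    obj: str       # DOI of the related record


def add_reciprocals(relations: set[Relation]) -> set[Relation]:
    """Return the asserted relations plus any missing inverse relations.

    Unknown relationship types are passed through unchanged, and running
    the function twice adds nothing new (the result is idempotent).
    """
    enriched = set(relations)
    for rel in relations:
        inverse = INVERSE_RELATIONS.get(rel.rel_type)
        if inverse is not None:
            # Swap subject and object, and use the inverse relationship type.
            enriched.add(Relation(rel.obj, inverse, rel.subject))
    return enriched


if __name__ == "__main__":
    # A member asserts that a grant finances an article (placeholder DOIs).
    asserted = {Relation("10.5555/grant.1", "finances", "10.5555/article.1")}
    for rel in sorted(add_reciprocals(asserted), key=lambda r: r.rel_type):
        print(rel.subject, rel.rel_type, rel.obj)
```

Running the script prints the original “finances” assertion from the grant alongside the inferred “isFinancedBy” relation on the article. Using a set of immutable `Relation` values keeps the step idempotent, which matters when enrichment is re-run each time metadata is updated.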

Our experience in building community-enriched services has revealed both additional opportunities and multiple challenges. The Open Funder Registry has enabled publishers to standardise funding metadata through community curation, as ROR has for affiliations, while Crossmark allows publishers and readers to signal and discover post-publication changes to research. Because we established a more collaborative curation and governance approach for ROR (with CDL and DataCite), ROR will soon supersede the Open Funder Registry as the primary funder identifier. More recently, our collaboration with Retraction Watch has shown how community-gathered data can supplement member-provided metadata to create a more complete picture of the integrity of the scholarly record.

The primary challenge is developing scalable, automated validation and quality control. Our current mechanisms are low-tech and rely on manual processes that are difficult to manage at high volumes. A second challenge relates to awareness, reach, adoption, and often even willingness. A third is provenance and assertion tracking.

Crossref policies stipulate that metadata stewards (members) retain ultimate responsibility for metadata accuracy, while also enabling community contributions and our own system assertions. A balance between openness and accountability is crucial for collaborative metadata initiatives. Authentication, identity management, and the validation of third-party assertions pose considerable technical and governance challenges for collaborative enrichment projects, as do truly global communication and support structures.

This necessary combination of human and system involvement makes metadata enrichment doubly challenging. How can we build and maintain participative systems that serve the needs of people doing or analysing the science? Conversely, how can we help people understand, agree with, and engage with the systems we have designed for them?

Learning from Community Collaborations

The scholarly infrastructure landscape offers several successful models of community collaboration. Our 25-year journey demonstrates how member-driven governance fosters sustainable infrastructure when organisations commit to shared responsibilities.

The success of initiatives like ORCID and ROR demonstrates how openly governed systems and collaborative advocacy can achieve broad adoption by balancing community needs with technical reliability. Establishing the Metadata 2020 initiative in 2018 enabled various stakeholders to articulate problem statements and share best practices that are still in use today. Crossref also supports the Barcelona Declaration on Open Research Information, which advocates for making openness the norm for research information and for working with services that enable open research information.

Event Data began as a collaborative initiative between Crossref and DataCite, building on work by PLOS, before the reality set in: sharing metadata between two entirely different systems is inherently complex, needs proper resourcing, and requires an overarching agreement on the strategic and technical approach. The two organisations’ versions diverged several years ago, and Crossref will be deprecating Event Data in favour of upgrading its existing metadata system and production API.

These examples suggest that successful collaboration in scholarly infrastructure—whether building systems or advocating for change—requires transparent governance, capacity, shared principles, good planning, and sustained engagement.

The Benefits of Better Metadata

Enhanced metadata directly supports our mission and the Research Nexus vision. Richer, more comprehensive metadata creates more value for Crossref members, 50% of whom are based in Asia and 40% of whom self-identify as university-based. Our API is also one of the most heavily used in scholarly communications and beyond, with over 1.6 billion requests a month. Its users definitely want more complete metadata!

For our global community, enriched metadata means a more accurate representation of diverse, interconnected scholarly work spanning articles, books, grants, standards, datasets, protocols, dissertations, and more.

Enriched metadata provides a more comprehensive view of scholarship and its outcomes. It enhances discoverability across disciplines and regions, enables more accurate and transparent evaluation, assessment, and impact tracking, and better reflects the global, collaborative nature of research.

Our Approach to Collaborative Development

Crossref isn't directly involved in COMET yet, because we're busy upgrading our systems to be fully open source and more flexible so that we can enable and adopt better metadata workflows. Even so, we recognise the value of community collaboration on shared challenges, and we are certainly interested in anything that improves the metadata and the systems we all rely on.

Our current roadmap addresses problems similar to those identified by COMET, including the development of matching services (with our methodologies and outputs open to everyone for reuse), the implementation of automated metadata quality feedback loops, and the integration of additional third-party data sources. Our matching services for references, funding, grants, preprints, and affiliations directly produce better metadata, and we're actively pursuing this work.

As different metadata initiatives emerge and learn from one another, they collectively strengthen the overall effort. While Crossref continues to rebuild its systems to enable further scalability of open metadata enrichment, we're following COMET's work as it addresses our shared challenges in building a more connected scholarly record. Multiple approaches to collaborative metadata enrichment, from technical infrastructure to community advocacy, benefit the entire ecosystem.

Dione Mentis
