
The COMET Model
Transitioning to Community-Curated PID Metadata Enrichment
By Adam Buttrick, John Chodacki, Juan Pablo Alperin, Maria Gould, Nees Jan van Eck, Dione Mentis, and Clare Dean
Introduction
Research infrastructure depends fundamentally on persistent identifiers (PIDs) and their associated metadata. PIDs, such as digital object identifiers (DOIs) for publications and data, ORCIDs for researchers, and ROR IDs for institutions, serve as the connective tissue of scholarly communication, enabling the discovery, attribution, and evaluation of research outputs. Their fitness for these tasks is determined, almost exclusively, by the quality and completeness of their metadata. High-quality metadata records crucial information such as who conducted the research, where it was performed, how it was funded, and how different outputs relate to each other. This metadata transforms a simple identifier into a rich description that powers everything from literature searches to impact assessments.
Within this context, the PID ecosystem faces significant challenges to its improvement, stemming from an ownership model where original depositors serve as the sole, authoritative source for their metadata. While this single-source approach provides important consistency and accountability, it also means that even when a metadata record contains gaps or errors, the broader community is unable to address these deficits. This is true despite the fact that many others (researchers, institutions, funders, platforms and service providers) often possess the knowledge and ability needed to make improvements.
This gap between the opportunity and the ability to contribute presents an opening. By creating mechanisms for these stakeholders to share in the stewardship of metadata, the ecosystem can transform itself from a system of individual burdens into one that takes full advantage of the expertise and capacity of the research community. Collaborative Metadata (COMET) has brought together stakeholders to seize this opportunity.
Background
In mid-2024, groups from across the scholarly community resurfaced the need for collaborative approaches to increasing metadata quality, beginning with stakeholder consultations at the FORCE11 conference in Los Angeles and the Paris Conference on Open Research Information. These discussions aimed to solidify a shared understanding of the challenges and to explore new solutions to collectively enrich, verify, and maintain the accuracy and trustworthiness of research metadata. We also explored how to apply lessons from successful initiatives such as the Research Organization Registry (ROR), which created a trusted, globally adopted registry for institutions through a community-curated metadata model.
From these workshops, the COMET Taskforce, which included a broader coalition from the community, established a more focused effort to advance these discussions. Over 80 representatives from across the research community joined together over the course of several months to explore the product development, technical, and governance aspects of what became known as the COMET Model. Using feedback from these sessions, the Taskforce conducted a Community Call to Action to identify individuals and organizations who could provide resources and advisory support. COMET is now in a position to demonstrate the process and models that the Taskforce has outlined. To that end, several pilot projects were launched in May 2025 as a way to explore and iterate on the COMET Model.
The COMET Model: Foundational Principles
The COMET Model is rooted in the understanding that metadata exists in a complex ecosystem with multiple stakeholders, modes of production, and incentives that determine quality and completeness.
From Individual to Collective Stewardship
Much of scholarly metadata exists in PID ecosystems where original depositors establish the authoritative record of the metadata, while numerous other stakeholders depend equally on its quality and completeness. However, these same stakeholders lack structured mechanisms to contribute improvements when gaps exist, blocked by concerns that allowing external contributions would compromise source authority (i.e., the legitimacy of the original source).
While the need to preserve source authority is fully legitimate and should be upheld, the assumption that it also ensures high-quality, complete metadata does not reflect the realities of how such metadata are produced. Nor should this need supersede, rather than be balanced against, all other concerns. Research outputs are the collective product of numerous actors, yet the production of this metadata rarely reflects that reality. The COMET Model addresses this by recognizing that source authority is distinct from metadata quality.
As such, it is necessary to evolve from treating the stewardship of PID metadata as an individual burden to be borne by its creators to a model that recenters this stewardship by embracing and empowering the community to contribute to its improvement. This approach recognizes that research is a collective endeavor, whose description likewise entails shared effort, allowing all participants to contribute their expertise and authority to the production of a complete and accurate scholarly record.
Figure 1: A collective metadata stewardship model
From Siloed to Collective Benefit
Currently, when stakeholders encounter metadata quality issues, they develop isolated correction mechanisms outside the PID ecosystem. Universities maintain internal databases with corrected affiliations, funders track grant connections separately, and service providers build proprietary enrichment systems. These parallel efforts address immediate needs but operate in isolation from authoritative sources, creating fragmented and often conflicting versions of metadata across platforms.
To transition to a model of collective benefit, COMET proposes a new infrastructure that systematizes PID metadata enrichment. It is conceived as a centralized data store that acts as a meeting place for metadata improvements that the community contributes as structured, open assertions with clear and comprehensive provenance, capturing how each contribution was sourced, developed, and validated. The system can surface, compare, and reconcile competing assertions about PID metadata in a scalable, organized way. Using this framework, separate and isolated improvement efforts can be brought together into a unified, interconnected, and machine-actionable resource.
The infrastructure is not meant to replace the role of PID providers. Instead, it is designed as complementary to those systems, providing an additional layer that enables better coordination and sharing. This creates a reliable and authoritative foundation for producing higher-quality metadata that can be contributed back to the source, benefitting all stakeholders.
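As a rough illustration of how competing assertions might be surfaced and compared, the following Python sketch groups assertions by identifier and field and flags the cases where contributors disagree. The Assertion class, its attribute names, the sample DOI, and the organization labels are all hypothetical and are not part of any COMET specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One community-contributed statement about a single metadata field (hypothetical shape)."""
    pid: str          # identifier the assertion is about, e.g. a DOI
    field: str        # metadata field being enriched
    value: str        # proposed value for that field
    contributor: str  # who made the assertion (part of its provenance)
    method: str       # how the value was derived (part of its provenance)


def group_competing(assertions):
    """Group assertions by (pid, field) and list the contributors behind each proposed value."""
    grouped = defaultdict(lambda: defaultdict(list))
    for a in assertions:
        grouped[(a.pid, a.field)][a.value].append(a.contributor)
    return {key: dict(values) for key, values in grouped.items()}


def conflicts(grouped):
    """Keep only the (pid, field) pairs for which more than one value was proposed."""
    return {key: values for key, values in grouped.items() if len(values) > 1}


# Placeholder data: the DOI and organization labels are invented for illustration.
assertions = [
    Assertion("10.0000/example", "affiliation", "org-A", "university-cris", "manual curation"),
    Assertion("10.0000/example", "affiliation", "org-A", "aggregator", "string matching"),
    Assertion("10.0000/example", "affiliation", "org-B", "text-miner", "machine learning"),
    Assertion("10.0000/example", "funder", "Example Funder", "funder-db", "grant records"),
]

for key, values in conflicts(group_competing(assertions)).items():
    print(key, values)
```

In this sketch only the affiliation field is reported, because two different values were proposed for it. How such disagreements are actually resolved is a governance question that the code does not attempt to answer.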
Figure 2: A unified metadata enrichments workflow
Trust through Transparency and Collective Governance
Traditional PID metadata workflows derive trust primarily from the identity of the record creator, that is, from who registered the metadata. COMET shifts this paradigm by rooting trust in process and transparency. It is not meant to replace traditional workflows or detract from the need for them, but to complement them and add value.
Transparent Provenance Systems
To ensure this is the case, every enrichment contribution must include comprehensive documentation of its source, methodology, and validation process. This creates an auditable trail that enables users to assess the reliability and appropriateness of specific improvements.
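One way to picture this requirement is a simple check that a contribution documents its source, methodology, and validation before it is accepted. The sketch below is only an assumption about how such a check could look; the dictionary layout and key names are hypothetical.

```python
# Hypothetical provenance keys, taken from the three elements named in the text.
REQUIRED_PROVENANCE = ("source", "methodology", "validation")


def missing_provenance(contribution):
    """Return the required provenance keys that are absent or blank."""
    provenance = contribution.get("provenance") or {}
    return [
        key for key in REQUIRED_PROVENANCE
        if not str(provenance.get(key) or "").strip()
    ]


# Placeholder contribution: the DOI and values are invented for illustration.
contribution = {
    "pid": "10.0000/example",
    "field": "affiliation",
    "value": "Example University",
    "provenance": {
        "source": "institutional CRIS export",
        "methodology": "matched on author name and title",
        "validation": "",
    },
}

print(missing_provenance(contribution))  # ['validation']
```

A contribution with an empty list of missing keys would carry the minimum documentation needed for others to audit it.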
Shared Evaluative Frameworks
Throughout the ecosystem, institutions and groups already develop and maintain criteria for assessing metadata quality and validating enrichments. Instead of maintaining these criteria in silos, in the COMET Model, these frameworks are publicly documented, consistently applied, and continuously refined based on community feedback and practical experience.
Collective Governance
Trust, transparency, and continuous improvement are central to this model. The goal of this work is to create a self-reinforcing cycle of quality improvement, transforming metadata stewardship into a dynamic, inclusive, and collaborative process. To sustain this cycle, the COMET initiative operates through a tiered governance structure that balances operational efficiency with broad community participation:
Organizers: Core members with day-to-day responsibilities and operational members from partner organizations who contribute dedicated resources and participate in resource allocation decisions.
Advisors: Advisory group of stakeholders with key expertise who provide advice on strategic direction, governance, and project development.
Community: A large, open group of stakeholders who support the initiative and may participate in activities at various levels.
The COMET Model: Elements in Demonstration
The COMET Model emerged through our community consultation process. Building on the listening sessions from the taskforce phase, the demonstration phase will pilot real-world test cases to explore different approaches to metadata enrichment. This allows us to evaluate the effectiveness, accuracy, cost, and scalability of these methods and determine the resources required to support them with real-world scenarios and actors.
Below is a list of the main elements of the COMET Model and the pilot projects we are conducting to test and learn from them:
Fields as Features
We can borrow from software development processes by considering each metadata field as a discrete feature. This framework allows us to make progress by resolving issues one field at a time, similar to how new software features are developed and released incrementally. Using this approach provides several advantages: it allows for focused problem-solving, enables measurable progress on specific issues, reduces complexity in both technical implementation and community coordination, and creates opportunities for early wins that build confidence in the model.
For example, incomplete affiliation metadata in DOI records creates a cascade of problems across the research ecosystem. Researchers are impeded in finding relevant work from specific institutions or regions. Organizations struggle to demonstrate their impact and identify collaborators. Funders and policymakers lose the ability to accurately track the results of their investments and the outcomes they produce. Most critically, these gaps disproportionately impact underrepresented research communities, creating yet another barrier to equity and inclusion in global scholarship.
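Treating a field as a feature implies that progress on it can be measured on its own. As a minimal sketch, assuming metadata records are available as simple dictionaries (the record layout and field names here are hypothetical), the share of records with a populated field could be tracked like this:

```python
def field_completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field))
    return filled / len(records)


# Placeholder records: only one of the four carries an affiliation.
records = [
    {"title": "Paper A", "affiliation": "Example University"},
    {"title": "Paper B", "affiliation": ""},
    {"title": "Paper C"},
    {"title": "Paper D", "affiliation": None},
]

print(field_completeness(records, "affiliation"))  # 0.25
print(field_completeness(records, "title"))        # 1.0
```

Computing the same figure before and after an enrichment effort gives a per-field measure of the incremental progress described above.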
Content Type Variety
A complement to our "fields as features" approach is attending to a variety of scholarly content types (e.g. journal articles, preprints, data, software, projects, theses, dissertations, grants). Each content type presents unique challenges to metadata quality and completeness and reflects the needs of different stakeholders. By addressing multiple content types simultaneously, we can develop flexible solutions that can be applied generally, as opposed to narrow improvements that only benefit specific communities or research outputs. The demonstration phase includes pilots that tackle metadata improvements for open access journal articles, preprints, and electronic theses and dissertations, as well as the classification of resource types in DOI metadata from generic categories into more meaningful, representative controlled vocabularies.
Stakeholder Variety
Collaborations and enrichments must build a network of stakeholders invested in the long-term success and adoption of the infrastructure. Similarly, prioritizing stakeholder variety ensures that we are attentive to the needs and perspectives of diverse organizations and regions (e.g. diverse languages, geographic locations, organization sizes, under-resourced communities, under-resourced disciplines).
Since the start of the demonstration phase we have engaged with numerous stakeholders to help us define the scope of the pilots and provide source content and enriched data samples, including: representatives of domain repositories such as arXiv; infrastructure providers such as OpenAlex; and research organizations such as EMBL-EBI, Wageningen University & Research (WUR), and CoLaV based in Colombia. Engaging with a variety of stakeholders is an ongoing process and we encourage future community members to share their enrichment use cases with us.
De-siloing Quality Metadata Sources
A central element of the COMET Model involves bringing together the wealth of metadata enrichment work that already occurs across disconnected systems. Many organizations maintain sophisticated metadata curation processes. However, these efforts operate in isolation, creating duplicated work and preventing the broader community from benefiting from these investments.
For example, universities and research institutions worldwide invest substantial resources in maintaining comprehensive, accurate records of their researchers' outputs through Current Research Information Systems (CRIS). These systems contain meticulously curated information about institutional affiliations, research collaborations, and author details that is often far more accurate and complete than what appears in published metadata. However, this valuable institutional knowledge remains largely isolated within individual systems, creating a fundamental disconnect between the careful curation work institutions perform and the metadata quality available to the broader research community.
The demonstration phase focuses on this prime example of quality siloed metadata to understand research institutions’ diverse approaches to metadata curation, the challenges they encounter, and the possible pathways to leverage this valuable effort.
Reconciliation and Integration Pathways
Currently, DOI metadata operates within a ‘push’ model: it is deposited by a creator (with a DOI registration agency such as Crossref or DataCite) who is responsible for its upkeep. Maintaining this DOI metadata is frequently beyond the capacity and resources of these creators, so community-contributed enrichments need a route back into the authoritative record. The pilot projects investigate both direct and indirect approaches to this reconciliation challenge:
Direct integration: Some projects work directly with DOI registration agencies like DataCite and Crossref to explore how community-contributed enrichments can be validated and incorporated into official metadata records. This approach offers the most direct path to improving the authoritative metadata that all stakeholders depend upon.
Platform-mediated integration: Other projects investigate indirect pathways where enrichments flow back to authoritative sources through the platforms and services that originally deposited the metadata. For example, improvements developed through COMET might be integrated into repository systems like those powered by PKP's Open Journal Systems, which then propagate to Crossref during the regular metadata update process.
Uniting New and Existing Enrichment Solutions
The COMET initiative does not seek to own the metadata enrichment space. Rather, it positions itself as a collaborative facilitator, instigating new workflows whilst amplifying and connecting existing enrichment solutions throughout the community. This approach recognizes that achieving metadata quality at scale requires leveraging the full spectrum of community expertise and existing investments.
The demonstration phase includes pilots that explore:
Developing new approaches: Some projects focus on creating novel technical capabilities, such as using machine learning to improve how research outputs are classified or developing new methods for extracting affiliation information from complex document formats. These efforts fill gaps where existing solutions are insufficient or unavailable.
Building on existing work: Other projects deliberately build upon proven approaches, such as established matching strategies for linking related research outputs or extending successful affiliation extraction methods to new content types. This approach maximizes the value of prior community investments whilst addressing broader application needs.
Integrating community solutions: Several pilots investigate how COMET infrastructure can integrate with existing services and platforms. Rather than competing with these solutions, COMET seeks to enhance their impact by providing standardized pathways for sharing enrichments across institutional boundaries.
As you can see, these pilot projects are designed to further explore the core elements of the COMET Model. They help us further identify the requirements for community interaction, establish workflows for data sharing, create evaluative frameworks, and ensure that future infrastructure is responsive to the needs of its diverse stakeholders.
Tackling these real-world tasks also surfaces the complexities of handling various forms of assertions and how to model all enrichments with useful, actionable provenance. The tangible improvements to the scholarly record they provide likewise showcase the benefits of the COMET Model to researchers, institutions, funders, and the broader community, encouraging wider participation and support.
The demonstration phase aims to provide concrete evidence that this vision is both technically feasible and practically beneficial, creating a foundation for the sustainable, community-driven metadata infrastructure that the research ecosystem requires.
Summary of the COMET Model
The COMET Model aims to address the fundamental disconnect between the collaborative nature of research and the individual burden of metadata stewardship. It provides a systematic approach to transforming how the scholarly community creates, maintains, and improves the metadata that powers research discovery and evaluation.
Foundational Principles
Collective stewardship: Recenters the responsibility for PID metadata from the individual depositor to a partnership distributed across the community. It is a model that respects the original DOI creator and allows all stakeholders to contribute to metadata enrichment.
Collective benefit: De-silos valuable curations from the community into a centralized enrichment store that enables enrichments to be reconciled with and incorporated into DOI source metadata for the benefit of all downstream stakeholders.
Trust through transparency and collective governance: Builds community confidence through open infrastructure, clear provenance tracking, and shared governance structures that give stakeholders meaningful voice in both operational decisions and strategic direction.
Elements of the COMET Model
Fields as features: Enables incremental progress by addressing individual metadata fields systematically.
Content type variety: Ensures broad applicability by focusing on diverse research outputs.
Stakeholder variety: Engages diverse perspectives to ensure solutions address real community needs.
De-siloing quality metadata sources: Brings together high-quality metadata from diverse organizations.
Reconciliation and integration pathways: Creates both direct and platform-mediated mechanisms for community-contributed improvements to be integrated back into authoritative metadata sources.
Uniting new and existing enrichment solutions: Positions COMET as a collaborative facilitator that develops new capabilities whilst amplifying and connecting existing community solutions.
Download a PDF of this article on Zenodo.