Participant Perspectives | Bhavesh Patel, Research Professor

 

Bhavesh Patel, Research Professor, FAIR Data Innovations Hub, California Medical Innovations Institute

 

DOI 10.7269/C1BC72 

Please explain a little about your background and why you’re interested in persistent identifier (PID) metadata and its enrichment.

My academic background is in computational mechanics, and I have been applying it to the modeling and simulation of organs and tissues for biomedical research for almost a decade. After being exposed to the FAIR Data Principles in 2018, I became increasingly interested in FAIR practices and how to make them more accessible to researchers. I started the FAIR Data Innovations Hub division in 2019, where my team and I develop standards, guidelines, and open source tools that make it easier for biomedical researchers to make their data, software, and other research outcomes FAIR.

PIDs and their associated metadata constitute a core element of the “Findable” aspect of FAIR practices. I am interested in PID metadata and its enrichment because associating rich, accurate, and up-to-date metadata with PIDs is critical for reproducibility and discoverability. In biomedical research, linking research outcomes (for instance, linking publications to their supporting data, software, and grants) is essential for building upon prior research and accelerating the pace of discovery.

What excites you most about the potential for collaborative enrichment of PID metadata (e.g. to improve research discoverability, impact tracking, better reflect the global nature of scholarly communications)? What do you think will be the most challenging aspect to address?

The PID metadata of a resource is often limited, either because the researchers sharing the resource do not provide richer metadata or because the infrastructure registering the PID does not include all of it. Moreover, PID metadata is often viewed as static. As a result, if a dataset is shared first, it will not be linked to other resources later developed using it, such as a computational model or the resulting paper. This means that anyone finding the dataset will not necessarily know that there is associated code or an associated manuscript, which could make reuse of the dataset more challenging.

We recently encountered these challenges when evaluating the FAIRness of optical coherence tomography (OCT) datasets for age-related macular degeneration (AMD). First, we struggled to find relevant datasets, mainly because they lacked metadata that would have facilitated their discovery. Then, many of the datasets we did find had very poor descriptions of the data collection process and provenance. To properly understand the datasets, we had to conduct an extensive, time-consuming literature review to find the manuscripts in which those datasets are described. This could have been avoided if, for instance, each dataset’s metadata had linked to the manuscripts describing it.

What excites me most about collaborative enrichment of PID metadata is the potential to address all these shortcomings and create a more interconnected and discoverable research ecosystem. It would also allow us to better track the reuse of resources and properly reward researchers for sharing them.

The most challenging aspect, in my opinion, will be monitoring the validity of suggested enrichments, such as verifying the accuracy of linked resources or ensuring proper attribution, to prevent misuse of the opportunities provided by community enrichment. Truly democratizing community enrichment and enabling involvement from all stakeholders, including those who are not affiliated with a large institution or who lack specialized skills, will also be challenging.

What successful examples of community collaboration in scholarly infrastructure have you witnessed that could inform the proposed COMET model’s development?

I have been fortunate to be involved in the NIH SPARC Program since 2017, where there is a great emphasis on FAIR data sharing. The program supported the development of the SPARC Data Portal and has been highly successful in enabling researchers to share data on the portal through two main aspects: 1) a user support tool called SODA (which we are developing!) that makes it easy for any researcher to organize and share their data according to the SPARC Data Standards, and 2) a team of human curators who assist researchers and review their data before it is published on the SPARC Data Portal. I believe COMET could learn from the SPARC Program that, for community PID metadata enrichment to be successful and accessible to all, there will likely be a need for a tool that makes it easy for anyone to participate in the enrichment without requiring particular skills, and a need for human curators to intervene in the process to ensure best practices are followed.

Figure 1.  A screenshot of the homepage of SODA.

How could better and more complete PID metadata, derived from the proposed COMET model, help to advance your goals, those of your organization, or your communities?

At the FAIR Data Innovations Hub, our goal is to simplify the sharing of FAIR data, software, and other research outputs for biomedical researchers. However, sharing these outputs still requires additional time and effort from researchers, which adds to their existing workload. Biomedical researchers are often promised recognition for their sharing effort, for instance through citations, mentions, and increased collaboration. Yet, in practice, research incentives remain heavily publication-focused, and datasets, software, and other outputs are mostly discovered through the papers that reference them. I believe PID metadata enrichments are essential to fulfill the promises made to researchers and ensure they receive the credit and visibility they deserve for all their contributions.

What benefits do you envision enriched PID metadata, such as that aimed for through COMET, will have on the broader research ecosystem?

Enriched PID metadata will similarly benefit all fields of research by bringing greater transparency and discoverability. I can also envision rich, accurate, up-to-date PID metadata enabling researchers from different disciplines to more easily discover and integrate relevant information across fields. Moreover, it can help provide accurate and comprehensive metrics for assessing the impact of a researcher or a research project, which in turn can encourage greater sharing of research outcomes beyond just manuscripts.

Why do you think organizations interested in PID metadata enrichment should consider contributing resources to fund the first phase of development for the proposed COMET model?

All stakeholders in scientific research will greatly benefit from rich, accurate, and up-to-date PID metadata. Researchers and their institutions could benefit from better discoverability of existing resources, which can prevent duplication of effort, and from increased reuse potential for their own shared resources. Funders can benefit from a greater return on investment if the resources they fund end up being reused beyond the originally intended scope. Scientific publishers and repository maintainers will enjoy greater use of their infrastructure. All stakeholders should, therefore, consider contributing resources to fund the first phase of development for the proposed COMET model.


