Standardizing Data and Data Exchange in Scholarly Publishing

Jay Henry introduced Ringgold and its mission. In short, Ringgold aims to help connect data among stakeholders in scholarly communications. Ringgold was established in 2005 as a consultancy, and it grew to address the problem of having many identifiers for the same institution. The goal was to build an authoritative database of uniquely identified institutions. For that to happen, multiple data-cleaning steps had to be taken. Ringgold has 400,000 records in its database and focuses on institutions but is moving into creating metadata on content as well.

Henry discussed some of the benefits of Ringgold institutional identifiers. Institutional identifiers differentiate between institutions that have similar names and between different abbreviations or names that are used for the same institution. That helps the supply chain by allowing a better understanding of a potential relationship with any particular institution. For example, it can be difficult to know whether an institution that one is approaching is already a customer. In addition to having an identifying role, the identifiers keep track of the hierarchy of departments in universities and other institutions. Ultimately, the goal for Ringgold is to help participants in the supply chain to use data more effectively.
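The two roles described above, telling apart institutions with similar names and tracking departmental hierarchy, can be sketched with a small lookup table. This is only an illustration under stated assumptions: the numeric identifiers, names, and parent links are invented and are not Ringgold data, and the field names are hypothetical.

```python
# Hypothetical institution table: every identifier, name, and parent link
# here is invented for illustration and is not taken from Ringgold.
INSTITUTIONS = {
    1001: {"name": "Example University", "parent": None},
    1002: {"name": "Example University School of Medicine", "parent": 1001},
    1003: {"name": "Department of Cardiology", "parent": 1002},
    # A different institution that happens to share a name with 1001.
    2001: {"name": "Example University", "parent": None},
}


def lineage(inst_id, table=INSTITUTIONS):
    """Return the chain of identifiers from inst_id up to its top-level parent."""
    chain = []
    seen = set()
    current = inst_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in institution hierarchy")
        seen.add(current)
        chain.append(current)
        current = table[current]["parent"]
    return chain


def top_level(inst_id, table=INSTITUTIONS):
    """Return the identifier of the top-level institution for inst_id."""
    return lineage(inst_id, table)[-1]
```

With such a table, a sales team could check whether a department belongs to an institution that is already a customer by comparing top-level identifiers rather than names.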

Open Researcher and Contributor ID (ORCID), as explained by Rebecca Bryant, is nonprofit, nonproprietary, open, and community driven. Its mission is to identify authors and contributors uniquely. Two authors may have the same name, but they can be differentiated by their ORCID IDs, and their work can be attributed correctly. That not only helps to identify individual authors and differentiate between identical names but helps in tracking an author as he or she moves between countries and institutions. ORCID has issued more than 685,000 IDs since its international launch in 2012.
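The attribution problem can be illustrated with a minimal sketch. The records and the identifiers below are placeholders invented for the example (they are not real ORCID registrations), and the field names are assumptions.

```python
from collections import defaultdict

# Hypothetical publication records. Two different researchers share a name,
# and one of them has moved between institutions. The iDs are placeholders.
WORKS = [
    {"title": "Paper A", "author": "J. Smith", "affiliation": "University X",
     "orcid": "0000-0000-0000-0001"},
    {"title": "Paper B", "author": "J. Smith", "affiliation": "Institute Y",
     "orcid": "0000-0000-0000-0001"},
    {"title": "Paper C", "author": "J. Smith", "affiliation": "University X",
     "orcid": "0000-0000-0000-0002"},
]


def group_titles(records, key):
    """Group work titles by the value of the given field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


by_name = group_titles(WORKS, "author")   # one bucket: the name is ambiguous
by_orcid = group_titles(WORKS, "orcid")   # two buckets: one per researcher
```

Grouping by name merges all three papers under one author, whereas grouping by identifier keeps the two researchers apart and keeps the first researcher's work together across a change of affiliation.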

ORCID is able to offer its service to authors without charge because it is supported by member organizations that use the ORCID application-programmer interface for the exchange of information. Researchers set up free identifiers for themselves while controlling the privacy of their records.

Once created, an ORCID ID becomes embedded in the metadata of an author’s work, such as grants, manuscripts, and society memberships. This allows effective exchange of information among communities, such as repositories, societies, and funding bodies. ORCID tries to encourage early adoption to help young researchers with their career management.

Funding organizations are now requesting ORCID IDs. Funders have the potential to capture ORCID information to improve the grant-submission process. It is important that the National Institutes of Health, the largest funding body in the United States, has integrated ORCID, and it will be followed by the National Science Foundation later this year.

The ORCID record is comprehensive: it can be linked to a Scopus ID, and data on grants from more than 60 funding organizations can be linked as part of the record. A list of publisher members who have adopted ORCID IDs is available from the ORCID Web site. ORCID is also working on establishing ways to reward reviewer work by including such work in the ORCID ID. Progress is expected by the middle of summer.

Elizabeth Blake introduced various topics surrounding the Journal Article Tag Suite (JATS), the latest version of the National Library of Medicine (NLM) document type definition (DTD). The tag suite is now managed by the National Information Standards Organization (NISO) rather than by NLM. Thus, new versions go through a formal standards process, and anyone who wants to can be involved in the process. You can write to nisohq@niso.org to indicate that you want to be involved.

JATS 1.0 was released in August 2012 and includes some improvements over the previous version. It now supports a contributor ID, such as an ORCID ID. It also supports multiscript author names, such as traditional and romanized Japanese names.
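As a rough illustration of those two features, the sketch below reads a contributor identifier and alternative name forms from a small JATS-style fragment. The element and attribute names (contrib-id, contrib-id-type, name-alternatives, name-style) follow my reading of the JATS tag library and should be checked against the published schema; the person, the identifier, and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: the contributor and the ORCID iD are placeholders.
JATS_FRAGMENT = """
<contrib-group>
  <contrib contrib-type="author">
    <contrib-id contrib-id-type="orcid">http://orcid.org/0000-0000-0000-0001</contrib-id>
    <name-alternatives>
      <name name-style="eastern" xml:lang="ja-Jpan">
        <surname>山田</surname><given-names>太郎</given-names>
      </name>
      <name name-style="western" xml:lang="ja-Latn">
        <surname>Yamada</surname><given-names>Taro</given-names>
      </name>
    </name-alternatives>
  </contrib>
</contrib-group>
"""


def extract_contributors(xml_text):
    """Collect identifiers and every name form for each contributor."""
    root = ET.fromstring(xml_text)
    contributors = []
    for contrib in root.iter("contrib"):
        ids = {
            el.get("contrib-id-type"): (el.text or "").strip()
            for el in contrib.findall("contrib-id")
        }
        names = [
            {
                "lang": name.get(XML_LANG),
                "style": name.get("name-style"),
                "surname": name.findtext("surname", default=""),
                "given": name.findtext("given-names", default=""),
            }
            for name in contrib.iter("name")
        ]
        contributors.append({"ids": ids, "names": names})
    return contributors


contributors = extract_contributors(JATS_FRAGMENT)
```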

JATS 1.1d1 (draft 1 of version 1.1) was released in November 2013. Version 1.1 is due in the second half of 2014. This version offers MathML 3 support, institutional IDs, and a new code element (with greater support than previously available for computer code). Expect updates of JATS about once a year.

Institutional IDs are also supported, such as the International Standard Name Identifier and Ringgold IDs. Those IDs are used in affiliation and funding information.
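A minimal sketch of how such identifiers might be read from an affiliation follows. It assumes the institution-wrap and institution-id elements as I understand them from the JATS 1.1 drafts, so the markup should be verified against the tag library; the institution and both identifier values are placeholders, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical affiliation: the institution and both identifiers are placeholders.
AFF_FRAGMENT = """
<aff id="aff1">
  <institution-wrap>
    <institution-id institution-id-type="Ringgold">00000</institution-id>
    <institution-id institution-id-type="ISNI">0000000000000000</institution-id>
    <institution>Example University</institution>
  </institution-wrap>
</aff>
"""


def institution_ids(aff_xml):
    """Return the institution name and a mapping of identifier type to value."""
    aff = ET.fromstring(aff_xml)
    wrap = aff.find("institution-wrap")
    if wrap is None:
        return {"name": None, "ids": {}}
    ids = {
        el.get("institution-id-type"): (el.text or "").strip()
        for el in wrap.findall("institution-id")
    }
    return {"name": wrap.findtext("institution"), "ids": ids}


affiliation = institution_ids(AFF_FRAGMENT)
```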

JATS is backward compatible with NLM DTD 3.0, so users of that version can upgrade directly. However, JATS is not backward compatible with NLM DTD 2.3.

When making changes in your metadata workflow, be sure to build some quality control in. Ideally, standards need to work together to facilitate the workflow process. The NLM DTD started as a markup solution; the combination of JATS and the Book Interchange Tag Suite now provides a workflow foundation.

In summary, those and other standards facilitate transmission of metadata from submission to production and distribution.

Carol Ann Meyer presented CrossRef’s mission to improve scholarly communication. CrossRef enables linking, discovery, evaluation, and connection of scholarly publications. It helps in the evaluation of scholarly content (regarding, for example, updates and plagiarism). It also serves as a hub, allows scholarly publication metadata to be used in ways that were never envisioned before, and increases the possibilities for collaboration.

CrossRef has two offices (in Oxford, United Kingdom, and Lynnfield, Massachusetts) and only 25 employees. One must be a publisher to be a member. Most of the 1950 members are micropublishers. Only six publisher members have more than $500 million in revenue.

CrossRef is best known for assigning digital object identifiers (DOIs) to journals, books and book chapters, conference papers, reports, dissertations, data sets, figures, and tables. It encourages authors to cite their data with CrossRef or DataCite DOIs from the bibliographic sections of their publications; a DOI can be assigned to supplementary data, and data become citable in articles that have a CrossRef DOI.

CrossRef will support the NISO open-access metadata and indicators. The NISO recommendations also provide a standard way to indicate embargo periods.

The metadata required for CrossRef’s FundRef funder-information service, a standard way to report funder information, include funder_name, funder_identifier, and award_number. Consult www.crossref.org/fundref for more information.
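A submission system might check for those three fields before passing a record on. The sketch below is an assumption-laden illustration: it uses only the field names given above, does not reproduce CrossRef's deposit schema, and the sample values and the function name are placeholders.

```python
# Field names as listed in the session; everything else here is hypothetical.
REQUIRED_FIELDS = ("funder_name", "funder_identifier", "award_number")


def missing_fields(record):
    """Return the required fields that are absent or blank in the record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Placeholder values, not a real funder identifier or award number.
complete = {
    "funder_name": "Example Research Council",
    "funder_identifier": "example-funder-id-0001",
    "award_number": "ABC-123456",
}
incomplete = {"funder_name": "Example Research Council", "award_number": "  "}

complete_gaps = missing_fields(complete)      # nothing missing
incomplete_gaps = missing_fields(incomplete)  # identifier and award number
```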

It is difficult for a computer to parse funding information from articles. Funding-data and conflict-of-interest statements are often combined, and funding information is written in prose and so is not easy to parse. Why does that matter? Funding bodies cannot track what happens after a grant is awarded. Do publications arise from it? Publishers cannot report which articles result from specific funders or grants. And institutions cannot link funding received to published output. Lack of standardization makes it difficult to analyze or mine textual funding statements.

Even if funding agencies could be easily identified in text, authors and publishers use different names, different abbreviations, and different punctuation when referring to the same funding bodies. The FundRef Registry is a controlled vocabulary of more than 6000 international funders that can be incorporated into the manuscript submission process or can be applied to previously published articles. Once a standardized version of a funder name is associated with an article, funders, publishers, institutions, authors, and the public can use CrossRef search interfaces or third-party tools, such as those being built by CHORUS and SHARE through CrossRef’s application-programming interfaces, to discover which publications are funded by which funders.
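The idea of mapping differently written funder names onto one controlled entry can be sketched as follows. The tiny vocabulary and its variant lists are invented for illustration and are not the content of the FundRef Registry; the normalization rule (lowercase, strip punctuation, collapse spaces) is one simple assumption among many possible ones.

```python
import re

# Hypothetical controlled vocabulary: canonical name -> known variants.
# The variant lists are illustrative only.
VOCABULARY = {
    "National Institutes of Health": ["NIH", "N.I.H.", "Natl. Institutes of Health"],
    "National Science Foundation": ["NSF", "N.S.F.", "Natl. Science Foundation"],
}


def normalize(name):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def build_index(vocabulary):
    """Map every normalized canonical name and variant to its canonical name."""
    index = {}
    for canonical, variants in vocabulary.items():
        for label in [canonical, *variants]:
            index[normalize(label)] = canonical
    return index


INDEX = build_index(VOCABULARY)


def canonical_funder(reported_name):
    """Return the canonical funder name, or None if the name is not recognized."""
    return INDEX.get(normalize(reported_name))
```

Names that the index does not recognize return None, which a submission system could use as a cue to ask the author to pick the funder from a list.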