Emerging Standards: Data and Data Exchange in Scholarly Publishing

Creating industry standards for manuscript data and capturing this information are major challenges for scientific publications. The speakers at this session discussed creating universal identifiers, standardizing contributions, accurately reporting funding sources, and giving authors the opportunity to include these data during the submission process.

Jay Henry of Ringgold began the session with an analogy of a forest representing data. Without clearly identifying unique entities within the forest (trees), it’s difficult to see the big picture in a seemingly never-ending collection of trees. Henry believes the first step in deriving useful knowledge about the forest is to apply unique identifiers to the trees (people, places, and things). For identifiers to be effective, they must be “governed, trusted, transparent, and contain appropriate metadata.” Standard identifiers “disambiguate and enable linking, in other words, they provide a simple basis for data governance.” Henry then discussed the International Standard Name Identifier (ISNI), which provides a unique identifier for named entities (people and places), and Ringgold’s progress in mapping its own data to ISNI and in acting as an ISNI registration agency for ISNI members. He closed by naming some of the challenges to creating a world of identifiable information: “vastness, vagueness, uncertainty, inconsistency, and deceit.”

Amy Brand addressed the importance of giving appropriate credit to scientists. She showed a graph from R-bloggers (i2.wp.com/benjaminlmoore.files.wordpress.com/2014/04/plot_lm.png) that tracked different Public Library of Science (PLoS) publications and the average number of authors included on an author line. The number has increased steadily from 2006 to 2013. Later in her presentation, Brand discussed developing a standard contributor role taxonomy for publishers that would allow contributions to be more easily converted into metadata. Brand is part of the Project CRediT working group trying to develop 14 umbrella contributions that could apply to all fields of research. These contributions include: “conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing-original draft, writing-review and editing, visualization, supervision, project administration, and funding acquisition.” Publication service providers are starting to integrate these contributions into their systems and would automatically deposit these data into CrossRef, which would send them to ORCID, where they would eventually appear in a contributions report. All of this work is being done to make sure that contributors to multiauthored works are more fairly credited for their contributions.
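To illustrate how a fixed role taxonomy makes contributions easier to turn into metadata, here is a minimal Python sketch. It treats the 14 roles quoted above as a controlled vocabulary and sorts free-text role entries into recognized and unrecognized terms. The function name, the sample author, and the sample roles are hypothetical and are not part of any Project CRediT or publisher system.

```python
# The 14 contributor roles quoted in the report, used as a controlled vocabulary.
CREDIT_ROLES = frozenset({
    "conceptualization", "methodology", "software", "validation",
    "formal analysis", "investigation", "resources", "data curation",
    "writing-original draft", "writing-review and editing",
    "visualization", "supervision", "project administration",
    "funding acquisition",
})


def normalize_roles(free_text_roles):
    """Return (recognized, unrecognized) roles after case and whitespace cleanup.

    Hypothetical helper for illustration only.
    """
    recognized, unrecognized = [], []
    for role in free_text_roles:
        cleaned = " ".join(role.lower().split())
        if cleaned in CREDIT_ROLES:
            recognized.append(cleaned)
        else:
            unrecognized.append(cleaned)
    return recognized, unrecognized


# Hypothetical submission data: one author with three self-reported roles.
contributions = {"A. Author": ["Software", "  Formal   Analysis ", "Proofreading"]}
for name, roles in contributions.items():
    ok, unknown = normalize_roles(roles)
    print(name, "recognized:", ok, "needs review:", unknown)
```

A submission system built along these lines could offer only the recognized terms in a pick-list, so that every stored contribution maps onto one of the standard roles.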

Rachel Lammey from CrossRef presented next and discussed FundRef, which is “a standard way of reporting funding sources for published scholarly research.” In February 2013, the US Office of Science and Technology Policy published a memo requiring federal agencies with more than $100 million in research and development expenditures to develop plans to make federally funded research freely available to the public within one year of publication. This adds a level of immediacy to the process of standardizing funding sources. FundRef was created to help bring funders and publishers together. It allows authors to input standardized funder information and grant numbers at submission, so that if the paper is accepted, this information is made available and published correctly. Currently, “FundRef is the only central database of acknowledgments from publications.” To date, the registry includes 41 publishers, 562,000 DOI deposits, and 9,522 funders.
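The idea of capturing standardized funder information and grant numbers at submission can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FundRef's actual data model or API: the registry contents, identifiers, class name, and function name below are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical pick-list: identifier -> standardized funder name.
# These entries are placeholders, not real registry records.
FUNDER_REGISTRY = {
    "funder-001": "Example Research Council",
    "funder-002": "Example Health Foundation",
}


@dataclass(frozen=True)
class FundingEntry:
    """One funding acknowledgment captured at submission (illustrative only)."""
    funder_id: str
    funder_name: str
    grant_number: str


def make_funding_entry(funder_id, grant_number):
    """Build an entry from a pick-list choice and an author-supplied grant number."""
    if funder_id not in FUNDER_REGISTRY:
        raise ValueError(f"unknown funder id: {funder_id}")
    grant = grant_number.strip()
    if not grant:
        raise ValueError("grant number is required")
    # The stored name always comes from the registry, never from free text.
    return FundingEntry(funder_id, FUNDER_REGISTRY[funder_id], grant)


entry = make_funding_entry("funder-001", " ABC-123 ")
print(entry)
```

Because the funder name is taken from the registry rather than typed by the author, spelling variants of the same funder cannot enter the record.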

Helen Atkins from PLoS was the final presenter and discussed the input end of capturing the data. PLoS uses the Editorial Manager submission system and has made updates so that most of the data previously mentioned can be entered directly by authors or chosen from a pick-list. FundRef, Ringgold data, author contributions, and author identification fields have all been added to their system. Currently, the information is exported to the XML vendor, but tagging it and providing it to others, including the journal-hosting platform, remain the major challenges. PLoS will need to upgrade to the Journal Article Tag Suite before all of the information can be tagged.

Each of these presenters discussed a different perspective regarding data in today’s scientific journals. From universal identifiers, to standardized contributions that will more fairly describe work completed, to systems that serve as depositories for funding information, and finally to the submission system that will serve as the access point for authors to enter their information, attendees of this session were educated on data issues that we as publishing professionals will be working through for years to come.