June 17, 2020

New to Edifix: Preprint Citation Parsing

Back in March 2020, with the number of preprints—and citations to those preprints—exploding because of the COVID-19 pandemic, we realized that Edifix had an urgent new challenge to meet: preprints.

We’re happy to announce that starting now, Edifix has been updated to handle preprint citations in your reference lists!

How it works

Edifix parses references to preprints, but does not restructure them. (This is similar to how Edifix handles conference proceedings.) 

Because guidelines for citing preprints are still maturing, it’s safer to leave judgements about which elements to include and how to format them in the hands of experienced human editors. There is a practical reason as well: we’ve found that the citation data submitted by authors varies widely, and we can’t always count on finding all the elements necessary to create a complete citation.

If you turn on Crossref Correction and DOI Linking before running your Edifix job, Edifix will also retrieve DOIs for preprints that have a DOI deposited with Crossref. If Crossref returns a preprint DOI, Edifix will insert it. If Crossref returns a DOI for the published version rather than the preprint—which happens quite frequently—Edifix will insert the DOI for the preprint, provided that the preprint metadata is correctly linked in Crossref. If the Crossref metadata indicates a final publication DOI for the preprint, Edifix will also add a comment with the DOI of the journal article, so that you can see the status.
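The decision rules in the paragraph above can be sketched in a few lines of Python. This is an illustration only, not Edifix’s actual code: the `CrossrefMatch` record, its field names, and the `choose_doi_and_comment` function are hypothetical, and the sketch assumes a Crossref lookup has already been reduced to a single best match.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CrossrefMatch:
    """Hypothetical, simplified view of one Crossref query result."""
    doi: str                                    # DOI that Crossref returned
    is_preprint: bool                           # True if that DOI belongs to a preprint
    linked_preprint_doi: Optional[str] = None   # set only if Crossref links back to the preprint
    linked_published_doi: Optional[str] = None  # set only if Crossref links on to the journal article


def choose_doi_and_comment(
    match: Optional[CrossrefMatch],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (DOI to insert in the reference, comment text), either of which may be None."""
    if match is None:
        # Crossref found nothing: leave the reference untouched.
        return None, None

    if match.is_preprint:
        # Crossref returned the preprint DOI: insert it as is.
        doi_to_insert = match.doi
        published_doi = match.linked_published_doi
    else:
        # Crossref returned the published article: never insert that DOI in a
        # preprint reference; use the preprint DOI only if the records are linked.
        doi_to_insert = match.linked_preprint_doi
        published_doi = match.doi

    comment = None
    if published_doi:
        comment = f"Crossref reports this preprint has been published as {published_doi}"
    return doi_to_insert, comment


# A fully linked preprint record yields both the preprint DOI and the comment.
linked = CrossrefMatch(
    doi="10.1101/013045",
    is_preprint=True,
    linked_published_doi="10.12688/f1000research.6032.1",
)
print(choose_doi_and_comment(linked))
```

When the two records are not linked, the same function returns only one of the two pieces of information, depending on which DOI Crossref happened to return.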

Reference lists exported from Edifix to JATS comply with JATS4R recommendations for citing preprints.
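For readers unfamiliar with JATS, the sketch below builds a preprint reference of roughly this shape using Python’s standard library. It is an illustration only, not Edifix’s export code: the `preprint_citation` helper is hypothetical, the choice of `element-citation` with `publication-type="preprint"`, `source` for the preprint server, and `pub-id` for the DOI is our assumption about what such tagging typically looks like, and the JATS4R recommendation itself remains the authority on the details.

```python
import xml.etree.ElementTree as ET


def preprint_citation(surname: str, given_names: str, title: str,
                      server: str, year: str, doi: str) -> str:
    """Build a single-author JATS-style preprint citation (hypothetical helper)."""
    # Assumption: the preprint is flagged through the publication-type attribute.
    citation = ET.Element("element-citation", {"publication-type": "preprint"})

    group = ET.SubElement(citation, "person-group", {"person-group-type": "author"})
    name = ET.SubElement(group, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given_names

    ET.SubElement(citation, "article-title").text = title
    ET.SubElement(citation, "source").text = server   # preprint server name
    ET.SubElement(citation, "year").text = year
    ET.SubElement(citation, "pub-id", {"pub-id-type": "doi"}).text = doi

    return ET.tostring(citation, encoding="unicode")


xml = preprint_citation(
    "Dunham", "I",
    "FORGE: A tool to discover cell specific enrichments of GWAS "
    "associated SNPs in regulatory regions",
    "bioRxiv", "2014", "10.1101/013045",
)
print(xml)
```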

Crossref linking and preprints

When linking preprint citations in Edifix, you may see inconsistent results. Edifix does its best, but its success depends on how Crossref resolves each link query: identical queries can return different matches, and sometimes Crossref does not have complete metadata.

To illustrate, here’s an example of an input reference and the “ideal” output, in which Edifix has added the preprint DOI to the reference and the comment gives information about the article’s final publication in another journal:

Input

1. Dunham I. FORGE: A tool to discover cell specific enrichments of GWAS associated SNPs in regulatory regions. bioRxiv. 2014

Output

1. Dunham I. FORGE: A tool to discover cell specific enrichments of GWAS associated SNPs in regulatory regions. bioRxiv. 2014 https://doi.org/10.1101/013045.

Crossref reports this preprint has been published as 10.12688/f1000research.6032.1 (Ref. 6 "Dunham, 2014")

In this case, Edifix can provide both DOIs, because publishers have provided all the necessary metadata and Crossref has correctly cross-referenced the two DOIs. 

However, we also see cases like the following example:

Input

2. Lou B, Li T, Zheng S, et al. Serology characteristics of SARS-CoV-2 infection since the exposure and post symptoms onset. medRxiv. Preprint posted March 27, 2020.

Output A (final publication info missing)

2. Lou B, Li T, Zheng S, et al. Serology characteristics of SARS-CoV-2 infection since the exposure and post symptoms onset. medRxiv. Preprint posted March 27, 2020. https://doi.org/10.1101/2020.03.23.20041707.  

Output B (preprint DOI missing)

2. Lou B, Li T, Zheng S, et al. Serology characteristics of SARS-CoV-2 infection since the exposure and post symptoms onset. medRxiv. Preprint posted March 27, 2020.

Crossref reports this preprint has been published as 10.1183/13993003.00763-2020 (Ref. 6 "Lou, Li, Zheng, et al.")

Outputs A and B come from two different Crossref query results, with identical input, submitted less than one minute apart. 

In Output A, Edifix has returned the correct DOI for the preprint, but because the preprint and the final publication aren’t linked in the Crossref database, Edifix can’t provide information about the final publication. 

In Output B, Crossref has instead returned the DOI for the published journal article. Edifix is smart enough to recognize that this isn’t the correct DOI for the preprint and inserts the comment instead, but because the preprint and the final publication aren’t linked in the database, Edifix can’t provide the preprint DOI.

Why now?

John Shaw of Sage recently commented that preprints are “a mature process that’s not at all mature.” We had originally hoped to have this update go live in April, but ran smack into John’s statement: we’ve found a more varied and inconsistent data set, both from authors and at Crossref, than we originally anticipated! Thanks to the work of our Development team, we’re now able to deploy a robust solution for preprint citations.

Questions? Need more information? Contact us at support@edifix.com


Link: https://www.edifix.com/blog/new-to-edifix-preprint-citation-parsing