Publishers prioritize “self-plagiarism” policing over allowing new discoveries

The ability of Elsevier and other publishers to detect “self-plagiarism” is an instance of text mining the world’s scientific literature. On two vision researcher mailing lists, there is much irritation at being asked to remove sentences that duplicate ones the author wrote in previous papers, for example to describe the methodology of a study.

Tom Wallis pointed out that the automated text duplication checks can also be useful for detecting data duplication and fraud. Unfortunately, others cannot easily use them for that purpose: Elsevier shuts down independent researchers who use their journal subscriptions to investigate fraud.

Text mining the scientific literature could yield thousands of discoveries, about both fraud and new connections between molecules, genes, and diseases, but it can’t be done when publishers like Elsevier own the content and are trying to monetize it all for themselves.

“Self-plagiarism” also puts publishers at legal risk because they publish all our articles under restrictive copyright: it can be a copyright violation for them to publish text that happens to be identical to text in an earlier paper by the same author that was published by a different publisher. In an email from a publisher to Professor Peter Tse, the issue was framed as protecting the author, but there was also this sentence: “Another issue to be borne in mind is the matter of copyright in extensive text duplication.”

Thus the traditional system of publishers owning the copyright to our work is both preventing new discoveries (which have to wait until the publishers find a way to use text mining to maintain or increase their profits) and creating ridiculous busywork for us.

Yesterday I attended a university press publishing conference where Kevin Stranack demoed Open Journal Systems version 3 (OJS3), which has already been released and looks significantly easier to use than ScholarOne/Manuscript Central, the system that expensive subscription journals use. The existence of OJS3 allows the creation of journals at very low cost (it already underpins thousands of journals, such as Glossa, which flipped from Elsevier). Unfortunately I seem to be the only researcher at the conference, but I’m tweeting about it and will add some related information to



2 thoughts on “Publishers prioritize “self-plagiarism” policing over allowing new discoveries”

  1. This self-plagiarism stuff drives me crazy. Scientists aren’t producing timeless works of literature, they’re just describing the results of some experiments. They should absolutely be able to repeat themselves. That said, most publishers, Elsevier journals in particular, tune their plagiarism detection software to avoid flagging this kind of stuff unnecessarily. If you have a problem with a specific editor, feel free to let me know & I’ll look into it for you.

    I agree that text mining is a very valuable technique. Did you know there’s also a text mining API for all Elsevier papers? It’s here:
    Chris has said it would have been sufficient for his work, but he stood on principle, which I respect. An XML dump would have made it easier for him, but let’s not pretend that he didn’t have options.
