Elsevier and other publishers’ ability to detect “self-plagiarism” is an instance of text mining the world’s scientific literature. On two vision researcher mailing lists, there is much irritation at being asked to remove sentences that duplicate ones the author wrote in previous papers, for example to describe the methodology of a study.
Tom Wallis pointed out that automated text duplication checks can also be useful for detecting data duplication and fraud. Unfortunately, others cannot easily use them for that purpose: Elsevier shuts down independent researchers who use their journal subscriptions to investigate fraud (http://onsnetwork.org/chartgerink/2015/11/16/elsevier-stopped-me-doing-my-research/ ; http://www.nature.com/news/text-mining-block-prompts-online-response-1.18819).
Text mining the scientific literature could yield thousands of discoveries, about both fraud and new connections between molecules, genes, and diseases, but it can’t be done when publishers like Elsevier own the content and are trying to monetize it all for themselves (https://blogs.ch.cam.ac.uk/pmr/2017/07/11/text-and-data-mining-overview).
“Self-plagiarism” also puts publishers at legal risk, because they publish all our articles under restrictive copyright: it can be a copyright violation for them to publish text that is identical to text in an earlier paper by the same author that was published by a different publisher. In an email from a publisher to Professor Peter Tse, the issue was framed as protecting the author, but there was also this sentence: “Another issue to be borne in mind is the matter of copyright in extensive text duplication.”
Thus the traditional system of publishers owning the copyright to our work is both preventing new discoveries (which have to wait until the publishers find a way to use text mining to maintain or increase their profits) and creating ridiculous busywork for us.
Yesterday I attended a university press publishing conference where Kevin Stranack demo’ed Open Journal Systems version 3, which has already been released and looks significantly easier to use than ScholarOne/Manuscript Central, the system that expensive subscription journals use. The existence of OJS3 allows the creation of journals at very low cost (it already underpins thousands of journals, such as Glossa, which flipped from Elsevier). Unfortunately, I seem to be the only researcher at the conference, but I’m tweeting about it and will add some related information to FairOA.org.