INTRODUCTION
What constitutes plagiarism? What are the methods to detect plagiarism? How do “plagiarism
detection tools” assist in detecting plagiarism? What is the difference between plagiarism
and similarity index? These are probably the most common questions about plagiarism
that experts in scientific writing face, yet definitive answers to them are
not widely known. According to a report published in 2018, papers
retracted for plagiarism have sharply increased over the last two decades, with higher
rates in developing and non-English speaking countries.1 Several studies have reported
similar findings with Iran, China, India, Japan, Korea, Italy, Romania, Turkey, and
France amongst the countries with the highest number of retractions due to plagiarism.1,2,3,4 A study reported that duplication of text, figures or tables without appropriate
referencing accounted for 41.3% of post-2009 retractions of papers published from
India.5 In Pakistan, the Journal of the Pakistan Medical Association started a special section
titled “Learning Research” and published a couple of papers on research writing skills,
research integrity and scientific misconduct.6,7 However, the problem has not been adequately addressed and specific issues about
it remain unresolved and unclear. According to unpublished data from 1,679
students at four universities of Pakistan, 85.5% did not have a clear understanding
of the difference between similarity index and plagiarism. Smart
et al.8 in their global survey of editors reported that around 63% experienced some
plagiarized submissions, with Asian editors experiencing the highest levels of plagiarized/duplicated
content. In some papers, journals from non-English speaking countries have specifically
discussed cases of plagiarized submissions and have highlighted the drawbacks
of relying on similarity checking programs.9,10,11 The cases of plagiarism in non-English speaking countries have a strong message
for honest researchers that they should improve their English writing skills and credit
used sources by properly citing and referencing them.12
Despite the growing literature on plagiarism from non-Anglophone countries, the answers
to the aforementioned questions remain unclear. In order to answer these questions,
it is important to have a thorough understanding of plagiarism and bring clarity to
the less known issues about it. Therefore, this paper aims to 1) define plagiarism
and describe the growth in its prevalence and in the literature on it; 2) explain the difference
between similarity and plagiarism; 3) discuss the role of similarity checking tools
in detecting plagiarism and the flaws of relying completely on them; and 4) discuss
the phenomenon called Trojan citation. At the end, suggestions are provided for authors
and editors from developing countries so that this issue may be collectively addressed.
PLAGIARISM
Defining plagiarism and its prevalence in manuscripts
To begin with, plagiarism may be defined as “when somebody presents the published or
unpublished work of others, including ideas, scholarly text, images, research design
and data, as new and original rather than crediting the existing source of it.”13
The common types of plagiarism, including direct, mosaic, paraphrasing, intentional
(covert) or unintentional (accidental) plagiarism, and self-plagiarism have been discussed
in previous reviews.14,15,16
Evidence suggests that the first paper accused of plagiarism was published in 1979
and there has been a substantial growth in the cases of plagiarism over time.1,2,3,4,5,8,17 Previous studies have pointed out that plagiarism is prevalent in developing and non-English
speaking countries but the occurrence of plagiarism in developed countries suggests
that it is rather a global problem.1,2,3,4,18,19,20 As of today (1 April 2020), a search of the Retraction Database (http://retractiondatabase.org/RetractionSearch.aspx?)
for papers retracted for plagiarism found 2,280 documents. Similarly, a Scopus search
for plagiarism in the title of journal articles found 2,159 results. This suggests that
papers retracted for plagiarism in fact outnumber the papers published on
this issue. However, what we see now may not necessarily be the full picture, i.e., the cases of
plagiarism might be higher than we know. Database searches for papers tagged
for plagiarism are limited to indexed journals only, which keeps non-indexed journals
(both low-quality and deceptive journals) out of focus.5,21 Moreover, journal coverage may vary from one database to the other as reported
in a recent paper on research dissemination in South Asia.22 Therefore, both the prevalence
of plagiarism and literature published on it as reported by database search are most
likely “understated as of today.”5
Reasons for plagiarism: lack of understanding and poor citing practices
Although reasons for plagiarism are complex, previous papers have suggested possible
causes for plagiarism by authors.16,23,24,25,26 One of the major but less known reasons for this might be that students, novice
researchers, and even some faculty members either lack clarity about what constitutes
plagiarism or are unable to differentiate similarity index from plagiarism.24,26,27 For example, a recent online survey of participants in the AuthorAID
MOOC on Research Writing found that 84.4% of the survey participants were unaware
of the difference between similarity index and plagiarism, though almost all of them
had reported having an understanding of plagiarism.24 The same paper reported that
one in three participants admitted that they had plagiarized at some point during
their academic career.24 Therefore, it is important to have clarity about what constitutes
plagiarism and the difference between similarity index and plagiarism so that the
increasing rates of plagiarism can be curbed.
The ‘existing source’ or ‘original source’ in the definition of plagiarism refers
to the main (primary) source and not the (secondary) source from which the author
extracts the information. For example, suppose an author cites a paper for a passage on the mechanism
by which exercise affects sleep, but the cited paper aimed to determine the prevalence
of sleep disorders and exercise levels rather than the mechanistic association. A thorough
evaluation finds that the cited paper had itself taken the text from a review paper
that discussed the mechanisms relating sleep to exercise behavior; the review, not the prevalence paper, is the primary source that should have been credited. This phenomenon
of improper secondary (or indirect) citations may be common among students and novice
researchers, particularly from developing countries, and should be discouraged.27
SIMILARITY INDEX
Plagiarism vs. similarity index and the role of similarity checking tools
Plagiarism as defined above refers to the intentional (covert) or unintentional (accidental)
theft of published or unpublished intellectual property (i.e., words or ideas), whereas
similarity index refers to “the extent of overlap or match between an author's work
compared to other existing sources (books, websites, student thesis, and research
articles) in the databases of similarity checking tools.”9,24 Advances in information technology have given researchers access to
various freely available (e.g., Viper, eTBLAST/HelioBLAST, PlagScan, PlagiarismDetect,
Antiplagiat, Plagiarisma, DupliChecker) and subscription-based (e.g., iThenticate,
Turnitin, Similarity Check) similarity checking tools.8,24 Many journal editors use iThenticate and/or Similarity Check (Crossref) for screening
submitted manuscripts for similarity detection whereas Turnitin is commonly used by
universities and faculty to assess text similarity in students' work; however, there
is a fairness issue that not every journal or university, particularly those from
developing countries, can afford to pay for using these subscription-based services.28
For instance, an online survey found that only about 18% of participants could use Turnitin
through their university subscription.24 Another problem is that these tools are
commonly referred to as plagiarism detection tools, plagiarism checking software,
or plagiarism detection programs. However, based on the function they perform, it
would be more appropriate to call them similarity checking tools,
similarity checkers, text-matching tools, or simply text-duplicity detection tools.5,8,23 This means that these tools help locate matching or overlapping text (similarity)
in submitted work, without directly flagging up plagiarism.24
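As a rough illustration of what a similarity index measures, the following Python sketch computes the percentage of a submission's three-word sequences that also occur in a set of source texts. It is a simplified assumption for illustration only and does not reproduce how Turnitin, iThenticate, or any other tool computes its score; the function names (`word_ngrams`, `similarity_index`), the choice of three-word sequences, and the sample sentences are all hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_index(submission, sources, n=3):
    """Percentage of the submission's n-grams that occur in any source text."""
    sub = word_ngrams(submission, n)
    if not sub:
        return 0.0
    pooled = set()
    for text in sources:
        pooled |= word_ngrams(text, n)
    return 100.0 * len(sub & pooled) / len(sub)


# Hypothetical source database and submissions (illustrative text only).
sources = ["Regular exercise improves sleep quality in adults."]

copied = "Regular exercise improves sleep quality in adults."
quoted = ('A previous review stated that "regular exercise improves '
          'sleep quality in adults" [1].')
unrelated = "We surveyed students about their bedtime habits."

copied_score = similarity_index(copied, sources)        # 100.0
quoted_score = similarity_index(quoted, sources)        # non-zero
unrelated_score = similarity_index(unrelated, sources)  # 0.0

print(copied_score, quoted_score, unrelated_score)
```

In this sketch the properly quoted and cited sentence still receives a non-zero score, which is why a similarity index on its own indicates text overlap rather than plagiarism.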
Taking Turnitin as an example, these tools display text similarity through color
codes, each linked to its online source; details have been described
elsewhere.23,28 Journal editors, universities and some organizations consider text above specific
cutoff values for the percentage of similarity as problematic. According to one paper,
5% or less text similarity (overlap of the text in the manuscript with text in the
online literature) is acceptable to some journal editors, while others might want
to put the manuscript under scrutiny if the text similarity is over 20%.29,30 Another paper observed that journal editors tend to reject a manuscript if text
similarity is above 10%.31 The study on participants completing the AuthorAID MOOC
on Research Writing also found that some participants reported that their institutions
consider text similarity of less than 20% as acceptable.24 As an example, the guidelines
of the University Grants Commission of India allow for similarity up to 10% as acceptable
or minor (Level 0), but anything above is categorized into different levels (based
on the percentages), each with a separate list of repercussions for students and researchers.32
This approach might miss the cases where the acceptable similarity of 10% comes from
a single source, especially if the editors relied on the numbers only. In addition,
this approach has the potential for punishing authors who have not committed plagiarism
at all. To illustrate this, the randomly written text presented in Fig. 1 would be
considered plagiarism based on the rule of cutoff values. Some authors opine that
text with over four consecutive words or a number of word strings should be treated
as plagiarized.28,33 This again is not a good idea as the text “the International Physical Activity
Questionnaire was used to measure …” would be the same in several papers, but this is
definitely not plagiarism because the methodology of different papers on the same
topic could be similar; so, the decision should not be based on the numbers reflected
by similarity detection tools.28 Therefore, it would be prudent not to set any cutoff
values for text similarity as it will lead to a slippery slope (“a course of action
that seems to lead inevitably from one action or result to another with unintended
consequences”–defined by Merriam-Webster Dictionary) and give “a sense of impunity
to the perpetrators.”32
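The two concerns raised above, that a single overall percentage hides whether the overlap comes from one source, and that a rule based on consecutive matching words flags standard methods wording, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names (`per_source_similarity`, `longest_shared_run`), the use of three-word sequences, and all sample sentences are hypothetical and are not taken from any similarity checking tool.

```python
import re


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(words, n):
    """Set of consecutive n-word sequences in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def per_source_similarity(submission, sources, n=3):
    """Map each source name to the % of submission n-grams found in that source."""
    sub = ngrams(tokens(submission), n)
    result = {}
    for name, text in sources.items():
        shared = sub & ngrams(tokens(text), n)
        result[name] = 100.0 * len(shared) / len(sub) if sub else 0.0
    return result


def longest_shared_run(a, b):
    """Length (in words) of the longest run of consecutive words shared by a and b."""
    wa, wb = tokens(a), tokens(b)
    best = 0
    prev = [0] * (len(wb) + 1)
    for x in wa:
        cur = [0] * (len(wb) + 1)
        for j, y in enumerate(wb, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical texts: all overlap in the submission comes from source_A.
sources = {
    "source_A": "exercise may improve sleep by lowering body temperature after activity",
    "source_B": "sleep disorders are common among university students",
}
submission = ("Our data suggest that exercise may improve sleep by lowering "
              "body temperature after activity in this cohort.")
breakdown = per_source_similarity(submission, sources)

# Two independent (hypothetical) methods sentences sharing standard wording.
methods_a = ("The International Physical Activity Questionnaire was used "
             "to measure activity levels in nurses.")
methods_b = ("The International Physical Activity Questionnaire was used "
             "to measure weekly exercise among students.")
run = longest_shared_run(methods_a, methods_b)

print(breakdown)
print(run)  # 9 consecutive shared words from boilerplate wording alone
```

A per-source breakdown of this kind shows where the overlap is concentrated, while the nine-word shared run between two unrelated methods sentences shows how a "more than four consecutive words" rule would flag routine phrasing; in both cases the numbers still need to be interpreted by a human reader.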
Fig. 1
Turnitin report for text similarity based on a randomly written text (on April 2,
2020). The author of this paper has access to Turnitin through the University and
not to iThenticate. Therefore, Turnitin was used as an example in Fig. 1.
Drawbacks of similarity checking tools
There are a few drawbacks of relying completely on similarity checking tools.
First, these tools are not foolproof and might miss incidents of translational
plagiarism and figure plagiarism.24 Translational plagiarism is the most invisible
type of copying in non-Anglophone countries, where an article published in a language
other than English is copied (with or without minor modifications) and published in
an English journal or vice versa.10 This is indeed an extremely difficult type of plagiarism
to detect, and different approaches (e.g., use of Google Translate) to address it
have been recently reported.34,35 Nevertheless, there might be some cases where this practice may be acceptable, such
as publishing policy papers (see “Identifying predatory or pseudo-journals” – this
paper was published in International Journal of Occupational and Environmental Medicine,
National Medical Journal of India, and Biochemia Medica in 2017 by authors affiliated
with World Association of Medical Editors (WAME) – or “The revised guidelines of the
Medical Council of India for academic promotions: Need for a rethink” – this paper
was published in over ten journals during 2016 by four journal editors and endorsed
by members (not all) of the Indian Association of Medical Journal Editors, for example).
Second, text similarity in some parts of a manuscript (e.g., methods and results) should
be weighed differently from that in other sections (e.g., introduction, discussion and
conclusions).31 In addition, based on the personal experience of the author of
this paper, some individuals might use a sophisticated technique to avoid detection
of high similarity through the use of inappropriate synonyms, jargon, and deliberate
grammatical and structural errors in the text of the manuscript. Third, plagiarism
of ideas may be missed by these tools as they can only detect plagiarism of words.23,32 Therefore, similarity checking tools tend to underestimate plagiarized text or
sometimes overestimate non-plagiarized material as problematic (Fig. 1).24,36 It should be noted that these tools serve only as an aid to identify suspected
instances of plagiarism and the text of the manuscript should always be evaluated
by experts, so “a careful human cannot be replaced.”31,37 A few papers published in the Journal of Korean Medical Science have presented
examples where plagiarized content was missed by similarity checking tools and
later noticed after a careful examination of the text.9,10 Finally, plagiarism of unpublished work cannot be detected by these tools as they
are limited to online sources only.23 This is particularly important in the context
of developing countries where research theses/dissertations of students are not deposited
in research repositories, and where commercial, predatory editing and brokering services
exist.10,38 For example, the research repository of the Higher Education Commission of Pakistan
allows deposition of doctoral theses only, and fewer than five universities (out of
over 150) across the country have a research repository allowing for deposition of
scholarly content.38 Recently, a strange trend has emerged of predatory editing and brokering
services that offer clones of previously published papers or unpublished
work to non-Anglophone or some lazy authors demanding a quick and easy route to publications
for promotion and career advancement.10 Although plagiarism of unpublished work would
not be easy for experts to detect, this may be possible through their previous experience
and scholarly networks.
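The synonym-substitution tactic mentioned among the drawbacks above can be illustrated with a small Python sketch. It assumes a purely word-matching checker based on three-word sequences, which is a simplification and not the method of any specific tool; the function names (`trigrams`, `overlap_percent`) and the sentences are hypothetical.

```python
import re


def trigrams(text):
    """Set of lowercase three-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def overlap_percent(submission, source):
    """Percentage of the submission's trigrams that also appear in the source."""
    sub = trigrams(submission)
    if not sub:
        return 0.0
    return 100.0 * len(sub & trigrams(source)) / len(sub)


# Hypothetical source sentence and two submissions derived from it.
source = "Regular physical exercise improves the quality of sleep in older adults."
verbatim = "Regular physical exercise improves the quality of sleep in older adults."
disguised = "Habitual bodily exercise betters the standard of slumber in aged adults."

verbatim_score = overlap_percent(verbatim, source)    # 100.0
disguised_score = overlap_percent(disguised, source)  # 0.0

print(verbatim_score, disguised_score)
```

The disguised sentence copies the idea and structure of the source yet scores zero under this word-matching assumption, which illustrates why awkward synonyms and unnatural phrasing are better caught by a careful human reader than by a similarity percentage.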
TROJAN CITATION: PERSONAL EXPERIENCE
A recent experience worth discussing in the context of plagiarism comes in the shape of
the Trojan citation, where someone “makes reference to a source one time in order
to evade detection (by editors and readers) of bad intentions and provide cover for
a deeper, more pervasive plagiarism.”39 This practice is particularly common among those
who intend to deceive readers and game the system. A few months
ago, the author of this paper was invited to review a manuscript on predatory publishing
by a journal. The content of the manuscript appeared suspicious but was not labelled
“plagiarized” during the first round of the review. However, during the second round,
it was noticed that this was a case of Trojan citation where the author(s) cited the
main source for a minor point and copied the major part of the manuscript from a paper
published in Biochemia Medica (a Croatian journal) with slight modification in the
content.40 The editor of the journal was informed about this and the manuscript was
rejected without further processing. This example suggests that careful human intervention
by experts is required to highlight the cases of plagiarism.
CONCLUSION
In conclusion, what we know about the growth in the prevalence of plagiarism may be
‘just the tip of the iceberg’. Therefore, collective contribution from authors, reviewers,
and editors, particularly from the Asia-Pacific region, is required. Authors from the
Asia-Pacific region and developing countries, with expertise on this topic, should
play their role by supporting journal editors and through their mentorship skills.
Furthermore, senior researchers should encourage and help their honors and master's
students to publish their unpublished work before it gets stolen by commercial brokering
agencies. They should also work in close collaboration with universities and organizations
related to higher education in countries where this issue is not properly addressed,
and should facilitate education and training sessions on plagiarism as previous evidence
suggests that workshops and online training sessions may be helpful.5 On the other
hand, journal editors from the Asia-Pacific region and developing countries should not
judge manuscripts solely on the basis of the percentage of similarity reported
by similarity checking services. They should maintain a database of experts of their own so that manuscripts
about plagiarism in scientific writing, for example, can be sent for review to
experts on this subject. As journal editors may not be experts in all fields,
networking and seeking help from experts would be helpful in avoiding the cases of
plagiarism in the future. It would be appropriate that the journal editors and the
trainee editors, particularly from the resource-limited countries, are educated about
the concept of scientific misconduct and the advancement in knowledge around this
area. Moreover, journal editors should publish and publicly discuss cases of
plagiarism as a learning experience for others. The Journal of Korean Medical Science
has used this approach regarding cases of plagiarism, which other journals from the
region are encouraged to adopt.9,10 Likewise, a paper discussing case scenarios of salami publication (i.e., “a distinct
form of redundant publication which is usually characterized by similarity of hypothesis,
methodology or results but not text similarity”) serves as a good example of how journal
editors may enable authors to use their mentorship skills and support journals
in educating researchers.41 There should be strict penalties for cases of plagiarism,
and measures to protect whistleblowers should be in place and enforced.
By doing so, evil and lazy authors who bypass the system would be punished and honest
authors would be served. Thus, the take-home message for editors from the Asia-Pacific
region is that a collective effort and commitment from authors, reviewers, editors
and policy-makers is required to address the problem of plagiarism, especially in
the developing and non-English speaking countries.