Professor Debora Weber-Wulff and her team at HTW Berlin have tested various plagiarism detection systems. Instead of an artificially created test data set, they used Guttenberg’s doctoral thesis. As expected, the detection results of the plagiarism detection systems were rather poor.

PlagAware: Initially 28% on the first 159 pages; however, this included a lot of garbage such as pastebin material. After we removed this and the GuttenPlag links, the figure went to 68% before the report disappeared completely. We have not been able to resubmit; the submission aborts with an error.
iThenticate: 40%
Ephorus: 5%! Only 10 possible sources were found; of these, 3 were from GuttenPlag and one was a duplicate.
PlagScan: 15.9%
Urkund: 21%

The results of the experiment are published in iX 6/11. Here you can find a summary.

So far, plagiarism detection systems rely solely on text analysis. However, as study results show, text-based systems struggle to identify paraphrased plagiarism, idea plagiarism, and translation plagiarism.

In our paper “Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag”, we evaluate whether analyzing a document’s citations can help increase detection rates.

A preprint of the paper, which uses Guttenberg’s doctoral thesis to evaluate the potential of citation-based plagiarism detection and will be presented on June 13th at the JCDL ’11 conference in Ottawa, can be found here:

  • [PDF] [DOI] B. Gipp, N. Meuschke, and J. Beel, “Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag,” in Proceedings of 11th annual international ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’11), Ottawa, Canada, 2011.
    [Bibtex]
    @inproceedings{Gipp11,
      title        = {{C}omparative {E}valuation of {T}ext- and {C}itation-based {P}lagiarism {D}etection {A}pproaches using {G}utten{P}lag},
      author       = {{G}ipp, {B}ela and {M}euschke, {N}orman and {B}eel, {J}oeran},
      year         = 2011,
      booktitle    = {{P}roceedings of 11th annual international {ACM}/{IEEE}-{CS} {J}oint {C}onference on {D}igital {L}ibraries ({JCDL}'11)},
      publisher    = {ACM},
      address      = {Ottawa, Canada},
      doi          = {10.1145/1998076.1998124},
      url          = {https://doi.org/10.1145/1998076.1998124},
      topic        = {pd}
    }

The abstract:

Various approaches for plagiarism detection exist. All are based on more or less sophisticated text analysis methods such as string matching, fingerprinting or style comparison. In this paper a new approach called Citation-based Plagiarism Detection is evaluated using a doctoral thesis, in which a volunteer crowd-sourcing project called GuttenPlag identified substantial amounts of plagiarism through careful manual inspection. This new approach is able to identify similar and plagiarized documents based on the citations used in the text. It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation and some idea plagiarism. Detection rates can be improved by combining citation-based with text-based plagiarism detection.
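The intuition behind citation-based detection is that the references a document cites are largely preserved when its wording is paraphrased or translated. The snippet below is a minimal sketch of that intuition only, not the method from the paper: it assumes each document’s in-text citations have already been extracted and resolved to shared identifiers (the `R1`…`R5` labels and both function names are hypothetical), and computes bibliographic coupling strength (the number of distinct references two documents share) plus a Jaccard overlap of their reference sets.

```python
def bibliographic_coupling(doc_a: list[str], doc_b: list[str]) -> int:
    """Number of distinct references cited by both documents."""
    return len(set(doc_a) & set(doc_b))


def citation_overlap(doc_a: list[str], doc_b: list[str]) -> float:
    """Jaccard overlap of the two reference sets, between 0.0 and 1.0."""
    a, b = set(doc_a), set(doc_b)
    if not a and not b:
        # Two documents without citations give no citation evidence.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical citation sequences, in order of appearance in the text.
suspicious = ["R1", "R2", "R3", "R2", "R4"]
source = ["R2", "R3", "R4", "R5"]
print(bibliographic_coupling(suspicious, source))  # 3 shared references
print(citation_overlap(suspicious, source))        # 3 of 5 distinct references
```

Set-based measures like these ignore the order in which citations appear. The algorithms presented at DocEng, described below, also take citation order and proximity into account.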

The algorithms we developed for citation analysis and citation pattern matching will be presented at the DocEng conference in Mountain View in September.

  • [PDF] [DOI] B. Gipp and N. Meuschke, “Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence,” in Proceedings of the 11th ACM symposium on Document engineering (DocEng ’11), Mountain View, CA, USA, 2011.
    [Bibtex]
    @inproceedings{Gipp11c,
      title        = {{C}itation {P}attern {M}atching {A}lgorithms for {C}itation-based {P}lagiarism {D}etection: {G}reedy {C}itation {T}iling, {C}itation {C}hunking and {L}ongest {C}ommon {C}itation {S}equence},
      author       = {{G}ipp, {B}ela and {M}euschke, {N}orman},
      year         = 2011,
      month        = {Sep.},
      booktitle    = {{P}roceedings of the 11th {ACM} symposium on {D}ocument engineering ({D}oc{E}ng '11)},
      publisher    = {ACM},
      address      = {Mountain View, CA, USA},
      doi          = {10.1145/2034691.2034741},
      isbn         = {978-1-4503-0863-2},
      url          = {https://doi.org/10.1145/2034691.2034741},
      topic        = {pd}
    }
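Of the three algorithms named in the DocEng paper, Longest Common Citation Sequence is the easiest to sketch. Our reading is that it treats each document as the ordered sequence of its in-text citations and finds the longest subsequence that both documents share, where matched citations keep their order but need not be adjacent. The sketch below is a textbook longest-common-subsequence dynamic program applied to citation sequences. It assumes citations are already extracted and mapped to common identifiers, and the function name is hypothetical. It is an illustration, not the authors’ implementation.

```python
def longest_common_citation_sequence(a: list[str], b: list[str]) -> list[str]:
    """Return one longest common subsequence of two citation sequences.

    Classic O(len(a) * len(b)) dynamic program: matched citations keep
    their relative order but need not be contiguous in either document.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one optimal sequence.
    result: list[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical citation sequences; "X" is a citation only the source uses.
suspicious = ["A", "B", "C", "D", "E", "F"]
source = ["B", "X", "C", "E", "F", "A"]
print(longest_common_citation_sequence(suspicious, source))  # ['B', 'C', 'E', 'F']
```

Because the match tolerates inserted or dropped citations between shared ones, a long common sequence can indicate that an author followed the structure of a source even when every sentence was rewritten.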