Parallel and distributed document overlap detection on the web
2000 (English) In: Applied Parallel Computing: new paradigms for HPC in industry and academia; 5th international workshop, Bergen, Norway, June 18-20, 2000; proceedings / [ed] Tor Sørevik, New York: Encyclopedia of Global Archaeology/Springer Verlag, 2000, 206-214 p. Conference paper (Refereed)
The proliferation of digital libraries and the availability of electronic documents on the Internet have created new challenges for computer science researchers and professionals. Documents are easily copied and redistributed, or used to create plagiarised assignments and conference papers. This paper presents a new two-stage approach for identifying overlapping documents. The first stage identifies a set of candidate documents, which are then compared in the second stage using a matching engine. The matching engine's algorithm is based on suffix trees and modifies the known matching statistics algorithm. Parallel and distributed approaches are discussed for both stages, and performance results are presented.
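For readers unfamiliar with matching statistics, the idea can be illustrated with a minimal sketch. This is not the paper's modified algorithm: it assumes an uncompressed suffix trie built from nested dictionaries as a simple stand-in for a suffix tree, and the function names (build_suffix_trie, matching_statistics, overlap_ratio) and the min_len threshold are hypothetical, chosen only for illustration.

```python
def build_suffix_trie(pattern):
    """Build an uncompressed suffix trie of `pattern` as nested dicts.

    Assumption: a quadratic-size trie stands in for a real suffix tree.
    """
    root = {}
    for start in range(len(pattern)):
        node = root
        for ch in pattern[start:]:
            node = node.setdefault(ch, {})
    return root


def matching_statistics(text, pattern):
    """Return ms where ms[i] is the length of the longest prefix of
    text[i:] that occurs somewhere in pattern."""
    trie = build_suffix_trie(pattern)
    ms = []
    for i in range(len(text)):
        node, length = trie, 0
        # Walk down the trie for as long as the text keeps matching.
        while i + length < len(text) and text[i + length] in node:
            node = node[text[i + length]]
            length += 1
        ms.append(length)
    return ms


def overlap_ratio(text, pattern, min_len=3):
    """Hypothetical overlap score: the fraction of positions in `text`
    covered by a shared substring of at least `min_len` characters."""
    ms = matching_statistics(text, pattern)
    covered = [False] * len(text)
    for i, length in enumerate(ms):
        if length >= min_len:
            for j in range(i, i + length):
                covered[j] = True
    return sum(covered) / len(text) if text else 0.0
```

In a two-stage setting of the kind the abstract describes, a score like overlap_ratio would be computed only for the candidate documents selected in the first stage. Because each candidate pair can be scored independently, the comparisons could in principle be distributed across workers.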
Place, publisher, year, edition, pages
New York: Encyclopedia of Global Archaeology/Springer Verlag, 2000. 206-214 p.
Lecture Notes in Computer Science, ISSN 0302-9743 ; 1947
Identifiers
URN: urn:nbn:se:ltu:diva-37303
DOI: 10.1007/3-540-70734-4_25
Local ID: b48a4250-d016-11dc-9ad7-000ea68e967b
ISBN: 978-3-540-41729-3 (print)
OAI: oai:DiVA.org:ltu-37303
DiVA: diva2:1010801
International Workshop on Applied Parallel Computing : 18/06/2000 - 20/06/2000
Created; 2000; 20080131 (ysko)
2016-10-03 2016-10-03 Bibliographically approved