A Document Comparison Approach using Hybrid Keyword and Structured Full Text Vocabulary Searches
This paper proposes a systematic full-text search over documents that combines keyword matching with the structural similarity of the documents under consideration. The approach operates in two steps. The first step uses a set of designated keywords to acquire candidate documents by means of an open source tool. The second step builds a suffix tree of frequently used vocabulary to retrieve the most similar documents from those acquired. In so doing, variations in the contextual matching of full-text search can be mitigated, and the resulting performance turns out to be quite acceptable. The ultimate goal is to arrive at a platform-independent full-text search technique that can be realized in practice. The benefits of this scheme are twofold. On the one hand, relevant documents can be retrieved that are as close to the desired document as possible. On the other hand, suspected plagiarism can be identified to some extent, depending on the effectiveness of the proposed approach, which leaves plenty of room for future improvement. The proposed work will eventually be put to real use for database retrieval in a small business enterprise.
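The two-step flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the open source tool used in the first step, so a plain keyword filter stands in for it here. The second step uses a naive word-level suffix trie rather than a compact suffix tree. All function names, the `top_n` cutoff, and the coverage-based score are assumptions made for illustration only.

```python
from collections import Counter


def keyword_filter(docs, keywords):
    """Step 1 (stand-in): keep documents that contain every designated keyword."""
    wanted = {k.lower() for k in keywords}
    return {name: text for name, text in docs.items()
            if wanted <= set(text.lower().split())}


def frequent_tokens(text, top_n=50):
    """Reduce a text to the ordered sequence of its top_n most frequent words."""
    words = text.lower().split()
    vocab = {w for w, _ in Counter(words).most_common(top_n)}
    return [w for w in words if w in vocab]


def build_suffix_trie(tokens):
    """Naive word-level suffix trie as nested dicts (quadratic size; sketch only)."""
    root = {}
    for start in range(len(tokens)):
        node = root
        for tok in tokens[start:]:
            node = node.setdefault(tok, {})
    return root


def similarity(trie, tokens):
    """Assumed score: fraction of tokens covered by shared runs of two or more words."""
    covered, i = 0, 0
    while i < len(tokens):
        node, j = trie, i
        # Walk the trie for as long as the candidate's words keep matching.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
        if j - i >= 2:
            covered += j - i
            i = j
        else:
            i += 1
    return covered / len(tokens) if tokens else 0.0


def rank(query, docs, keywords, top_n=50):
    """Step 2: rank the keyword-filtered candidates by similarity to the query."""
    trie = build_suffix_trie(frequent_tokens(query, top_n))
    candidates = keyword_filter(docs, keywords)
    scores = {name: similarity(trie, frequent_tokens(text, top_n))
              for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank(query_text, {"a": "...", "b": "..."}, ["suffix", "search"])` returns the keyword-matching documents as `(name, score)` pairs, most similar first. A production version would likely replace the nested-dict trie with a linear-size suffix tree or suffix array.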
full text search; structural similarity; suffix tree; contextual matching; plagiarism
Kudachamai Boonsuk, Peraphon Sophatsathit
Technopreneurship and Innovation Management Program, Graduate School, Chulalongkorn University, Bangk; Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics, Faculty of Scien
International conference
Shanghai
English
252-257
2011-03-11 (date first posted online on the Wanfang platform; not the paper's publication date)