Analysis of a VSM algorithm for judging the role of paragraphs

This paper proposes an improved document layout checking algorithm, based on the VSM (Vector Space Model), that judges the role of each paragraph. The algorithm designs similarity matrices for the qualitative components of format vectors, so that each qualitative component can be quantified by a similarity value. This simplifies the process of transforming components of different kinds into the same kind of variable. The paper also analyzes which nondimensionalization methods suit format vectors, and identifies vector similarity measures that match those nondimensionalization methods. Experiments show that, compared with prior algorithms, the proposed algorithm effectively improves the precision and recall of paragraph role judgment. It is also suitable for vector similarity problems in which a single vector contains many different kinds of components.
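The abstract describes the pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical three-component format vector (font, font size in pt, line spacing multiple), a hypothetical similarity matrix `FONT_SIM` with placeholder values, and hypothetical role templates in `TEMPLATES`. The abstract does not name the nondimensionalization or similarity methods the paper selected, so min-max normalization and cosine similarity are swapped in here as stand-ins. The qualitative font component is replaced by its matrix similarity to the template's font (the template side becomes 1.0), which turns every component into a number before the vectors are compared.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical similarity matrix for one qualitative component (font family).
# Placeholder values for illustration only, not figures from the paper.
FONT_SIM: Dict[Tuple[str, str], float] = {
    ("SimSun", "KaiTi"): 0.8,
    ("SimSun", "SimHei"): 0.5,
    ("KaiTi", "SimHei"): 0.4,
}

# Hypothetical format vector: (font, font size in pt, line spacing multiple).
Format = Tuple[str, float, float]
SIZE_RANGE = (9.0, 22.0)     # assumed min/max font size for normalization
SPACING_RANGE = (1.0, 2.0)   # assumed min/max line spacing for normalization


def qualitative_sim(a: str, b: str, matrix: Dict[Tuple[str, str], float]) -> float:
    """Symmetric lookup in a similarity matrix; identical values score 1.0."""
    if a == b:
        return 1.0
    return matrix.get((a, b), matrix.get((b, a), 0.0))


def min_max(x: float, lo: float, hi: float) -> float:
    """Min-max nondimensionalization into [0, 1] (an assumed choice)."""
    return 0.0 if hi == lo else (x - lo) / (hi - lo)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def to_numeric(fmt: Format, template: Format) -> Tuple[List[float], List[float]]:
    """Map a paragraph format and a role template to comparable numeric vectors."""
    font, size, spacing = fmt
    t_font, t_size, t_spacing = template
    para = [qualitative_sim(font, t_font, FONT_SIM),
            min_max(size, *SIZE_RANGE),
            min_max(spacing, *SPACING_RANGE)]
    tmpl = [1.0,  # the template font is identical to itself
            min_max(t_size, *SIZE_RANGE),
            min_max(t_spacing, *SPACING_RANGE)]
    return para, tmpl


def judge_role(fmt: Format, templates: Dict[str, Format]) -> str:
    """Return the role whose template vector is most similar to fmt."""
    return max(templates, key=lambda role: cosine(*to_numeric(fmt, templates[role])))


# Hypothetical role templates.
TEMPLATES: Dict[str, Format] = {
    "title": ("SimHei", 22.0, 1.0),
    "heading": ("SimHei", 16.0, 1.5),
    "body": ("SimSun", 12.0, 1.5),
}

print(judge_role(("SimSun", 12.0, 1.5), TEMPLATES))  # prints: body
```

Because cosine similarity ignores vector magnitude, two normalized vectors that differ only by a scale factor score as identical. This is one illustration of why the abstract treats the choice of nondimensionalization method and the choice of similarity measure as a matched pair rather than independent decisions.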
document layout checking; VSM; nondimensionalization; similarity measurement; document understanding
Peng Xin, Li Ning
Computer School, Beijing Information Science and Technology University, Beijing 100101, China
Domestic conference (China)
Beijing
English
1-18
2014-11-01 (date first posted online on the Wanfang platform; not the paper's publication date)