Conference Topic

Analysis of VSM algorithm for judging the role of paragraph

  This paper proposes an improved document layout checking algorithm, based on the Vector Space Model (VSM), that can judge the role of a paragraph. The algorithm designs similarity matrices for the qualitative components of format vectors, so that these components can be quantified by similarity values; this simplifies the process of converting all components into the same kind of variable. The paper also analyzes and identifies suitable nondimensionalization methods for format vectors, together with vector similarity measures that match these methods. Experiments show that, compared with prior algorithms, the proposed algorithm effectively improves the precision and recall of paragraph role judgment, and that it is suited to vector similarity computation problems in which a single vector contains many different kinds of components.
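  The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a format vector with two quantitative components (font_size, indent) and one qualitative component (align). The similarity matrix values, the value ranges, min-max scaling as the nondimensionalization step, the averaged per-component similarity, and the role templates are all hypothetical choices made for illustration.

```python
# Hypothetical similarity matrix for one qualitative component (alignment).
# The values are illustrative assumptions, not taken from the paper.
ALIGN_SIM = {
    "left":   {"left": 1.0, "center": 0.3, "right": 0.1},
    "center": {"left": 0.3, "center": 1.0, "right": 0.3},
    "right":  {"left": 0.1, "center": 0.3, "right": 1.0},
}

# Assumed value ranges for min-max nondimensionalization of the
# quantitative components (font size in points, indent in characters).
RANGES = {"font_size": (8.0, 36.0), "indent": (0.0, 4.0)}


def min_max(name, value):
    """Scale a quantitative component into [0, 1] using its assumed range."""
    lo, hi = RANGES[name]
    return (value - lo) / (hi - lo)


def format_similarity(a, b):
    """Average per-component similarity between two format vectors.

    The qualitative component is quantified by a matrix lookup; each
    quantitative component contributes 1 - |difference| after scaling.
    """
    sims = [ALIGN_SIM[a["align"]][b["align"]]]
    for name in RANGES:
        sims.append(1.0 - abs(min_max(name, a[name]) - min_max(name, b[name])))
    return sum(sims) / len(sims)


def judge_role(paragraph, templates):
    """Return the role whose template format vector is most similar."""
    return max(templates, key=lambda role: format_similarity(paragraph, templates[role]))


# Hypothetical role templates.
TEMPLATES = {
    "title": {"font_size": 22.0, "indent": 0.0, "align": "center"},
    "body":  {"font_size": 12.0, "indent": 2.0, "align": "left"},
}

paragraph = {"font_size": 20.0, "indent": 0.0, "align": "center"}
print(judge_role(paragraph, TEMPLATES))
```

  Turning every component into a similarity value in [0, 1] puts qualitative and quantitative components on the same scale before they are combined, which is the simplification the abstract describes; other scaling methods or similarity measures could be substituted.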

document layout checking; VSM; nondimensionalization; similarity measurement; document understanding

Peng Xin, Li Ning

Computer School, Beijing Information Science and Technology University, Beijing 100101, China

Domestic conference

2014 National Academic Conference on Document Information Processing (2014全国文档信息处理学术会议)

Beijing

English

1-18

2014-11-01 (date first posted online on the Wanfang platform; not the paper's publication date)