Conference topic

A new method of page standardization based on DOM

  With the rapid development of the Internet, both the volume of information and the number of websites have boomed. Because pages differ in style, structure, and content, information cannot be extracted from different pages with a single model, and scanning every line of a page for useful information wastes time because of noise. Organizing all the information on a page into a DOM tree for searching is therefore a wise choice, first because it raises the likelihood of accurate search. Moreover, converting a web page into a tree helps identify the main frame of the page. In addition, unreadable codes, which are caused by invalid conversion between languages, are a barrier that separates people from information on websites in other regions of the world. Our work aims to solve these problems so that information from around the world is both accessible and convenient to extract.

DOM tree; unreadable codes; charset; page standardization

Weicheng Ma, Yong Fan, Wenqian Shang, Fengyan Wu

School of Computer, Communication University of China, Beijing, China

International conference

The 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT-2012)

Shenyang

English

288-292

2012-09-26 (date first made available online on the Wanfang platform; not the paper's publication date)