Extraction and Visualization of Numerical and Named Entity Information from a Large Number of Documents

Abstract:

We have developed a system that can semiautomatically extract sets of numerical and named-entity information from a large number of Japanese documents and use them to create various kinds of tables and graphs. In our experiments, working from two years of newspaper articles with only two hours of manual preparation, the system semiautomatically created approximately 300 kinds of graphs and tables at precisions of 0.2 to 0.8. These articles contained far more data than could be read or checked manually in such a short time. We therefore conclude that our system is useful and convenient for extracting information from a large number of documents.

Keywords: visualization, numerical information, named entity, graph, table

Authors: Masaki MURATA, Masakazu IWATATE, Koji ICHII, Qing MA, Tamotsu SHIRADO, Toshiyuki KANAMARU, Kentaro TORISAWA

Affiliations: NICT, Seika, Kyoto, Japan; NAIST, Ikoma, Nara, Japan; Hiroshima University, Hiroshima, Japan; Ryukoku University, Otsu, Shiga, Japan; Kyoto University, Sakyo, Kyoto, Japan

Conference type: International conference

Conference name: The 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2008)

Conference location: Beijing

Conference language: English

Online publication date: 2008-10-19 (the date the record first appeared on the Wanfang platform, not the paper's publication date)
