Conference topic

Data Extraction and Cleansing of Semi-Structured Chinese Texts

The rapid growth of data mining generates an ever-increasing demand for automatic information extraction from Chinese texts. However, existing approaches in this domain focus on well-structured Chinese texts and therefore have difficulties in dealing with semi-structured Chinese texts, which do not conform to strict syntactic structures. We propose in this paper an approach to semi-automatic data extraction and cleansing for these texts. Preliminary experimental results show that, with modest manual intervention, it can effectively extract information from raw semi-structured Chinese texts collected from e-business applications.

data extraction; data cleansing; semi-structured text; Chinese; manual intervention

Wei-Heng ZHU, Shun LONG

Dept. of Computer Science, Jinan University, Guangzhou, P.R. China; Guangdong Emergency Technology Research Center of Risk Evaluation and Prewarning on Public Network S

International conference

2011 International Conference on Business Management and Electronic Information (BMEI2011)

Guangzhou

English

1-4

2011-05-13 (date first posted online on the Wanfang platform; does not represent the paper's publication date)