A SURVEY OF AUTOMATIC URDU LANGUAGE PROCESSING

Abstract:

Most research over the last few decades has focused on automatic Natural Language Processing (NLP) for English, European, and East Asian languages. Unfortunately, South Asian languages, especially Urdu, have received less attention. In this paper we present a survey regarding classification of the Urdu language. The main goal of this survey is to briefly present the material available on Urdu NLP, with the aim of also allowing researchers to develop new techniques. The survey therefore covers the initial attempts in the development of Urdu language processing. First, we give a brief description of the Urdu language and its writing system. Second, we introduce the Urdu language corpus; we then compare this corpus to those of East Asian languages and discuss the problems one faces while building an Urdu corpus. Third, we summarize different linguistic analyses of the domain: part-of-speech tagging, parsing, and named entity recognition. The overall goal of this paper is to provide a structure that serves as a foundation for future research on Urdu language processing techniques.

Keywords: Urdu language written system; Urdu corpus; Linguistics analysis of domain

Authors: WAQAS ANWAR, XUAN WANG, XIAO-LONG WANG

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, 518055, China

Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (IEEE 5th Forum on Machine Learning and Cybernetics)

Conference location: Dalian

Conference language: English

Pages: 4489-4494

Online publication date: 2006-08-13 (date first made available on the Wanfang platform; not the paper's publication date)
