A SURVEY OF AUTOMATIC URDU LANGUAGE PROCESSING
Most research over the last few decades has focused on automatic natural language processing (NLP) for English, European, and East Asian languages. Unfortunately, South Asian languages, especially Urdu, have received less attention. In this paper we present a survey regarding classification of the Urdu language. The main goal of the survey is to give a brief overview of the material available on Urdu NLP, with the further aim of helping researchers develop new techniques. The survey therefore covers the initial attempts in the development of Urdu language processing. First, we give a brief description of the Urdu language and its writing system. Second, we introduce the Urdu language corpus, compare it with corpora for East Asian languages, and discuss the problems one faces when building an Urdu corpus. Third, we summarize the linguistic analysis tasks in this domain: part-of-speech tagging, parsing, and named entity recognition. The overall goal of this paper is to provide a structure that can serve as a foundation for future research on Urdu language processing techniques.
Urdu language; written system; Urdu corpus; linguistics analysis of domain
WAQAS ANWAR, XUAN WANG, XIAO-LONG WANG
School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, 518055, China
International conference
2006 International Conference on Machine Learning and Cybernetics (Fifth IEEE Forum on Machine Learning and Cybernetics)
Dalian
English
4489-4494
2006-08-13 (date first made available online on the Wanfang platform; does not represent the paper's publication date)