Conference paper

Managing the Google Web 1T 5-gram Data Set

This paper describes how the Google Web 1T 5-gram data set, contributed by Google Inc., can be stored so that it can be used efficiently with respect to time. We present an efficient way of accessing all the 5-grams for a specific word of interest from the stored files. We measure the maximum access and processing efficiency achievable for any word of interest. We also compare results (access time and memory requirements) on the task of accessing all the 5-grams for a list of words, on both the processed and the original organization of the data set.

Google Web 1T; n-gram; 5-grams

Aminul ISLAM, Diana INKPEN

Department of Computer Science, SITE, University of Ottawa, Ottawa, ON, Canada

International conference

International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Dalian

English

1-5

2009-09-24 (date first made available online on the Wanfang platform; does not represent the paper's publication date)