Managing the Google Web 1T 5-gram Data Set
This paper describes how the Google Web 1T 5-gram data set, contributed by Google Inc., can be stored so that it can be used efficiently with respect to time. We present an efficient way of accessing all the 5-grams for a specific word of interest from the stored files. We measure the maximum access and processing efficiency achievable for any word of interest. We also compare results (access time and memory requirements) on the task of accessing all the 5-grams for a list of words, on both the processed and the original organization of the data set.
Google Web 1T; n-gram; 5-grams
Aminul ISLAM, Diana INKPEN
Department of Computer Science, SITE, University of Ottawa, Ottawa, ON, Canada
International conference
Dalian
English
1-5
2009-09-24 (date first made available online on the Wanfang platform; not the paper's publication date)