Conference topic

INVESTIGATION OF INTERNET SYSTEM USER BEHAVIOUR USING CLUSTER ANALYSIS

This paper presents a method for investigating the activity of users of an information web system by means of cluster analysis. Anonymous sessions are extracted from a web server log, and each is represented as a 65-dimensional vector whose dimensions correspond to individual pages of the web system. Each component holds a measure of the user's interest in a page during the session, calculated as the ratio of the time the user spent on that page to the total session time. The whole set of sessions is then clustered using the HCM (Hard C-Means) algorithm. The resulting clusters are taken as user activity patterns, and clusters dominated by a page are selected from among them as those in which the user interest value exceeds a given threshold, e.g. 50 per cent. The sessions of named users, i.e. those registered in the system, are determined from an application log of user activity. The frequencies with which a named user's sessions fall into individual clusters are calculated for a given period of time, e.g. one month. User activity can then be assessed by analyzing these frequencies. For example, a user's behavior can be regarded as deviating from the normal pattern when the frequency of sessions in a cluster dominated by a page is below a set threshold, e.g. 10 per cent. The method was evaluated using data from a cadastral web system operated in an extranet.
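The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three-page toy data, the choice of the first c sessions as initial centres, and the reading of "dominated by a page" as a cluster centre whose largest component exceeds the threshold are all assumptions made for illustration.

```python
def session_vector(page_times, n_pages):
    """Interest vector: share of the session time spent on each page.

    page_times maps a page index to seconds spent on it (hypothetical format).
    """
    total = sum(page_times.values())
    vec = [0.0] * n_pages
    if total > 0:
        for page, seconds in page_times.items():
            vec[page] = seconds / total
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hcm(vectors, c, max_iter=100):
    """Hard C-Means: every session belongs to exactly one cluster.

    Assumption: the first c sessions seed the centres (a simplification;
    the abstract does not say how centres are initialised).
    """
    centers = [list(v) for v in vectors[:c]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each session goes to its nearest centre.
        new_labels = [
            min(range(c), key=lambda k: sq_dist(v, centers[k])) for v in vectors
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for k in range(c):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def dominated_clusters(centers, threshold=0.5):
    """Map cluster index to its dominating page when the centre's largest
    interest value exceeds the threshold (assumed interpretation)."""
    result = {}
    for k, center in enumerate(centers):
        page = max(range(len(center)), key=center.__getitem__)
        if center[page] > threshold:
            result[k] = page
    return result


def cluster_frequencies(user_labels, c):
    """Fraction of one named user's sessions that fall into each cluster."""
    n = len(user_labels)
    return [user_labels.count(k) / n for k in range(c)]


# Toy data: 3 pages instead of 65, times in seconds.
raw_sessions = [
    {0: 90, 1: 10},
    {0: 10, 1: 80, 2: 10},
    {0: 80, 1: 10, 2: 10},
    {0: 5, 1: 90, 2: 5},
    {0: 85, 1: 5, 2: 10},
    {0: 10, 1: 85, 2: 5},
]
vectors = [session_vector(s, 3) for s in raw_sessions]
labels, centers = hcm(vectors, c=2)
dominated = dominated_clusters(centers, threshold=0.5)
```

A named user's monthly frequencies from `cluster_frequencies` could then be compared against the lower threshold (e.g. 0.10) for each cluster listed in `dominated` to flag behaviour that departs from the usual pattern.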

Web system; clustering; user activity; server log; HCM algorithm

DARIUSZ KRÓL, MICHAL ŚCIGAJLO, BOGDAN TRAWIŃSKI

Wroclaw University of Technology, Institute of Applied Informatics, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

3408-3412

2008-07-12 (date first made available online on the Wanfang platform; not the paper's publication date)