An Effective Feature Representation of Web Log Data by Leveraging Byte Pair Encoding and TF-IDF
Web log data analysis is important in intrusion detection, and various machine learning techniques have been applied to it. However, compared with the abundant research on machine learning, methods for extracting features from log data remain underexplored. In this paper, we present an effective feature extraction approach that leverages Byte Pair Encoding (BPE) and Term Frequency-Inverse Document Frequency (TF-IDF). We applied this approach to various downstream machine learning algorithms and demonstrated its usefulness.
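The general idea of combining BPE with TF-IDF can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the starting granularity, the number of merges, or the TF-IDF variant, so the character-level start, the plain `tf * log(N / df)` weighting, and the helper names `learn_bpe`, `merge_pair`, `encode`, and `tfidf` are all hypothetical choices made here.

```python
import math
from collections import Counter


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with the joined symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(lines, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters (assumption)."""
    seqs = [list(line) for line in lines]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # every line has collapsed to a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def encode(line, merges):
    """Tokenize one log line by replaying the learned merges in order."""
    seq = list(line)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


def tfidf(docs):
    """Return one sparse {token: weight} dict per tokenized document.

    Uses relative term frequency times log(N / df); other TF-IDF variants exist.
    """
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {tok: (cnt / len(doc)) * math.log(n / df[tok]) for tok, cnt in tf.items()}
        )
    return vectors


# Made-up example request lines, for illustration only.
logs = [
    "GET /index.php?id=1",
    "GET /index.php?id=2",
    "GET /index.php?id=1' OR '1'='1",
]
merges = learn_bpe(logs, num_merges=20)
features = tfidf([encode(line, merges) for line in logs])
```

The resulting sparse dictionaries could then be converted into fixed-length vectors over the learned vocabulary and passed to a downstream classifier. Tokens shared by every line receive weight zero under this weighting, so the subword units that distinguish the unusual request carry the signal.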
Web Log Data Analysis; Feature Representation; BPE; TF-IDF; Machine Learning
Junlang Zhan, Xuan Liao, Yukun Bao, Lu Gan, Zhiwen Tan, Mengxue Zhang, Ruan He, Jialiang Lu
Shanghai Jiao Tong University, Shanghai, China; Tencent, Shenzhen, China
International conference
2019 China Turing Conference (ACM Turing Celebration Conference - China 2019)
Chengdu
English
607-612
2019-05-17 (date first made available online on the Wanfang platform; this does not represent the paper's publication date)