Conference Paper

Feature Selection Based File Type Identification Algorithm

Identifying the true type of an arbitrary file is very important in information security. Methods based on file extensions or magic numbers are easily spoofed; a more reliable approach is to analyze the file's binary content. We propose an algorithm that builds a model for each file type by applying n-gram analysis to the binary contents of a set of known input files. We also design a novel feature selection evaluation function that extracts signatures from these models; the signatures are then used to recognize the true type of unknown files. The approach deliberately avoids relying on the structure or keywords of any specific file type, so that it can be applied to file types in general. Experiments show that the proposed approach is promising, especially when the feature selection evaluation function is applied.
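The pipeline outlined in the abstract (per-type n-gram models, signature extraction, then matching) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the feature selection evaluation function, so `select_signature` uses a placeholder score, and all function names, the choice of n = 2, and the overlap-based matching in `identify` are hypothetical.

```python
from collections import Counter


def ngram_frequencies(data: bytes, n: int = 2) -> dict:
    """Return the normalized frequency of each byte n-gram in data."""
    total = len(data) - n + 1
    if total <= 0:
        return {}
    counts = Counter(data[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def build_model(samples: list, n: int = 2) -> dict:
    """Average the n-gram frequencies over known files of one type."""
    model = Counter()
    for data in samples:
        for gram, freq in ngram_frequencies(data, n).items():
            model[gram] += freq / len(samples)
    return dict(model)


def select_signature(models: dict, file_type: str, k: int = 2) -> list:
    """Pick the k n-grams that best separate file_type from the others.

    Placeholder score (an assumption, NOT the paper's evaluation function):
    frequency in this type minus the highest frequency in any other type.
    """
    own = models[file_type]
    others = [m for t, m in models.items() if t != file_type]

    def score(gram):
        return own[gram] - max((m.get(gram, 0.0) for m in others), default=0.0)

    # Sort by descending score; break ties on the n-gram bytes for determinism.
    return sorted(own, key=lambda g: (-score(g), g))[:k]


def identify(data: bytes, models: dict, signatures: dict, n: int = 2) -> str:
    """Return the type whose signature n-grams overlap most with data.

    The overlap measure (sum of min frequencies) is also an assumption.
    """
    freqs = ngram_frequencies(data, n)

    def match(file_type):
        model = models[file_type]
        return sum(min(freqs.get(g, 0.0), model[g]) for g in signatures[file_type])

    return max(sorted(signatures), key=match)


# Toy training data standing in for sets of known files of two types.
training = {
    "a": [b"ABABAB", b"ABAB"],
    "b": [b"XYXYXY", b"XYZXYZ"],
}
models = {t: build_model(samples) for t, samples in training.items()}
signatures = {t: select_signature(models, t) for t in models}
```

With the toy data above, calling `identify(b"ABABABAB", models, signatures)` selects type `"a"`, because only that type's signature n-grams occur in the input. Restricting the comparison to a small signature rather than the full model is the step the abstract attributes to feature selection; a real evaluation function would replace the placeholder score.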

file type identification; feature selection; gram frequency distribution; n-gram analysis

Ding Cao, Junyong Luo, Meijuan Yin, Huijie Yang

Zhengzhou Information Science and Technology Institute, Zhengzhou, Henan 450002, China

International conference

2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2010)

Xiamen

English

58-62

2010-10-29 (date first posted online on the Wanfang platform; not the paper's publication date)