Conference Topic

A New Processing Approach for Unstructured Data in Power System based on MapReduce Technology

  With the expansion of the power grid, measuring equipment has become increasingly digital, integrated and comprehensive. The information collected and processed by power software systems is growing more diverse and complex: it includes not only traditional structured data such as SCADA data, but also unstructured data such as wave recordings, work logs and video. Such data is not described or managed under a unified data structure or paradigm, and is currently stored in a scattered and independent manner. This decentralization makes comprehensive utilization of unstructured data very inconvenient. Traditional relational databases usually store such data as compressed BLOBs, which introduces inefficient compression and decompression whenever the data is accessed. To overcome the limitations of traditional methods for handling unstructured data generated by power monitoring and control systems, a new data processing approach is proposed based on the MapReduce computation model of Hadoop, built on a multi-index structure, an independent storage pattern and a parallel computing architecture. In the multi-index key design, various combinations of fixed-length fields form the complete row key. The fields are the columns searched most frequently: type mask, generation time, data source and quality code. Column families are divided into an original data family, a data description family and a processed data family. The original data family stores the content of the unstructured data and always serves as the input to data analysis or mining algorithms. The data description family contains additional information about the unstructured data, which ensures that the correct parser object is generated. The processed data family saves all analysis results and is designed to accommodate the output of different algorithms. In accordance with this data model, we designed a parallel processing mechanism for unstructured data in which the data table is treated as both data source and data sink, which makes it possible to carry out data processing algorithms in place within the database. The approach suits the characteristics of unstructured data formats and strengthens the unified management of extracting, transforming and loading for various types of unstructured electric power data. At the same time, analysis and mining algorithms for unstructured data can be customized and implemented within the infrastructure to meet the different needs of advanced applications. Experiments show that the processing performance for massive unstructured data can be significantly improved by using our method.
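
The row key and column family layout described in the abstract might be sketched as follows. This is only an illustration under stated assumptions: the abstract names the four key fields and the three family groups but gives no widths, encodings or identifiers, so the field sizes, the field order, the family names (`raw`, `desc`, `proc`) and the qualifiers below are all hypothetical, and an HBase-style table with byte-ordered row keys is assumed.

```python
import struct

# Hypothetical fixed-length layout (big-endian, no padding):
#   type mask     2 bytes
#   generation ms 8 bytes (milliseconds since the Unix epoch)
#   source id     4 bytes
#   quality code  1 byte
ROW_KEY_FORMAT = ">HQIB"
ROW_KEY_LENGTH = struct.calcsize(ROW_KEY_FORMAT)

# Hypothetical names for the three column family groups.
RAW_FAMILY = "raw"    # original unstructured content
DESC_FAMILY = "desc"  # description used to choose a parser
PROC_FAMILY = "proc"  # analysis results written back


def make_row_key(type_mask, generated_ms, source_id, quality_code):
    """Concatenate the fixed-length fields into one composite row key."""
    return struct.pack(ROW_KEY_FORMAT, type_mask, generated_ms,
                       source_id, quality_code)


def parse_row_key(key):
    """Split a composite row key back into its four fields."""
    return struct.unpack(ROW_KEY_FORMAT, key)


def make_row(key, payload, parser_name):
    """Build one row as a mapping from 'family:qualifier' to bytes."""
    return {
        "row_key": key,
        f"{RAW_FAMILY}:content": payload,
        f"{DESC_FAMILY}:parser": parser_name.encode("utf-8"),
        # The processed family starts empty; algorithms add cells later.
    }


if __name__ == "__main__":
    key = make_row_key(0x0001, 1442793600000, 42, 0)
    print(len(key), parse_row_key(key))
```

Because every field is an unsigned big-endian integer of fixed width, byte-wise comparison of two keys agrees with comparing their fields in order, so rows of one type sort by generation time and a range scan over a key prefix selects one type (or one type and time window).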

Analyzing and mining; Hadoop MapReduce; Parallel computing; Storage pattern; Unstructured data

Xiaojun Li, Shaohua Jiao, Hong Cao, Liqiang Zhang, Zhenghao Gao, Yang Yi

Guizhou Electric Power Research Institute; Beijing Sifang Automation Co., Ltd

International conference

The 6th International Conference on Modern Power System Automation and Protection

Nanjing

English

502-506

2015-09-21 (date first made available online on the Wanfang platform; this does not indicate the paper's publication date)