Conference topic

Generalized UDF for Analytics Inside Database Engine

Running analytics computation inside a database engine through the use of UDFs (User Defined Functions) has been investigated, but has not yet become a scalable approach due to several technical limitations. One limitation lies in the lack of generality for UDFs to express complex applications and to compose them with relational operators in SQL queries. Another limitation lies in the lack of systematic support for a UDF to cache relations initially for efficient computation across multiple calls. Further, making UDF execution interact efficiently with query processing requires detailed system programming, which is beyond the expertise of most application developers. To solve these problems, we extend UDF technology in both the semantic and system dimensions. We generalize UDFs to support scalar, tuple, and relation input and output, allow UDFs to be defined on the entire content of relations, and allow moderate-sized input relations to be cached initially to avoid repeated retrieval. With this extension, generalized UDFs can be composed with other relational operators and thus integrated into queries naturally. Furthermore, based on the notion of invocation patterns, we provide focused system support for efficient interaction between UDF execution and query processing. We have taken the open-source PostgreSQL engine and a commercial, proprietary parallel database engine as our prototyping vehicles; we illustrate the performance, modeling power, and usability of the proposed approach with experimental results on both platforms.

Meichun Hsu, Qiming Chen, Ren Wu, Bin Zhang, Hans Zeller

HP Labs, HP TSG, Hewlett Packard Co., Palo Alto, California, USA

International conference

11th International Conference, WAIM 2010 (11th International Conference on Web-Age Information Management)

Jiuzhaigou

English

742-754

2010-07-14 (date first made available online on the Wanfang platform; not the paper's publication date)