A Dedicated Adaptive Loop Pre-fetch Mechanism for Stream-like Application
For stream-like applications that require high bandwidth and low latency, reducing memory latency can effectively improve quality of service (QoS). In this paper, we propose a dedicated adaptive loop pre-fetch mechanism that reduces memory latency and improves pre-fetch accuracy. When a loop sequence is detected, the stream pre-fetch engine adaptively initiates pre-fetch operations and stores the returned data in on-chip stream buffers. The pre-fetch engine consists of loop sequence recognition, stream buffer FIFOs, and an address calculation ALU. A hardware engine is implemented and integrated into a processor to verify the mechanism. When the processor with the pre-fetch engine runs a regular loop sequence, it saves between 1/2 and 2/3 of the time spent on memory latency. The mechanism also alleviates cache pollution and cache thrashing.
Xiao-Ping Huang, Xiao-Ya Fan, Yu-Hui Chen, Xiang-Dong He
Computer School, Northwestern Polytechnical University, Xian 710072, China
International conference
Shanghai
English
575-577
2010-11-01 (date first posted online on the Wanfang platform; not the paper's publication date)
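
The abstract names three parts of the pre-fetch engine: loop sequence recognition, stream buffer FIFOs, and address calculation. The following is a minimal software sketch of how such parts might interact, not the authors' hardware design. It assumes that a "regular loop sequence" shows up as a constant address stride, and the class name `LoopPrefetcher`, the FIFO depth, and the confidence threshold are all hypothetical. Memory timing is not modelled; the sketch only counts which accesses would be served from the stream buffer.

```python
from collections import deque


class LoopPrefetcher:
    """Toy model of a stride-based stream pre-fetcher (all names hypothetical)."""

    def __init__(self, depth=4, threshold=2):
        self.depth = depth          # assumed stream buffer FIFO depth
        self.threshold = threshold  # assumed number of repeated strides before pre-fetching
        self.fifo = deque()         # addresses whose data has been pre-fetched
        self.last_addr = None
        self.stride = None
        self.confidence = 0
        self.hits = 0
        self.misses = 0

    def _observe(self, addr):
        # Recognition: count how often the same non-zero stride repeats.
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                self.confidence += 1
            else:
                # Pattern broken: restart detection and drop stale pre-fetches.
                self.stride = stride
                self.confidence = 0
                self.fifo.clear()
        self.last_addr = addr

    def _refill(self):
        # Address calculation: extend the stream once the stride is trusted.
        if self.confidence < self.threshold:
            return
        next_addr = self.fifo[-1] if self.fifo else self.last_addr
        while len(self.fifo) < self.depth:
            next_addr += self.stride
            self.fifo.append(next_addr)

    def access(self, addr):
        """Return True if addr is served from the head of the stream buffer."""
        hit = bool(self.fifo) and self.fifo[0] == addr
        if hit:
            self.fifo.popleft()
            self.hits += 1
        else:
            self.misses += 1
        self._observe(addr)
        self._refill()
        return hit
```

In this sketch, a run of accesses with a fixed stride misses only while the stride is being learned and then hits in the stream buffer, while an irregular address sequence never triggers pre-fetching, so no useless data is fetched. Clearing the FIFO when the stride changes is one simple way to keep stale pre-fetches from being served; the paper's actual policy is not described in the abstract.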