Jaskanwar Singh
Aspiring Novelist
In the proposed CPU-assisted GPGPU scheme, once the CPU launches a GPU program, it runs a pre-execution program. This program is generated automatically from the GPU kernel by the proposed compiler algorithms and contains the kernel's memory access instructions for multiple thread blocks. The CPU pre-execution program runs ahead of the GPU threads for two reasons: (1) it contains only the GPU kernel's memory fetch instructions, not its floating-point computations, and (2) the CPU runs at a higher clock frequency and exploits more instruction-level parallelism than the GPU's scalar cores. The researchers also use the L2-cache prefetcher on the CPU side to further increase the CPU's memory traffic.

As a result, the GPU threads' memory accesses hit in the L3 cache shared by the CPU and GPU, and their latency can be drastically reduced. Because the pre-execution is directly controlled by user-level applications, it offers both high accuracy and flexibility. The researchers' experiments on a set of benchmarks show that the proposed pre-execution improves performance by up to 113%, and by 21.4% on average.
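To make the idea concrete, here is a minimal Python sketch that models only cache behavior, not timing. Everything in it is a hypothetical stand-in: the toy kernel `c[i] = a[i] + b[i]`, the array base addresses, the 64-byte cache lines, and the unbounded shared L3 modeled as a set of line numbers. `cpu_pre_execution` is a hand-written "slice" of `gpu_thread` that keeps the same address arithmetic and loads but drops the floating-point add and the store, roughly what the compiler described above is said to generate automatically. The sketch also simplifies by letting the CPU slice finish before the GPU threads start, whereas the real scheme runs it concurrently and only needs to stay ahead. The CPU's L2 prefetcher is not modeled.

```python
from typing import Dict, Iterable, Set

# Hypothetical parameters chosen for illustration only.
ELEM_BYTES = 4                      # float32 elements
LINE_BYTES = 64                     # cache-line size
A_BASE, B_BASE = 0x10000, 0x20000   # base addresses of input arrays a and b


def gpu_thread(block: int, thread: int, block_dim: int,
               l3: Set[int], stats: Dict[str, int]) -> None:
    """Memory behavior of one GPU thread running c[i] = a[i] + b[i]."""
    i = block * block_dim + thread
    for addr in (A_BASE + i * ELEM_BYTES, B_BASE + i * ELEM_BYTES):
        line = addr // LINE_BYTES
        if line in l3:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            l3.add(line)  # the miss fills the shared cache
    # The floating-point add and the store to c[i] would follow here.


def cpu_pre_execution(blocks: Iterable[int], block_dim: int, l3: Set[int]) -> None:
    """Hand-written pre-execution slice of gpu_thread: same address arithmetic
    and loads, but no floating-point add and no store, over several blocks."""
    for block in blocks:
        for thread in range(block_dim):
            i = block * block_dim + thread
            l3.add((A_BASE + i * ELEM_BYTES) // LINE_BYTES)  # touch a[i]
            l3.add((B_BASE + i * ELEM_BYTES) // LINE_BYTES)  # touch b[i]


def run(num_blocks: int, block_dim: int, pre_execute: bool) -> Dict[str, int]:
    """Run all GPU threads against an initially empty, unbounded shared L3."""
    l3: Set[int] = set()
    stats = {"hits": 0, "misses": 0}
    if pre_execute:
        # Simplification: the CPU slice completes before the GPU threads start.
        cpu_pre_execution(range(num_blocks), block_dim, l3)
    for block in range(num_blocks):
        for thread in range(block_dim):
            gpu_thread(block, thread, block_dim, l3, stats)
    return stats


print(run(4, 32, pre_execute=False))  # {'hits': 240, 'misses': 16}
print(run(4, 32, pre_execute=True))   # {'hits': 256, 'misses': 0}
```

With 4 blocks of 32 threads, each array spans 8 cache lines, so without pre-execution the GPU threads take 16 compulsory misses; with the CPU slice touching those lines first, every GPU load hits. The slice is cheaper than the kernel precisely because it omits the computation, which is reason (1) above.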