Technical Report TR712:
Bo Joel Svensson, Mary Sheeran, Ryan R. Newton
A Language for Nested Data Parallel Design-space Exploration on GPUs
(Aug 2014), 16 pages
Graphics Processing Units (GPUs) offer potential for very high performance; they are also rapidly evolving. Obsidian is an embedded language (in Haskell) for implementing high-performance kernels to be run on GPUs. We would like to have our cake and eat it too; we want to raise the level of abstraction beyond CUDA code and still give the programmer control over the details relevant to kernel performance. To that end, Obsidian includes guaranteed elimination of intermediate arrays and predictable space/time costs, while also providing array functions that are polymorphic across different levels of the GPU's hierarchical structure, providing a limited form of nested data parallelism. We walk through case studies that demonstrate how to use Obsidian for rapid design exploration or auto-tuning, resulting in better performance than hand-tuned kernels in an existing GPU language.
- Available as: