Kalhan Koul, Jackson Melchert, Kavya Sreedhar, Leonard Truong, Gedeon Nyengele, Keyi Zhang, Qiaoyi Liu, Jeff Setter, Po-Han Chen, Yuchen Mei, Maxwell Strange, Ross Daly, Caleb Donovick, Alex Carsello, Taeyoung Kong, Kathleen Feng, Dillon Huff, Ankita Nayak, Rajsekhar Setaluri, James Thomas, Nikhil Bhagdikar, David Durst, Zachary Myers, Nestan Tsiskaridze, Stephen Richardson, Rick Bahr, Kayvon Fatahalian, Pat Hanrahan, Clark Barrett, Mark Horowitz, Christopher Torng, Fredrik Kjolstad, Priyanka Raina
With the slowing of Moore’s law, computer architects have turned to domain-specific hardware specialization to continue improving the performance and efficiency of computing systems. However, specialization typically entails significant modifications to the software stack to properly leverage the updated hardware. The lack of a structured approach for updating both the compiler and the accelerator in tandem has impeded many attempts to systematize this procedure. We propose a new approach to enable flexible and evolvable domain-specific hardware specialization based on coarse-grained reconfigurable arrays (CGRAs). Our agile methodology employs a combination of new programming languages and formal methods to automatically generate the accelerator hardware and its compiler from a single source of truth. This enables the creation of design-space exploration frameworks that automatically generate accelerator architectures that approach the efficiencies of hand-designed accelerators, with a significantly lower design effort for both hardware and compiler generation. Our current system accelerates dense linear algebra applications, but is modular and can be extended to support other domains. Our methodology has the potential to significantly improve the productivity of hardware-software engineering teams and enable quicker customization and deployment of complex accelerator-rich computing systems.