2nd Workshop on Democratizing Domain-Specific Accelerators (WDDSA 2023)

In conjunction with the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO 56), 10/29/2023, 1pm-5pm, Toronto


Computer architecture has repeatedly evolved from domain-specific designs into general-purpose ones. For example, GPUs were merely accelerators for computer graphics two decades ago; today, they are the most widely used general-purpose vector processors. Recent computer architecture trends have again turned toward designing domain-specific accelerators (DSAs). At the same time, research projects have successfully applied emerging accelerators to applications beyond their original target domains. Inspired by these projects and the story of GPGPU (general-purpose computing on GPUs), this workshop brings together experts from academia and industry to share their efforts in democratizing domain-specific accelerators. Through the presented work, WDDSA aims to explore the potential to lead a renaissance of general-purpose computing on emerging DSAs.

While we are especially interested in work that supports general-purpose computing on recent DSAs, we also encourage submissions on DSAs and their infrastructure in general. Topics of interest include, but are not limited to:

  1. Novel use cases of accelerators in which applications fall outside the accelerators' original application domains
  2. Systems, programming, and software support for democratizing domain-specific accelerators
  3. Architectural support for democratizing domain-specific accelerators
  4. Performance/power/energy evaluation and analysis of democratized domain-specific accelerators
  5. Implications for future "democratized" accelerator designs

This workshop invites three types of presentations.

  1. Track 1: Original research papers. WDDSA welcomes papers on innovative ideas with preliminary results. WDDSA partners with IEEE Computer Architecture Letters (IEEE CAL) to invite top papers in this track for publication in IEEE CAL.
  2. Track 2: Published research papers with artifacts available. The submission can be based on already published work (published within 12 months of the submission deadline). WDDSA provides a platform for these papers to promote their artifacts, allowing the community to use and extend existing projects. The presentation may include a live demo.
  3. Track 3: Industry insights. WDDSA also welcomes papers on industry projects, encouraging industry to have conversations with academia.

Program

1:00p-1:10p  Opening remarks
1:10p-1:40p  Nandita Vijaykumar (U Toronto)

Architectural support for more efficient robotics tasks on resource-constrained devices

Abstract: Many widely used vision and robotics applications, such as perception, SLAM, and online mapping, place significant demands on the underlying compute hardware and memory systems. In this talk, I will discuss the performance and energy inefficiencies of some of these applications and their implications for resource-constrained devices such as mobile phones, drones, mobile robots, and health devices. I will then discuss some of our recent research on enabling faster and more efficient implementations of online mapping and event-based perception using hardware-software codesign.

Bio: Nandita Vijaykumar is an Assistant Professor in the Department of Computer Science at the University of Toronto. She is also affiliated with the Vector Institute for Artificial Intelligence. She received her Ph.D. from Carnegie Mellon University and has previously worked for AMD, Intel, Microsoft, and Nvidia. Her research interests lie at the intersection of computer architecture/compilers/systems and computer vision/robotics/ML. She is the recipient of the Connaught New Researcher Award and the Benjamin Garver Lamme Fellowship, is a Qualcomm Fellowship finalist, and was inducted into the ISCA Hall of Fame.
1:40p-2:00p  Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator
Courtney Golden, Dan Ilan, Caroline Huang, Niansong Zhang, Zhiru Zhang, and Christopher Batten
2:00p-2:20p  UDIR: Towards a Unified Compiler Framework for Reconfigurable Dataflow Architectures
Nikhil Agarwal, Mitchell Fream, Souradip Ghosh, Brian C. Schwedock, Nathan Beckmann
2:20p-2:40p  The Case for a Cross-Layer Architecture-VLSI Energy Abstraction
Sabrina Yarzada, Matthew Conn, and Christopher Torng
2:40p-3:00p  System Virtualization for Neural Processing Units
Yuqi Xue, Yiqi Liu, and Jian Huang
3:30p-4:00p  Michael Pellauer (NVIDIA)
Symphony: Democratizing Accelerators via Hierarchical Heterogeneous Processing
Abstract: When a new, valuable workload domain like Deep Learning emerges, it is natural to supplement general-purpose programmable MIMD compute platforms like CPUs and GPUs with specialized domain-specific hardware to accelerate efficient execution of that domain. Historically, there have been two orthogonal approaches to adding this functionality: (A) add new functional units, dispersed to every programmable core/SM and exposed as fine-grained instructions tightly coupled to the program counter, as in NVIDIA's Tensor Cores; or (B) add large discrete accelerators in a heterogeneous SoC fashion, disjoint and decoupled from the programmable hardware with a coarse granularity of interaction, as in NVIDIA's DLA hardware. In this talk we propose Symphony: a hybrid programmable/specialized architecture built around the notion of decoupled medium-grained specialized hardware blocks dispersed throughout the memory hierarchy of a MIMD GPU or CPU. The key insight is that these specialized reconfigurable units are aimed not only at roofline floating-point computations, but also at supporting data-orchestration features such as address generation, data filtering, and sparse metadata processing. We name this approach hierarchical heterogeneous processing (HHP), as we provision each level of the memory hierarchy with co-located computation bandwidth inversely proportional to the buffer size (i.e., less compute at the last-level buffer, increasing at each level until the traditional leaves at the L1). We demonstrate that Symphony can match non-programmable ASIC performance on sparse tensor algebra and provide 31× improved runtime and 44× improved energy over a comparably provisioned GPU for this historically challenging application domain.
Bio: Dr. Michael Pellauer is a Senior Research Scientist at NVIDIA in the Architecture Research Group (2015-present). His research interest is building domain-specific accelerators, with a special emphasis on deep learning and sparse tensor algebra. Prior to NVIDIA, he was a member of the VSSAD group at Intel (2010-2015), where he performed research and advanced development on customized spatial accelerators. Dr. Pellauer holds a Ph.D. from the Massachusetts Institute of Technology in Cambridge, Massachusetts (2010), a Master's from Chalmers University of Technology in Gothenburg, Sweden (2003), and a Bachelor's from Brown University in Providence, Rhode Island (1999), with a double major in Computer Science and English.
4:00p-4:20p  Towards an Accelerator for Differential and Algebraic Equations Useful to Scientists
Jonathan Garcia-Mallen, Shuohao Ping, Alex Miralles-Cordal, Ian Martin, Mukund Ramakrishnan, Yipeng Huang
4:20p-4:40p  TCUDB: Accelerating Database with Tensor Processors
Yu-Ching Hu, Yuliang Li and Hung-Wei Tseng
4:40p-5:00p  TensorCV: Accelerating Non-AI/ML Stages in Computer Vision Pipelines using Tensor Processors
Dongho Ha, Woo Won Ro, Hung-Wei Tseng

Submission Guidelines

WDDSA has two sets of submission guidelines, one for Track 1 and one for Tracks 2 and 3. Please read the guidelines carefully.

  • Track 1: Research papers.
    • Papers should be 4 pages in PDF format, following the IEEE CAL template: https://www.computer.org/csdl/journal/ca/write-for-us/15055.
    • Please submit through https://mc.manuscriptcentral.com/cal and select "WDDSA" as the attribute before 9/15/2023.
      • Create an account in IEEE CAL’s submission site.
      • After login, click “Author” from the menu bar
      • Fill in your details, and during Step 3: Attributes, please ensure you select “Workshop on Democratizing Domain-Specific Accelerators (WDDSA)”
      • All other steps are the same as for regular CAL submissions.
  • Track 2 & 3: Published & industry papers.
    • Please provide a 2-page summary that covers the high-level ideas of the presentation in PDF format. For published work, please also indicate where the paper was published.
    • Please submit through https://wddsa2023.hotcrp.com/

Important Dates

  • Abstract deadline: 9/15/2023 (extended from 9/8/2023)
  • Full paper deadline: 9/15/2023
  • Notification of acceptance: 9/29/2023
  • Camera-ready deadline: 10/20/2023

Organizers

  • Yufei Ding (University of California, San Diego)
  • Christopher Torng (Stanford University)
  • Po-An Tsai (NVIDIA Research)
  • Hung-Wei Tseng (University of California, Riverside)