ESCAL @ Angels

ESCAL's first baseball outing together since the pandemic, again at Angels Stadium. As you can see from the photo, we were all masked and maintaining the highest standard of COVID-19 prevention! We witnessed the historic moment when Shohei Ohtani stole his 20th base and Upton recorded his 1,000th RBI. It was a rally for the Angels. However, as a Padres fan, Hung-Wei apparently did not find it a good game.

OpenUVR received outstanding paper award in RTAS 2021!

OpenUVR received the “outstanding paper award” at RTAS 2021!

You may access the preprint on arXiv
You can now start building your OpenUVR-based VR system by cloning our GitHub repo at
Varifocal Storage chosen as IEEE Micro Top Picks!
Our paper, Dynamic Multi-Resolution Data Storage, authored by Yu-Ching Hu, Hung-Wei Tseng, and two former ESCAL students, Murtuza Lokhandwala and Te I, has been selected for IEEE Micro’s Top Picks from the 2020 Computer Architecture Conferences. Top Picks is an annual special edition of IEEE Micro magazine that acknowledges the 10-12 most significant research papers. The authors found a smarter use of in-storage processors and presented a holistic system design to address the emerging I/O and data preprocessing bottleneck in modern heterogeneous computer architectures based on approximate hardware. Prior to the recognition from Micro Top Picks, the same project also received a best paper honorable mention at ACM/IEEE MICRO in 2019 and a Facebook research award in 2018.
ESCAL proudly releases GPTPU!

TPUs can do more than just AI/ML!!!

We are very excited to share our latest open-source project on using Google’s Edge TPUs for general-purpose computing! Our project unleashes the power of matrix processing on Edge TPUs for any application or problem that can be implemented or solved with matrix algebra (as long as you can tolerate a little inaccuracy due to the Edge TPU’s limited precision).

You can now clone our project from the following GitHub repo

Enjoy, and let us know what matrix applications you can come up with!
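
To give a feel for the "little bit of inaccuracy" mentioned above, here is a minimal sketch in plain Python that multiplies two small matrices after rounding their entries to signed 8-bit integers. It does not use GPTPU or any Edge TPU API; the function names, the sample matrices, and the per-matrix scaling scheme are all assumptions made for illustration only.

```python
def scale_for(matrix):
    """Pick a scale so the largest magnitude maps to 127 (hypothetical scheme)."""
    return max(abs(v) for row in matrix for v in row) / 127


def quantize(matrix, scale):
    """Round each float to a signed 8-bit integer using a shared scale factor."""
    return [[max(-128, min(127, round(v / scale))) for v in row] for row in matrix]


def matmul(a, b):
    """Plain matrix multiply on nested lists."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def quantized_matmul(a, b):
    """Multiply in the integer domain, then rescale the result back to floats."""
    scale_a, scale_b = scale_for(a), scale_for(b)
    product = matmul(quantize(a, scale_a), quantize(b, scale_b))
    return [[v * scale_a * scale_b for v in row] for row in product]


# Illustrative inputs, not taken from the project.
a = [[0.50, -1.25], [2.00, 0.75]]
b = [[1.10, 0.30], [-0.40, 2.20]]

exact = matmul(a, b)
approx = quantized_matmul(a, b)
max_error = max(abs(e - p) for er, pr in zip(exact, approx) for e, p in zip(er, pr))

print("exact: ", exact)
print("approx:", approx)
print("largest absolute error:", max_error)
```

The result is close to the full-precision product but not identical, which is the trade-off an application has to accept when it runs on reduced-precision matrix hardware.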

Our paper is again nominated for MICRO’s best paper!

Our paper “NDS: N-Dimensional Storage” has been nominated as a best paper candidate at MICRO-54! This is our first best paper nomination since Varifocal Storage!

NDS presents a storage interface and an architecture that more efficiently support modern hardware accelerators in high-dimensional algorithms! Please check out our paper here

We also provide an artifact on our GitHub repo


Extreme Storage & Computer Architecture Laboratory

With the rapid growth of dataset sizes but limited improvement in high-performance computers, we need to revisit the existing programming and execution models to efficiently utilize all system components. In modern computers, many inefficiencies in applications are related to data management and movement. The vision of the Extreme Storage & Computer Architecture Laboratory is to revolutionize the way people think about programming and computing today: using a data-centric perspective in programming instead of the conventional computing-centric approach. ESCAL conducts research in systems and computer architecture with a focus on storage systems, parallel processing, high-performance computing, programming languages and runtime systems.

Research Projects

Accelerating non-AI/ML applications using AI/ML accelerators

The explosive demand for AI/ML workloads drives the emergence of AI/ML accelerators, including the commercialized NVIDIA Tensor Cores and Google TPUs. These AI/ML accelerators are essentially matrix processors and are theoretically helpful to any application with matrix operations. This project fills in the missing system, architecture, and programming language support needed to democratize AI/ML accelerators. As matrix operations are conventionally inefficient, this project also revises the core algorithms in compute kernels to better utilize the operators of AI/ML accelerators. With this project, ESCAL aims to lead the next revolution, similar to the one that happened with GPUs. You may now try our most recent GPTPU project from the GitHub repo:

Building intelligent data storage & I/O devices

As parallel computer architectures significantly shrink the execution time of compute kernels, the performance bottlenecks of applications shift to the remaining parts of execution, including data movement, object deserialization/serialization, and other software overheads in managing data storage. To address this new bottleneck, the best approach is to not move data and to endow storage devices with new roles. Morpheus is one of the very first research projects to implement this concept in real systems. We utilize existing, commercially available hardware components to build the Morpheus-SSD. The Morpheus model not only speeds up a set of heterogeneous computing applications by 1.32x, but also allows these applications to better utilize emerging data transfer methods that can send data directly to the GPU via peer-to-peer to further achieve 1.39x speedup. Summarizer further provides mechanisms to dynamically adjust the workload between the host and intelligent SSDs, making more efficient use of all computing units in a system and boosting the performance of big data analytics. This line of research also helped ESCAL receive a Facebook research award in 2018 and IEEE Micro Top Picks in 2020.

Efficient storage system for heterogeneous servers

Although high-performance, non-volatile memory technologies and network devices significantly improve the speed of supplying data to heterogeneous computing units, the performance of these devices is still far behind the capabilities of heterogeneous computing units. For example, modern SSDs can read more than 3 GB of data per second, but GPUs can process more than 17 GB of data for database aggregation operations within the same period of time. As a result, the heterogeneous computing units are under-utilized. We will revisit the design of existing runtime systems to transparently improve the utilization of system components, potentially leading to speedup or better energy efficiency.
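
The size of that gap can be estimated with a quick back-of-the-envelope calculation. The sketch below treats the two figures from the text as exact rates, although the text gives them only as lower bounds ("more than"), so the result is a rough indication rather than a measurement. The function name is hypothetical.

```python
def max_utilization(supply_gb_per_s, demand_gb_per_s):
    """Upper bound on how busy a consumer can stay when fed by a slower producer."""
    return min(1.0, supply_gb_per_s / demand_gb_per_s)


ssd_read_gb_per_s = 3.0      # SSD read rate quoted in the text
gpu_process_gb_per_s = 17.0  # GPU aggregation rate quoted in the text

utilization = max_utilization(ssd_read_gb_per_s, gpu_process_gb_per_s)
print(f"GPU busy at most {utilization:.1%} of the time")  # prints 17.6%
```

Under these assumptions, a GPU fed by a single such SSD would sit idle for more than 80% of the time, waiting for data to arrive.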

Optimizing the I/O system software stack for emerging applications

With hardware accelerators improving the latency of computation, the system software stack, which was traditionally underrated in designing applications, becomes more critical. In ESCAL, we focus on those underrated bottlenecks to achieve significant performance improvement without using new hardware. The most recent example is the OpenUVR system, where we eliminate unnecessary memory copies and allow the VR system loop to complete within 14 ms latency with just a modern desktop PC, existing WiFi network links, a Raspberry Pi 4b+ and an HDMI-compatible head-mounted display.
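
As a general illustration of what avoiding a memory copy means, the sketch below contrasts slicing a buffer (which copies) with taking a `memoryview` of it (which does not). This is not OpenUVR code and says nothing about how OpenUVR is implemented; the buffer contents and variable names are made up for the example.

```python
# A made-up "frame": a 6-byte header followed by payload bytes.
frame = bytearray(b"HEADERpixel-data-goes-here")
header_len = 6

# Slicing a bytearray allocates a new buffer and copies the payload into it.
copied_payload = bytes(frame[header_len:])

# A memoryview slice refers to the same underlying memory; nothing is copied.
shared_payload = memoryview(frame)[header_len:]

# Mutate the original buffer to show which payload shares its memory.
frame[header_len] = ord("P")

print(copied_payload[:5])         # the copy is unaffected by the mutation
print(bytes(shared_payload[:5]))  # the view reflects the mutation
```

Each avoided copy saves both the allocation and the time to move the bytes, which adds up when a pipeline handles a full frame on every iteration of its loop.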



Graduate students

Yunan “Andrew” Zhang
Boram Jung


Xindi Li (C.S., M.S., 2018. Now at Bloomberg)

Chao Huang (C.S., M.S., 2018)

Zackary Allen (C.S., B.S., 2018. Now at LexisNexis)

Alec Rohloff (C.S., B.S., 2018)

Te I (C.S., M.S., 2018. Now at Google)

Vaibhava Lakshmi (ECE, M.S., 2018. Now at Dell EMC)

Murtuza Taher Lokhandwala (ECE, M.S., 2018. Now at Apple)

Mahesh Bonagiri (ECE, M.S., 2018. Now at Nvidia)

Joshua Okrend

Stefan O’Neil