I am currently a post-doctoral research fellow in the Computer Architecture and VLSI group in the Department of Electrical Engineering and Computer Science at Harvard University, where I work with Profs. David Brooks and Gu-Yeon Wei. I completed my Ph.D. in Computer Science at Columbia University, New York, where I worked with Prof. Steven Nowick in the Asynchronous Circuits and Systems Lab.
My research interests include
heterogeneous systems-on-chip (SoCs) for artificial intelligence (AI) applications, globally-asynchronous locally-synchronous (GALS) systems, asynchronous circuits, hardware accelerator design,
system-level design and optimization, networks-on-chip (NoCs), FPGA-based hardware acceleration, neuromorphic computing, and computer-aided design (CAD) for VLSI.
I am on the job market!
33 Oxford Street
Cambridge, MA 02138
Research Challenges for AI SoCs
An excellent EETimes article summarizes the three main challenges facing AI SoCs. My Ph.D. and post-doctoral research have made significant contributions in all three areas:
1) Interconnect: New high-performance, low-power network architectures and topologies are required to meet the bandwidth demands of AI workloads. Support for new kinds of traffic patterns, such as multicast and broadcast, is also critical.
[In my Ph.D. dissertation, I introduced novel strategies and network architectures to efficiently support multicast/broadcast]
2) Scaling to large chips: AI chips can be large, making timing closure and clock distribution (and clock power) a nightmare. Asynchronous and GALS techniques can effectively mitigate these issues.
[In my Ph.D. thesis, I developed novel asynchronous NoC solutions with efficient multicast support, demonstrated on commercial FPGAs]
3) Memory: Frequent, expensive off-chip memory accesses degrade system performance and power efficiency. Novel architectures, such as in-/near-memory computing, and new memory technologies are required to overcome this bottleneck.
[As part of my post-doctoral research, I introduced a comprehensive methodology for efficiently integrating accelerators with the system's memory hierarchy]