Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators

Date
2020-01-01
Authors
Chitty-Venkata, Krishna Teja
Somani, Arun K.
Department
Electrical and Computer Engineering
Abstract

Due to the increasing use of large Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as the Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor composed of multiple individual compute units that efficiently execute Multiply-and-Accumulate (MAC) operations. Because the size of this array limits how much of a single layer's computation can be processed at once, the computation is performed serially in several batches, adding compute cycles along both array axes. In practice, the matrix dimensions rarely match the array dimensions, so the computation does not map onto the array exactly. In this work, we address the problem of minimizing processing cycles on the array by adjusting the DNN model parameters through a structured, hardware-array-dependent optimization. We introduce two techniques in this paper: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning removes redundant parameters from the network to decrease its size. The key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both proposed techniques achieve accuracy similar to the original network while saving a significant number of processing cycles (75%).
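
To make the tiling arithmetic concrete, the sketch below (an illustration under assumed details, not the paper's implementation) counts the serial batches needed to map a weight matrix onto a fixed-size MAC array, then prunes whole columns, ranked by L1 norm, until the column count is a multiple of the array width so every batch fully fills the array. The 8x8 array size, the NumPy representation, and the L1-norm ranking criterion are all assumptions made for illustration.

```python
# Minimal sketch of array-size-aware tiling and structured column pruning.
# ARRAY_ROWS x ARRAY_COLS is an assumed compute-array size, not a value
# taken from the paper.
import numpy as np

ARRAY_ROWS, ARRAY_COLS = 8, 8  # assumed MAC-array dimensions

def tile_count(rows, cols):
    """Serial batches needed to map a rows x cols matrix onto the array."""
    return int(np.ceil(rows / ARRAY_ROWS) * np.ceil(cols / ARRAY_COLS))

def array_aware_prune(weights):
    """Drop the lowest-L1-norm columns until the column count is a multiple
    of ARRAY_COLS (assumes cols >= ARRAY_COLS), so each batch is full."""
    rows, cols = weights.shape
    keep = (cols // ARRAY_COLS) * ARRAY_COLS   # round down to full tiles
    order = np.argsort(np.abs(weights).sum(axis=0))  # least important first
    keep_idx = np.sort(order[cols - keep:])    # retain the strongest columns
    return weights[:, keep_idx]

W = np.random.randn(64, 70)
print(tile_count(*W.shape))          # 8 * 9 = 72 batches; last tile 6/8 full
W_pruned = array_aware_prune(W)
print(W_pruned.shape, tile_count(*W_pruned.shape))  # (64, 64) -> 64 batches
```

In this toy case, pruning 6 of the 70 columns eliminates the partially filled tile column and reduces the batch count from 72 to 64, with every remaining batch utilizing the array fully.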

Comments

This is a manuscript of a proceeding published as Chitty-Venkata, Krishna Teja, and Arun K. Somani. "Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators." In 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP) (2020): 37-44. DOI: 10.1109/ASAP49362.2020.00016. Posted with permission.

Copyright
2020