Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators

Date
2020-01-01
Authors
Chitty-Venkata, Krishna Teja
Somani, Arun K.
Department
Electrical and Computer Engineering
Abstract

Due to the increasing use of large Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as the Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor composed of multiple individual compute units that efficiently execute Multiply-and-Accumulate (MAC) operations. Because the size of this array limits how much of a single layer's computation can be processed at once, the computation is performed serially in several batches, adding compute cycles along both array axes. In practice, the matrix dimensions rarely match the array dimensions, so the computation does not map onto the array exactly. In this work, we address the problem of minimizing processing cycles on the array by adjusting the DNN model parameters through a structured, hardware-array-dependent optimization. We introduce two techniques in this paper: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning removes redundant parameters from the network to decrease its size. The key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both proposed techniques achieve accuracy similar to the original network while saving a significant number of processing cycles (75%).
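
To make the tiling arithmetic concrete, the sketch below (an illustration under assumed details, not the paper's implementation) counts the serial batches needed to map a weight matrix onto a fixed-size MAC array, then prunes whole columns, ranked by L1 norm, until the column count is a multiple of the array width so every batch fully fills the array. The 8x8 array size, the NumPy representation, and the L1-norm ranking criterion are all assumptions made for illustration.

```python
# Minimal sketch of array-size-aware tiling and structured column pruning.
# ARRAY_ROWS x ARRAY_COLS is an assumed compute-array size, not a value
# taken from the paper.
import numpy as np

ARRAY_ROWS, ARRAY_COLS = 8, 8  # assumed MAC-array dimensions

def tile_count(rows, cols):
    """Serial batches needed to map a rows x cols matrix onto the array."""
    return int(np.ceil(rows / ARRAY_ROWS) * np.ceil(cols / ARRAY_COLS))

def array_aware_prune(weights):
    """Drop the lowest-L1-norm columns until the column count is a multiple
    of ARRAY_COLS (assumes cols >= ARRAY_COLS), so each batch is full."""
    rows, cols = weights.shape
    keep = (cols // ARRAY_COLS) * ARRAY_COLS   # round down to full tiles
    order = np.argsort(np.abs(weights).sum(axis=0))  # least important first
    keep_idx = np.sort(order[cols - keep:])    # retain the strongest columns
    return weights[:, keep_idx]

W = np.random.randn(64, 70)
print(tile_count(*W.shape))          # 8 * 9 = 72 batches; last tile 6/8 full
W_pruned = array_aware_prune(W)
print(W_pruned.shape, tile_count(*W_pruned.shape))  # (64, 64) -> 64 batches
```

In this toy case, pruning 6 of the 70 columns eliminates the partially filled tile column and reduces the batch count from 72 to 64, with every remaining batch utilizing the array fully.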

Comments

This is a manuscript of a proceeding published as Chitty-Venkata, Krishna Teja, and Arun K. Somani. "Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators." In 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP) (2020): 37-44. DOI: 10.1109/ASAP49362.2020.00016. Posted with permission.

Copyright
2020