Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators

dc.contributor.author Chitty-Venkata, Krishna
dc.contributor.author Somani, Arun
dc.contributor.department Electrical and Computer Engineering
dc.date 2020-08-04T21:32:44.000
dc.date.accessioned 2021-02-25T17:12:05Z
dc.date.available 2021-02-25T17:12:05Z
dc.date.copyright Wed Jan 01 00:00:00 UTC 2020
dc.date.embargo 2019-01-01
dc.date.issued 2020-01-01
dc.description.abstract <p>Due to the increasing use of large Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as the Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor composed of multiple individual compute units that efficiently execute Multiplication and Accumulation (MAC) operations. Because the size of this array limits how much of a single DNN layer can be processed at once, the computation is performed serially in several batches, incurring extra compute cycles along both axes. In practice, due to the mismatch between matrix and array sizes, the computation does not map onto the array exactly. In this work, we address the problem of minimizing processing cycles on the array by adjusting the DNN model parameters through a structured, hardware-array-dependent optimization. We introduce two techniques in this paper: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning is an approach that removes redundant parameters to decrease the size of the network. The key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both proposed techniques achieve accuracy similar to the original network while saving a significant number of processing cycles (75%).</p>
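The sketch below illustrates the batching arithmetic the abstract describes. It is not the authors' implementation: the function names, the magnitude-based choice of which rows/columns to drop, and the 300 x 300 layer / 256 x 256 array sizes are all illustrative assumptions. It only shows why trimming a weight matrix to multiples of the array dimensions reduces the number of serial batches.

    # Minimal sketch (assumed, not from the paper): cycle counting and
    # array-size-aware trimming of a weight matrix on an R x C MAC array.
    import math
    import numpy as np

    def num_batches(rows: int, cols: int, array_rows: int, array_cols: int) -> int:
        """Serial batches needed to map a rows x cols weight matrix
        onto an array_rows x array_cols MAC array."""
        return math.ceil(rows / array_rows) * math.ceil(cols / array_cols)

    def array_aware_prune(weights: np.ndarray, array_rows: int, array_cols: int) -> np.ndarray:
        """Trim the lowest-magnitude rows/columns so each matrix dimension
        is a multiple of the array dimension, so every batch is fully used.
        Assumes the matrix is at least as large as the array."""
        r = (weights.shape[0] // array_rows) * array_rows
        c = (weights.shape[1] // array_cols) * array_cols
        # Keep the rows/columns with the largest L1 norms (an assumed
        # saliency criterion), preserving their original order.
        row_keep = np.sort(np.argsort(-np.abs(weights).sum(axis=1))[:r])
        col_keep = np.sort(np.argsort(-np.abs(weights).sum(axis=0))[:c])
        return weights[row_keep][:, col_keep]

    # A 300 x 300 layer on a 256 x 256 array needs ceil(300/256)^2 = 4
    # batches; pruned to 256 x 256 it needs 1, i.e. 75% fewer batches.
    W = np.random.randn(300, 300)
    print(num_batches(*W.shape, 256, 256))   # 4
    Wp = array_aware_prune(W, 256, 256)
    print(num_batches(*Wp.shape, 256, 256))  # 1

Under these assumed sizes, the 75% reduction in batches matches the cycle saving quoted in the abstract; the paper's AAT/AAP methods adjust the model during training or pruning to reach such array-aligned shapes without the accuracy loss naive trimming would cause.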
dc.description.comments <p>This is a manuscript of a proceeding published as Chitty-Venkata, Krishna Teja, and Arun K. Somani. "Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators." In <em>2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)</em> (2020): 37-44. DOI: <a href="https://doi.org/10.1109/ASAP49362.2020.00016" target="_blank">10.1109/ASAP49362.2020.00016</a>. Posted with permission.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/ece_conf/92/
dc.identifier.articleid 1095
dc.identifier.contextkey 18792987
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath ece_conf/92
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/93951
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/ece_conf/92/2020_SomaniArun_ArrayAware.pdf|||Sat Jan 15 02:29:57 UTC 2022
dc.source.uri 10.1109/ASAP49362.2020.00016
dc.subject.disciplines Databases and Information Systems
dc.subject.disciplines Systems and Communications
dc.subject.disciplines Systems Architecture
dc.subject.keywords Array
dc.subject.keywords Deep Neural Networks
dc.subject.keywords Accelerators
dc.subject.keywords Training
dc.subject.keywords Pruning
dc.title Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators
dc.type article
dc.type.genre conference
dspace.entity.type Publication
relation.isAuthorOfPublication edede50a-4e31-44f3-a7c7-a06dc8db42c2
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff
File
Original bundle
Name: 2020_SomaniArun_ArrayAware.pdf
Size: 589.16 KB
Format: Adobe Portable Document Format