Efficient processing of vision kernels and deep neural networks on reconfigurable computing architectures

Qasaimeh, Murad
Major Professor
Jones, Phillip H.
Zambreno, Joseph
Rover, Diane
Tyagi, Akhilesh
Darr, Matt
Committee Member
Electrical and Computer Engineering
Computer vision algorithms, empowered by recent advances in deep learning, play a fundamental role in solving many problems that seemed impossible just a decade ago. However, the computational complexity and memory footprint of these algorithms keep increasing, either to enhance accuracy or to solve more complex problems. This places a heavy load on the computing platforms used to run them. Moreover, the exponential growth in computing platforms' capability has slowed due to the saturation of Moore's law, which makes the problem even more challenging. In this dissertation, I propose hardware/software co-optimization techniques to improve the performance of the most commonly used components in vision pipelines and deep neural networks: sliding window structures, histogram computing, and convolution operations. The main contributions of this work include the following:
(1) Proposing a modified sliding window architecture that reduces Block RAM usage on FPGAs. The architecture exploits the fact that most of an image's information resides in low frequencies to reduce the required on-chip memory. It uses a novel sliding window compression algorithm that can be efficiently implemented in hardware and achieves compression ratios comparable to those of state-of-the-art compression algorithms. Its compression ratio can be adjusted through a threshold parameter.
(2) Implementing a run-time programmable architecture for computing histogram-based feature descriptors. I propose a configurable hardware architecture that can compute different types of descriptors in real time, running several feature description algorithms on a single datapath. The architecture is configurable in terms of patch size, number of regions, and number of bins per region. Using optimization techniques, I was able to reduce the complexity of computing 2D histograms from O(n^2) to O(n).
(3) Evaluating the energy efficiency of computer vision kernels and DNNs on FPGAs. I benchmark representative vision kernels, complete vision pipelines, and DNNs on an ARM Cortex-A57 CPU, an Nvidia Jetson TX2 (GPU-accelerated), and a Xilinx UltraScale (FPGA-accelerated) platform, and give insight into the reasons behind the observed run time, power, and energy consumption of each evaluated platform.
(4) Proposing a hardware-aware pruning algorithm that generates geometrically structured sparse weights. The locations of the non-zero weights follow pseudo-random patterns generated by Linear Feedback Shift Registers (LFSRs). I also propose an FPGA-based inference engine for sparse convolution that uses the pruned models generated by my approach to speed up the convolution computation. Because the engine computes the sparse weight indices in real time, it avoids copying them from off-chip memory.
The techniques proposed in this work improve sliding window structures, histogram computing, and convolution operations in terms of their hardware resource utilization, programmability, and runtime and bandwidth requirements.
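The idea behind contribution (4), regenerating the positions of the non-zero weights from an LFSR instead of storing them, can be illustrated with a minimal software sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the 16-bit Fibonacci LFSR with taps 16, 14, 13, 11, the seed value, the mapping of LFSR states to positions by a modulo, and the hypothetical helper names `lfsr16`, `kept_indices`, `prune`, and `sparse_dot`. The sketch also selects positions within a flat weight vector only; it does not attempt to reproduce the geometric structure or the pruning criterion of the proposed algorithm.

```python
def lfsr16(seed):
    """Yield successive states of a 16-bit Fibonacci LFSR.

    Assumed configuration: taps 16, 14, 13, 11, a maximal-length choice
    that cycles through all 65535 non-zero states.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state


def kept_indices(seed, n, k):
    """Return k distinct positions in range(n), in LFSR order."""
    if not 0 <= k <= n <= 0xFFFF:
        raise ValueError("need 0 <= k <= n <= 65535")
    seen, order = set(), []
    states = lfsr16(seed)
    while len(order) < k:
        idx = next(states) % n  # assumed state-to-position mapping
        if idx not in seen:     # skip positions that were already chosen
            seen.add(idx)
            order.append(idx)
    return order


def prune(weights, seed, k):
    """Keep only the k weights at LFSR-selected positions, packed densely."""
    return [weights[i] for i in kept_indices(seed, len(weights), k)]


def sparse_dot(packed, seed, activations):
    """Dot product that regenerates the indices instead of loading them."""
    indices = kept_indices(seed, len(activations), len(packed))
    return sum(w * activations[i] for w, i in zip(packed, indices))


if __name__ == "__main__":
    weights = [3, -1, 4, 1, -5, 9, 2, -6]
    activations = [1, 2, 3, 4, 5, 6, 7, 8]
    packed = prune(weights, seed=0xACE1, k=3)
    print(kept_indices(0xACE1, len(weights), 3), packed)
    print(sparse_dot(packed, 0xACE1, activations))
```

In this sketch only the packed non-zero values and the seed need to be stored, since the same seed always reproduces the same index sequence. That is the property the abstract relies on to avoid fetching sparse weight indices from off-chip memory.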