|Thesis type||Master's degree thesis|
|Title||Implementazione e Testing di Reti Neurali Convoluzionali su Architetture Embedded per Automotive|
|Title in English||Implementation and Validation of Convolutional Neural Networks on Embedded Automotive Platforms|
|Department||Dipartimento di Ingegneria "Enzo Ferrari"|
|Degree programme||Ingegneria Informatica (Computer Engineering, D.M. 270/04)|
|Defense date||2018-07-17|
|Availability||Available online (all thesis files are accessible)|
The aim of this thesis is to test and evaluate the main Convolutional Neural Networks (CNNs) for object detection and image classification on the most modern and highest-performing embedded platforms for Autonomous Driving and Industry 4.0, benchmarking the performance obtained in terms of frames per second (FPS) and power dissipation (watts). In particular, the tests were performed on the Nvidia Tegra X2 system-on-chip (SoC), through the Nvidia Jetson TX2 development board, an embedded board with an Nvidia Pascal GPU and a processor composed of four ARM Cortex-A57 cores and two Nvidia Denver cores. The benchmarks obtained on this architecture were then compared with those of a board equipped with an Ultrascale+ SoC, which currently represents the state of the art in programmable logic: the Xilinx Zynq Ultrascale+ ZCU102 board, featuring an Ultrascale+ SoC, a Mali-400 MP2 GPU, a quad-core ARM Cortex-A53 processor, and a dual-core ARM Cortex-R5.

As for the frameworks tested: on GPU we used Caffe for the classification networks, and Darknet and tkDNN to implement the CNNs for object detection. On FPGA, the ZynqNet and PipeCNN frameworks were used to implement CNNs for image classification, while xfDNN and CHaiDNN were used to test object detection networks.

In this work we present benchmarks for several popular CNN models, in particular the well-known AlexNet and ZynqNet, a model derived from SqueezeNet, a reduced and regularized CNN that is well suited to implementation on small accelerators and programmable logic. Finally, for object detection we tested You Only Look Once (YOLO) and its variants YOLO-Small and Tiny-YOLO, proposed by J. Redmon et al. For the performance evaluation of these neural networks we measured the number of images processed per second (FPS) and the average power dissipated (watts) by the board and its SoC.
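The FPS figures discussed here can be obtained with a simple timing loop around the network's forward pass. A minimal, framework-agnostic sketch (the `infer` callable is a hypothetical stand-in for, e.g., a Caffe `net.forward()` or a Darknet detection call, neither of which is shown here):

```python
import time

def measure_fps(infer, images, warmup=10):
    """Estimate inference throughput in frames per second.

    infer  -- placeholder for the framework's forward pass
              (assumption: it takes one preprocessed image).
    images -- list of preprocessed input frames.
    """
    # Warm-up iterations so one-time costs (CUDA context creation,
    # cuDNN autotuning, cache warming) do not skew the measurement.
    for img in images[:warmup]:
        infer(img)

    start = time.perf_counter()
    for img in images:
        infer(img)
    elapsed = time.perf_counter() - start

    return len(images) / elapsed
```

In practice the warm-up matters on the Jetson TX2, since the first GPU inference pays initialization costs that would otherwise inflate the average latency.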
Power was measured with a digital multimeter used as an ammeter: we sampled the current absorbed during the inference phase and, from these samples, approximated the power dissipated by the board and its SoC. The measurements were repeated while varying the accelerator's frequency (GPU or FPGA, respectively) and the processor's frequency. We also ran experiments on the Jetson TX2 in which power consumption was measured while varying the supply voltage (VDD) of the board itself. The results show that, at least with the models and frameworks tested, GPUs are still faster than FPGAs for inference; on the other hand, FPGAs proved extremely efficient when the energy dissipated by the system becomes a constraint to be taken into account.
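The approximation described above, average absorbed current multiplied by the supply voltage, can be sketched as follows. The 19 V default is only an illustrative value for a typical Jetson TX2 barrel-jack supply, not a figure taken from the thesis:

```python
def estimate_power(current_samples_a, vdd_v=19.0):
    """Approximate average power dissipation from ammeter samples.

    current_samples_a -- currents in amperes, sampled with the
                         multimeter during the inference phase.
    vdd_v             -- board supply voltage in volts (assumed
                         constant over the sampling window).
    """
    if not current_samples_a:
        raise ValueError("need at least one current sample")
    i_avg = sum(current_samples_a) / len(current_samples_a)
    return vdd_v * i_avg  # P = V * I, in watts
```

For example, samples of 0.50 A, 0.60 A and 0.55 A at 19 V give roughly 10.45 W; dividing a network's FPS by this figure yields the frames-per-watt efficiency metric on which the FPGA boards compare favourably.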