|keywords=Embedded systems, High-performance, Many-core, optimization, Performance per watt, Tiled many-core architecture
|abstract=<p>Technological advancements in the silicon industry, as predicted by Moore’s law, have resulted in an increasing number of processor cores on a single chip, giving rise to multicore and, subsequently, many-core architectures. This work focuses on identifying key architecture and software optimizations to attain high performance from tiled many-core architectures (TMAs), an architectural innovation in multicore technology. Although embedded systems design is traditionally power-centric, there has been a recent shift toward high-performance embedded computing due to the proliferation of compute-intensive embedded applications. TMAs are suitable for these embedded applications due to the low-power design features in many of these TMAs. We discuss performance optimizations on a single tile (processor core) as well as parallel performance optimizations for TMAs, such as application decomposition, cache locality, tile locality, memory balancing, and horizontal communication. We elaborate on compiler-based optimizations that are applicable to TMAs, such as function inlining, loop unrolling, and feedback-based optimizations. We present a case study with optimized dense matrix multiplication algorithms for Tilera’s TILEPro64 to experimentally demonstrate the performance and performance-per-watt optimizations on TMAs. Our results quantify the effectiveness of algorithmic choices, cache blocking, compiler optimizations, and horizontal communication in attaining high performance and performance per watt on TMAs.</p>
|month=4
|year=2013
|volume=64
|journal=The Journal of Supercomputing
|title=High-Performance Optimizations on Tiled Many-Core Embedded Systems: A Matrix Multiplication Case Study
|entry=article
|date=2013-04-01
|pdf=Munir2013.pdf
}}

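The abstract above lists cache blocking among the optimizations evaluated for dense matrix multiplication. As a rough illustration only, here is a minimal C sketch of a generic cache-blocked matrix multiply next to a naive triple loop, assuming square, row-major matrices of doubles. It is not the authors' TILEPro64 code; the function names `matmul_naive` and `matmul_blocked` and the block-size parameter `bs` are hypothetical choices for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Naive triple loop (hypothetical helper): C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Cache-blocked sketch (hypothetical helper): walk the matrices in bs x bs tiles
 * so the working set of the inner loops (one tile each of A, B and C) can stay
 * in cache. bs is a tuning parameter; no value from the paper is assumed here. */
void matmul_blocked(size_t n, size_t bs, const double *A, const double *B, double *C) {
    assert(bs > 0); /* bs == 0 would make the tile loops never advance */
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clamp tile bounds so n need not be a multiple of bs. */
                size_t i_end = ii + bs < n ? ii + bs : n;
                size_t k_end = kk + bs < n ? kk + bs : n;
                size_t j_end = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Innermost loop reads B and updates C contiguously. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

A common heuristic is to pick `bs` so that roughly three `bs` x `bs` tiles fit in the target's data cache; the i-k-j order inside each tile keeps the innermost accesses to `B` and `C` sequential in memory.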
Latest revision as of 17:38, 9 November 2021

Munir2013
entry: article
author: Arslan Munir and Farinaz Koushanfar and Ann Gordon-Ross and Sanjay Ranka
journal: The Journal of Supercomputing
month: 4
title: High-Performance Optimizations on Tiled Many-Core Embedded Systems: A Matrix Multiplication Case Study
volume: 64
year: 2013
doi: 10.1007/s11227-013-0916-9
url: http://link.springer.com/article/10.1007%2Fs11227-013-0916-9
pdf: Munir2013.pdf

File:Munir2013.pdf

Email: farinaz@ucsd.edu
Address:
Electrical & Computer Engineering
University of California, San Diego
9500 Gilman Drive, MC 0407
Jacobs Hall, Room 6401
La Jolla, CA 92093-0407
Lab Location: EBU1-2514
University of California San Diego
9500 Gilman Dr, La Jolla, CA 92093