Embedded Systems - 1st Edition - ISBN: 9780128003428, 9780128004128

Embedded Systems

1st Edition

ARM Programming and Optimization

Author: Jason Bakos
Paperback ISBN: 9780128003428
eBook ISBN: 9780128004128
Imprint: Morgan Kaufmann
Published Date: 24th September 2015
Page Count: 320


Embedded Systems: ARM Programming and Optimization combines an exploration of the ARM architecture with an examination of the facilities offered by the Linux operating system to explain how various features of program design can influence processor performance. It demonstrates methods by which a programmer can optimize program code so that its performance improves while its behavior stays the same. Several applications, including image transformations, fractal generation, image convolution, and computer vision tasks, are used to describe and demonstrate these methods. From this, the reader will gain insight into computer architecture and application design, along with practical knowledge of software design for modern embedded systems.

Key Features

  • Covers two ARM instruction set architectures, ARMv6 and ARMv7-A, as well as three ARM cores: the ARM11 on the Raspberry Pi, the Cortex-A9 on the Xilinx Zynq 7020, and the Cortex-A15 on the NVIDIA Tegra K1
  • Describes how to fully leverage the facilities offered by the Linux operating system, including the Linux GCC compiler toolchain and debug tools, performance monitoring support, OpenMP multicore runtime environment, video frame buffer, and video capture capabilities
  • Designed to accompany and work with most of the low-cost Linux/ARM embedded development boards currently available


Readership

Professional programmers who need to understand embedded development, and students in courses that use ARM as the processor

Table of Contents

  • Dedication
  • Preface
    • Using this Book
    • Instructor Support
  • Acknowledgments
  • Chapter 1: The Linux/ARM embedded platform
    • Abstract
    • 1.1 Performance-Oriented Programming
    • 1.2 ARM Technology
    • 1.3 Brief History of ARM
    • 1.4 ARM Programming
    • 1.5 ARM Instruction Set Architecture
    • 1.6 Assembly Optimization #1: Sorting
    • 1.7 Assembly Optimization #2: Bit Manipulation
    • 1.8 Code Optimization Objectives
    • 1.9 Runtime Profiling with Performance Counters
    • 1.10 Measuring Memory Bandwidth
    • 1.11 Performance Results
    • 1.12 Performance Bounds
    • 1.13 Basic ARM Instruction Set
    • 1.14 Chapter Wrap-Up
    • Exercises
  • Chapter 2: Multicore and data-level optimization: OpenMP and SIMD
    • Abstract
    • 2.1 Optimization Techniques Covered by this Book
    • 2.2 Amdahl's Law
    • 2.3 Test Kernel: Polynomial Evaluation
    • 2.4 Using Multiple Cores: OpenMP
    • 2.5 Performance Bounds
    • 2.6 Performance Analysis
    • 2.7 Inline Assembly Language in GCC
    • 2.8 Optimization #1: Reducing Instructions per Flop
    • 2.9 Optimization #2: Reducing CPI
    • 2.10 Optimization #3: Multiple Flops per Instruction with Single Instruction, Multiple Data
    • 2.11 Chapter Wrap-Up
  • Chapter 3: Arithmetic optimization and the Linux Framebuffer
    • Abstract
    • 3.1 The Linux Framebuffer
    • 3.2 Affine Image Transformations
    • 3.3 Bilinear Interpolation
    • 3.4 Floating-Point Image Transformation
    • 3.5 Analysis of Floating-Point Performance
    • 3.6 Fixed-Point Arithmetic
    • 3.7 Fixed-Point Performance
    • 3.8 Real-Time Fractal Generation
    • 3.9 Chapter Wrap-Up
  • Chapter 4: Memory optimization and video processing
    • Abstract
    • 4.1 Stencil Loops
    • 4.2 Example Stencil: The Mean Filter
    • 4.3 Separable Filters
    • 4.4 Memory Access Behavior of 2D Filters
    • 4.5 Loop Tiling
    • 4.6 Tiling and the Stencil Halo Region
    • 4.7 Example 2D Filter Implementation
    • 4.8 Capturing and Converting Video Frames
    • 4.9 Video4Linux Driver and API
    • 4.10 Applying the 2D Tiled Filter
    • 4.11 Applying the Separated 2D Tiled Filter
    • 4.12 Top-Level Loop
    • 4.13 Performance Results
    • 4.14 Chapter Wrap-Up
  • Chapter 5: Embedded heterogeneous programming with OpenCL
    • Abstract
    • 5.1 GPU Microarchitecture
    • 5.2 OpenCL
    • 5.3 OpenCL Programming Model, Idioms, and Abstractions
    • 5.4 Kernel Workload Distribution
    • 5.5 OpenCL Implementation of Horner's Method: Device Code
    • 5.6 Performance Results
    • 5.7 Chapter Wrap-Up
  • Appendix A: Adding PMU support to Raspbian for the Generation 1 Raspberry Pi
    • A.1 Download the Linux Kernel and Cross-Compiler Tools
    • A.2 Kernel Modifications
    • A.3 Building the Kernel
    • A.4 Installing the Kernel
  • Appendix B: NEON intrinsic reference
    • B.1 Vector Data Types
    • B.2 Reading and Writing Vector Variables
    • B.3 Vector Element Manipulation
    • B.4 Optimizing Floating-Point Code with NEON Intrinsics
    • B.5 Summary of NEON Intrinsics
  • Appendix C: OpenCL reference
    • C.1 Platform Layer
    • C.2 Memory Types
    • C.3 Buffer Management
    • C.4 Programs and Compiling
    • C.5 Kernel Functions
    • C.6 Command Queue Functions
    • C.7 Vector and Image Data Types
    • C.8 Attributes
    • C.9 Constants
    • C.10 Built-in Functions
  • Index


© Morgan Kaufmann 2016

About the Author

Jason Bakos

Jason D. Bakos is an associate professor of Computer Science and Engineering at the University of South Carolina. He received a BS in Computer Science from Youngstown State University in 1999 and a PhD in Computer Science from the University of Pittsburgh in 2005. Dr. Bakos’s research focuses on mapping data- and compute-intensive codes to high-performance, heterogeneous, reconfigurable, and embedded computer systems. His group works closely with FPGA-based computer manufacturers Convey Computer Corporation, GiDEL, and Annapolis Micro Systems, as well as GPU and DSP manufacturers NVIDIA, Texas Instruments, and Advantech. Dr. Bakos holds two patents, has authored over 30 refereed publications in computer architecture and high-performance computing, won the ACM/DAC student design contest in 2002 and 2004, and received the US National Science Foundation (NSF) CAREER award in 2009. He is currently serving as an associate editor for ACM Transactions on Reconfigurable Technology and Systems.

Affiliations and Expertise

Computer Science and Engineering, University of South Carolina; Associate Editor, ACM Transactions on Reconfigurable Technology and Systems
