2008 2007 Hardware dependant Software design Ahmed A. Jerraya CEA-LETI Ahmed.jerraya@cea.fr 1
Outline 2007 Multiprocessor System on Chip: HW-SW Architectures HW/SW interfaces abstraction: Programming models Hardware dependent software design Long term trends: HW-SW codesign. 2
I. Multiprocessor System on Chip: HW-SW Architectures SoC Design Cost Traditional Model Trend Développment Product 2/3 Developpment Plateform Techno. 1/3 Cost X5 65 to 32nm [Henoff:HDTV ST] [MEDEA+Forum] SW HW Design CAD/Lab Si Fab +16% /an SW HW design CAD/Lab Si Fab Developpement Product 3/4 Developpement Plateform 1/4 3
I. Multiprocessor System on Chip: HW-SW Architectures Global optimization required Focus of this talk: HDS = Hardware Dependant Software Applications SW SW HW CAD/Lab Techno Plateform Si Fab SW -Méthods -MW -OS SW HDS Architecture HW IC Design Variability DFM CAD/Lab Techno Plateform Si Fab Performances costs Reliability Applications 4
I. Multiprocessor System on Chip: HW-SW Architectures Key Technology: HW/SW Interfaces Memory access protocols DMA, HW I/O Application Processor architecture OS architecture HW-SW Comm. Design [O Nils2001] SW Code / OS micro-processor Interrupt ADR Data IO HW A B C D Hardware 5
I. Multiprocessor System on Chip: HW-SW Architectures HW/SW interfaces design Challenges HW/SW interfaces design requires pluridisciplinary knowledge HW/SW interfaces are crucial for both cost and performances Cost issues: Hardware dependent Software is tedious to develop, difficult to debug and validate Non optimized software leads to larger footprint and consequently to larger embedded memory Performances : Non optimized HW/SW interface introduces non acceptable global latencies Processor + cache + embedded memory are responsible of the major power consumption part in a SoC A major challenge in MPSoC design is to efficiently design the HW/SW interfaces 6
II. HW/SW interfaces abstraction: Programming models Outline Multiprocessor System on Chip: HW-SW Architectures HW/SW interfaces abstraction: Programming models Hardware dependent software design Long term trends: HW-SW codesign. 7
II. HW/SW interfaces abstraction: Programming models Application SW Designer: A set of system calls used to hide the underlying execution platform. Also Called Programming Model HW designer: A set of registers, control signals and more sophisticated adaptors to link CPU to HW subsystems. System SW designer: Low level SW implementation of the programming Model for a given HW architecture. CPU is the ultimate HW-SW Interface Sequential scheme assuming HW is ready to start low level SW design SoC Designer Abstracts both HW and SW in addition to CPU HW-SW interfaces tradeoff Defining HW-SW Interfaces Sequential SW program Call HW (x, y, z) API HW Dependant SW CPU (local Architecture) HW-SW CPU Bus @ Interfaces CTRL HW-Adaptation Start done x HW function wait start data y z x y z 8
II. HW/SW interfaces abstraction: Programming models Programming Model The Classical Solution to Abstract HW-SW Interfaces Programming language with implicit primitives (e.g. module hierarchy & threads in SystemC) API: Application Programming Interface (MPI, Posix Threads) Simulation model (MPICH, Linux) Used by SW community to free the SW designer from knowing HW details. Facilitate porting of application SW over different architectures that support the same API. SW modules API HW dependent SW (HDS) HW architecture SW modules API SW modules API Implementation... SW 1 SW 2 SW n Execution Environment Simulation 9
II. HW/SW interfaces abstraction: Programming models Programming models, the GAP SoC Design Programming Model Distributed SW Design RTL (Verilog, BinSW) Virtual Prototype (RTL HW, BinSW, e.g. SystemC) Explicit concepts ALL CPU implementation ALL RTL HW CPU organization Mixed Abstract HW-SW Interfaces models MPI NONE RT-CORBA High Level Synchronisation Communication NONE CORBA SDL -Concurrency - Threading NONE 10
II. HW/SW interfaces abstraction: Programming models Programming Models for MPSoC SoC Design Programming Model Distributed SW Design RTL (Verilog, BinSW) Virtual Prototype (RTL HW, BinSW, e.g. SystemC) Transaction Accurate (TLM HW, TLM SW, e.g. SystemC++) Virtual Architecture (Abstract HW, Threads, e.g. SystemC, Simulink) MPI RT-CORBA CORBA SDL Explicit concepts ALL ALL -OS - Specific I/O Communication/ Computation Modules Synchronisation Communication -Concurrency - Threading CPU implementation RTL HW CPU organization CPU Subsystem - Explicit HW modules Abstract Interconnect NONE NONE NONE New HW-SW Abstraction levels 11
II. HW/SW interfaces abstraction: Programming models Virtual Prototype Model - Explicit SW (Binary) - Explicit HW - Implicit CPU Implementation (ISS) - Cycle accurate Model - Typical Model: Cosimulation, SystemC Binary SW execution on CPU Software Thread 1 Software Thread 2 HW Node SW Node Interconnect SW Node Mem I/O DMA HW Accel. CPU/ ISS Network interface HDS API Virtual Architecture OS Specific I/O HAL API HAL 12
II. HW/SW interfaces abstraction: Programming models - Explicit OS Transaction Accurate model - Explicit CPU Subsystem Architecture - Implicit CPU/HAL (TA_BFM) - Transaction cycle accurate HW - Typical Model : Native SW Execution [ST/TIMA] Software Thread 1 Software Thread 2 HW Node SW Node Interconnect SW Node Mem I/O DMA HW Accel. TA_BFM to abstract CPU & Hal Network interface HDS API Virtual Architecture OS Specific I/O HAL API Native SW execution on Host 13
II. HW/SW interfaces abstraction: Programming models SW code at Transaction Accurate level HdS responsible for task and HW resources management Task code integration with OS and HdS API implementation Communication (Explicit communication protocol ) Example of TA SW code main.c extern void task_t2 (void); void start (void){ create_task(task_t2); } task_t2.c channel ch_fifo,ch_reg; void task_t2(void ) { recv_data (ch_fifo,b, 10); recv_data (ch_reg,c,20); } Communication SW void recv_data (ch,dst,size) { switch (ch.protocol){ case FIFO: If (ch.state==empty) schedule(); case REG: dst = _io_read_reg (ch.addr,size); OS void schedule(void) { int old_tid=cur_tid; cur_tid=new_tid; cxt_switch(old_tid.cxt,cur_tid.cxt); } Abstract CPU1 Interface CPU-SS1 Memory Periph HW SS Interconnect T2 T3 HDS API Comm OS HAL API Abstract CPU2 Interface Memory Periph 14
II. HW/SW interfaces abstraction: Programming models Virtual Architecture model - Explicit SW communication API - Explicit System Interconnect - Implicit HDS (OS, Communication Implementation) - Typical Models: TTL, DSoC, Pthreads HW Node SW Node SW Node Explicit Interconnect VA_BFM to Abstract HDS & CPU Sub- System Model Software Thread 1 HDS API Software Thread 2 Native SW execution on Host 15
II. HW/SW interfaces abstraction: Programming models SW code at Virtual Architecture level Memory declaration Computation code Communication code using HdS API Specific mapping to platform channels T1 HDS API Abstract CPU SS1 HW SS Interconnect T2 HDS API T3 HDS API Abstract CPU SS2 Task code of T1 static int B[10], C[20], D[10]; void task_t1( ) { while(1) { recv_data (reg_ch, B, 10); F1(B,C); F2(C,D); send_data (gmem_ch, D, 10); } } Memory Communication API Computation code Communication API Communication primitive void recv_data (REG_Ch* ch, void* dst, int size) { dst = ch->read (ch->address,size); } class REG_Ch : public sc_prim_channel { word *buffer; public: word * read (unsigned int addr,int size) { for (i=0; i<size; i++) *(ret+i)=*(buffer + addr +i); return ret; } }; 16
II. HW/SW interfaces abstraction: Programming models System Architecture model - Explicit thread/subsystems & HW-SW partition - Implicit communication - Typical Models: MPI, Simulink, KPN HW Node SW Node SW Node Software Thread 1 Software Thread 2 Abstract Interconnect Implicit SW execution as a simulation module 17
II. HW/SW interfaces abstraction: Programming models System Architecture Modeling in Simulink Capture application and mapping to architecture Task (set of functions: Simulink blocks, S- functions) Subsystem (set of tasks) Abstract communication types (Intra-SS, Inter-SS) Communication protocol (Generic Simulink I/Os, Explicit annotation for implementation) Functions Subsystems SW-SS 1 T1 REG Comm. Inter-SS Tasks F 1 F 2 F 3 F 4 GMEM T2 FIFO SW-SS 2 T3 Comm. Intra-SS ParamName1 = Value1 ParamName2 = Value2 18
III. Hardware dependent software design Outline Multiprocessor System on Chip: HW-SW Architectures HW/SW interfaces abstraction: Programming models Hardware dependent software design Long term trends: HW-SW codesign. 19
III. Hardware dependent software design Heterogeneous MPSoC Diopsis Tile [ESWEEK06] ARM9 SS DSP SS MEM SS POT SS (Periph. On Tile): I/O peripherals System peripherals MEM SS DXM Bridge Bridge Timer ARM9 SS SRAM Bridge ARM9 ROM AMBA AHB POT SS Bridge DMA DSP SS mailbox PIC mailbox PMEM Interconnect: AMBA bus AIC SPI REG DMEM DSP Local & global memories accessible by both processing units Different communication schemes between CPUs 20
MEM SS III. Hardware dependent software design ARM9 SS Multiple SW Stacks DXM Bridge SRAM Bridge ARM9 ROM HDS A (OS A, COM A) AMBA AHB Bridge Bridge DMA Timer AIC mailbox SPI PIC REG mailbox DMEM PMEM DSP HDS B (OS B, COM B) POT SS Reduce the design cost Small memory footprint Increase performances DSP SS Specific platform devices Efficient communication schemes ARM specific OS and COM DXM & DSP REG access DSP specific OS and COM PIC, Mailbox, DMA REG & DXM access 21
III. Hardware dependent software design SW Stack organization Layered SW Stack Application code SW code of tasks mapped on the CPU INIT HdS (Hardware dependent SW) OS + Communication + HAL (Hardware Abstraction Layer) Different SW components need to be validated incrementally Layers corresponds to Different SW abstraction levels SW development platforms at each abstraction level Tasks init. T1 Tn HDS API Comm. library HAL OS library HAL API Software Stack 22
III. Hardware dependent software design Classical SW design flow to interface HW Programming Model: Abstract HW at Different level Discontinuities: Compilation: Generally ignore the CPU environment (Interrupts, Complex I/O) Sys.lib: adapt for different HW MMAP: Adapt to different CPU-memory architecture User.lib: to make the flow efficient for the application Application Architecture Sys.lib Program + API Calls Compiler Code+Calls Linker Executable Code Programming Model (API) MMAP User.lib 23
III. Hardware dependent software design Code generation from a high level Model d High level description Contains the application behavior, partitioning and resources mapping Difficult to capture architecture specificities SW architecture exploration Application SW generation Multitask and multiprocessor Various operating systems and specific communications support Speed and/or size code optimizations Software stack composition Using existing or generated tasks Various operating systems and specific communications support Application SW generation Task codes High level description Final binary production Final binary SW stack composition HDS 24
III. Hardware dependent software design Software generation environment High level application model (function + mapping) T1 T2 T3 T4 Simulation, performances analysis Software architecture parameters Software Stack I T1 T4 HDS API COM OS HAL API HAL Tasks code Application software generation Tasks init. COM library Software stack composition OS library HAL library 25
IV. Long term trends: HW-SW codesign Outline Multiprocessor System on Chip: HW-SW Architectures HW/SW interfaces abstraction: Programming models Hardware dependent software design Long term trends: HW-SW codesign. 26
IV. Long term trends: HW-SW codesign Challenges Affordable programming model for MPSoC end users Easy to port new application by end user Based on high level programming model Seamless HW-SW Integration Architecture exploration and Application SW mapping Quality Proven SW Modules Ensure better design success using proven IP 27
IV. Long term trends: HW-SW codesign Ideal Design Flow ARM7 Mem Local Bus interface Periph.... XTENSA Mem Local Bus Architecture interface Periph. Application Hardware Architecture Template Global Interconnect HW Lib 1 Mapping Architecture/Alogoritm Mapping (Simulink) SW Lib Simulink Model HW Gen. SW Gen. Virtual Prototype Interconnect VP Model 28
IV. Long term trends: HW-SW codesign Classical HW/SW Design : The GAPS Software Sub-System Software Thread 1 Software Thread 2 Fully implicit HW/SW Interface Hardware Software Sub-System Software Thread 1 Software Thread 1 Abstract GAP HW/SW Interface Hardware Binary SW Appli OS HAL ISS IT Ctrl Fully explicit HW/SW Interface MEM FIFO HW System Level Functional specification Partition ning Software design Hardware/Software Early HW/SW integration discontinuity Integration Virtual Prototype ISA/RTL Hardware design [Gerin ASPDAC 07] Correction cycle 29
IV. Long term trends: HW-SW codesign HW and SW Mixed Models Seamless refinement at four abstraction levels: Simulink Combined Algorithm and Architecture Model (Simulink CAAM) HW/SW Interfaces refinements Software Design Automatic Code Generation CPU SS2 Thread Thread Main 1 Main 2 Hardware Design App Hardware App HdS Target CPU HW-SW Codesign OS and HW codesign Design Platform based Generators port port port port Platform based Generators Abstract Channels Interconnect Interconnect System level model (Simulink CAAM) Virtual architecture Model (SystemC) Transaction accurate (SystemC) Virtual prototype (SystemC) 30
IV. Long term trends: HW-SW codesign Simulink CAAM of M-JPEG Decoder Three CPUs (ARM, Xtensa) Communication Units (GFIFO, HWFIFO, SWFIFO) [RSP2007 DAC07] 7 S-Functions 7 Delays 26 Links 4 IASs Architecture Layer Subsystem Layer 31
IV. Long term trends: HW-SW codesign Experiment Results of MJPEG 10-frame QVGA (320x240) JPEG stream RTW: sequential program on the host machine Simulink: Simulation in Simulink GUI VA: System C model without HW information TA: Abstract CPU + Other devices (SC TLM) VP: Cycle Accurate ISS + Other devices (SC TLM) CPU1 CPU2 CPU3 3ARM ARM7 ARM7 ARM7 2XT1AR M 1ARM2X T Xtensa Xtensa ARM ARM Xtensa Xtensa 3XT Xtensa Xtensa Xtensa The architecture models at different abstraction levels provide trade-off alternatives between simulation time and architecture detail. Three ARM7 Subsystems 32
IV. Long term trends: HW-SW codesign Performance Analysis of MJPEG percentage of total cycles 70 60 50 40 30 20 10 0 90 80 70 60 50 40 30 20 10 0 59 14 27 3ARM 2XT1ARM 80 70 51 64 75 60 56 40 38 35 50 40 37 23 30 23 14 20 18 18 these performance analysis 10 results, 7, designers 2 task allocation to the 0 CPU1 processors CPU2so CPU3 CPU1 CPU2 CPU3 77 Using these can optimize task allocation that each processor has enough margins for future 1ARM2XT 3XT additional functions. 63 65 27 19 17 18 16 4 CPU1 CPU2 CPU3 57 percentage of total cycles 70 60 50 40 30 20 10 0 13 28 30 24 42 45 26 29 CPU1 CPU2 CPU3 33
IV. Long term trends: HW-SW codesign HW-SW interfaces Conclusions Heterogeneous MPSoC require multiple SW stacks Application-specific hardware & software Programming models for SoC Abstract specific architectures for a better match between HW & SW Different abstraction levels for early application SW & HDS validation HDS design: Platform based SW development at different abstraction levels Future trends: Architecture Exploration HDS-CPU codesign HW-SW interfaces architecture exploration 34
Acknowledgement IV. Long term trends: HW-SW codesign Prof F. Pétrot, Prof. F. Rousseau from TIMA PhD Students K. Popovici, P. Gerin from TIMA JR Lequepeys, CEA LETI 35
Innovation for industry 2007 36
2007 You will be welcome to the 10 th Leti Annual Review 24-26 June 2008 at Minatec For more information : http://www.leti.fr 37