Introduction to the course. Software pipelining. Loop fusion. Without resource constraints. Lifetime optimizations




Outline
1 Introduction to the course: Compilation and code optimizations. Small loops, always small loops. Examples of architectural specificities.
2 Software pipelining: Without resource constraints. Lifetime optimizations.
3 Loop fusion: Interest and problems. Simple fusion: variants. Fusion with shifting.

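Choice of the shift
1. Minimize the critical path for compaction. Here, loop shifting is equivalent to minimizing the maximal duration of a path that contains only zero-weight arcs (the clock period). Leiserson and Saxe algorithm, complexity $O(VE\log V)$.
2. Minimize the number of constraints for compaction, which is equivalent to minimizing the number of zero-weight arcs. Variant of the out-of-kilter algorithm, optionally with clock-period constraints (complexities given: $O(V^2E)$ and $O(E\,\ldots)$, the latter truncated).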

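Retiming and circuits
Circuit graph: $w(e)$ is the number of registers on arc $e$, and $d(u)$ is the delay of operator $u$.
Retiming: a function $r : V \to \mathbb{Z}$ such that, for every arc $e = (u,v)$, $w_r(e) = w(e) + r(v) - r(u) \ge 0$ (a nonnegative number of registers). Loop shifting corresponds to retiming.
Clock period $\Phi(G)$: maximal duration of a register-free path.
Goal: find $r$ such that $\Phi(G_r)$ is minimal.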

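The Leiserson and Saxe algorithm (1/2)
Compute $W(u,v) = \min\{w(P) \mid u \xrightarrow{P} v\}$ and $D(u,v) = \max\{d(P) \mid u \xrightarrow{P} v \text{ and } w(P) = W(u,v)\}$ with Floyd-Warshall (all-pairs shortest paths), in $O(V^3)$.
Property 1: $\Phi(G) = D(u,v)$ for some $u$ and $v$.
Property 2: $\Phi(G) \le c \Leftrightarrow (D(u,v) > c \Rightarrow W(u,v) \ge 1)$.
Property 3: $W_r(u,v) = W(u,v) + r(v) - r(u)$ and $D_r(u,v) = D(u,v)$.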
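The computation of $W$ and $D$ can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary-based graph encoding and the three-operator ring used as an example are all assumptions. It also assumes that every circuit of the graph carries at least one register, so that comparing paths lexicographically on (registers, minus delay) is well defined.

```python
import math

def wd_matrices(delay, edges):
    """Return (W, D). delay: vertex -> d(v); edges: list of (u, v, w(e)).

    Paths are compared lexicographically on (registers, -delay), so among
    the paths with the fewest registers the one with the largest delay wins.
    """
    V = list(delay)
    best = {(u, v): (math.inf, 0) for u in V for v in V}
    for u in V:
        best[u, u] = (0, 0)                      # empty path
    for u, v, w in edges:
        best[u, v] = min(best[u, v], (w, -delay[u]))
    for k in V:                                  # Floyd-Warshall, O(V^3)
        for u in V:
            for v in V:
                cand = (best[u, k][0] + best[k, v][0],
                        best[u, k][1] + best[k, v][1])
                if cand < best[u, v]:
                    best[u, v] = cand
    W = {p: x[0] for p, x in best.items() if x[0] < math.inf}
    # The pair stores -(delay of every vertex but the last); add d(v) back.
    D = {(u, v): delay[v] - best[u, v][1] for (u, v) in W}
    return W, D

def clock_period(W, D):
    """Phi(G): largest D(u, v) among register-free pairs (W(u, v) = 0)."""
    return max(D[p] for p in W if W[p] == 0)

# Hypothetical example: a ring a -> b -> c -> a with two registers on c -> a.
delay = {"a": 3, "b": 3, "c": 3}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
W, D = wd_matrices(delay, edges)
print(clock_period(W, D))
```

Property 2 can then be checked directly: the target $c = 6$ fails here because $D(a,c) = 9 > 6$ while $W(a,c) = 0$.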

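The Leiserson and Saxe algorithm (2/2)
Conclusion: there exists a retiming $r$ such that $\Phi(G_r) \le c$ if and only if $r(v) - r(u) + W(u,v) \ge 1$ whenever $D(u,v) > c$.
New arcs (clock constraints): from $u$ to $v$, with weight $W(u,v) - 1$, whenever $D(u,v) > c$ (in addition to the initial arcs).
Algorithm: Bellman-Ford, $O(VE)$ in general. A retiming exists if and only if there is no circuit of strictly negative weight, which is tested here in $O(V^3)$.
Total complexity: $O(V^3 \log V)$, using the first two steps and then a binary search on the precomputed values $D(u,v)$. An improvement to $O(VE \log V)$ is possible.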

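Optimized variant
1. For every vertex $v \in V$, $r(v) = 0$.
2. Repeat $|V| - 1$ times:
(a) compute $\Delta(v)$, the maximal duration of a register-free path in $G_r$ ending at $v$;
(b) for every $v$ such that $\Delta(v) > c$, $r(v) = r(v) + 1$.
3. If $\Phi(G_r) > c$, the period $c$ cannot be reached; otherwise $r$ is the desired retiming.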
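The three steps of this variant can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the names `arrival_times` and `feas`, the graph encoding and the example ring are hypothetical, delays are assumed nonnegative, and every circuit is assumed to carry at least one register (so the register-free subgraph is acyclic).

```python
from collections import deque

def arrival_times(delay, edges, r):
    """Delta(v): longest register-free path ending at v in the retimed graph."""
    succ = {v: [] for v in delay}
    indeg = {v: 0 for v in delay}
    for u, v, w in edges:
        if w + r[v] - r[u] == 0:          # arc without register in G_r
            succ[u].append(v)
            indeg[v] += 1
    delta = {v: delay[v] for v in delay}
    ready = deque(v for v in delay if indeg[v] == 0)
    while ready:                           # topological sweep
        u = ready.popleft()
        for v in succ[u]:
            delta[v] = max(delta[v], delta[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return delta

def feas(delay, edges, c):
    """Return a retiming r with clock period <= c, or None if none is found."""
    r = {v: 0 for v in delay}              # step 1
    for _ in range(len(delay) - 1):        # step 2
        delta = arrival_times(delay, edges, r)
        for v in delay:
            if delta[v] > c:
                r[v] += 1
    if max(arrival_times(delay, edges, r).values()) > c:   # step 3
        return None
    return r

# Hypothetical example: ring a -> b -> c -> a, two registers on c -> a.
delay = {"a": 3, "b": 3, "c": 3}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
print(feas(delay, edges, 6))   # a retiming reaching period 6
print(feas(delay, edges, 5))   # None: period 5 is out of reach
```

On this ring the initial period is 9. Moving one register from the arc leaving c onto the arc entering c brings it down to 6, and no retiming reaches 5 because two registers cannot separate three operators of delay 3.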

Leiserson-Saxe circuit retiming (1991)
Delay operators: move registers from outgoing to incoming edges. Retiming changes the number of registers and the clock period of the circuit.
[Figure sequence: a circuit whose operators have delays 7, 7, 7, 3, 3, 3, 3, annotated at each step with the retiming values (+1, +2) applied to its vertices.]
Initial circuit: $\Phi(G) = 24$.
First retiming to apply, targeting $\Phi(G) = 13$. First retiming applied: $\Phi(G) = 17$.
Second retiming to apply, targeting $\Phi(G) = 13$. Second retiming applied: $\Phi(G) = 17$.
Complete optimal retiming: $\Phi(G) = 13$.
Nice $O(VE \log V)$ algorithm for clock period minimization. Similar concepts are used in software pipelining and loop shifting.

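Back to software pipelining
Leiserson-Saxe retiming followed by compaction with list scheduling on $p$ resources:
(a) $\lambda \le (1 - \frac{1}{p})\,d(P) + \lambda_p$ for some register-free path $P$ of $G_r$, hence $\lambda \le (1 - \frac{1}{p})\,\Phi_{opt}(G) + \lambda_p$. Here $\sigma(v,k) = s(v) + (r(v)+k)\,\Phi(G_r)$ is a schedule of period $\Phi_{opt}(G)$, with $s(v)$ the maximal duration of a register-free path of $G_r$ leading to $v$.
(b) For an optimal schedule $\sigma_\infty(v,k) = s(v) + \lambda_\infty (r(v)+k)$ with $0 \le s(v) < \lambda_\infty$, every register-free path $P = u_1 \to \dots \to u_n$ of $G_r$ satisfies $d(P) \le s(u_n) + d(u_n) - s(u_1) < \lambda_\infty + \max\{d(u) \mid u \in V\}$. Hence $\Phi_{opt}(G) \le \lambda_\infty + \max\{d(u) \mid u \in V\} - 1$.
Conclusion: $\lambda \le (2 - \frac{1}{p})\,\lambda_p + \max\{d(u) \mid u \in V\} - 1$.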

Linear programming
$\min\ 11t + 10u$ subject to $2t + 3u \ge 5$, $3t + 2u \ge 4$, $5t + u \ge 12$, $t \ge 0$, $u \ge 0$
$=\ \max\ 5x + 4y + 12z$ subject to $2x + 3y + 5z \le 11$, $3x + 2y + z \le 10$, $x, y, z \ge 0$.
Problem 1 (the doper): buy the cheapest mix of the doping products t and u (with respective prices 11 and 10) that supplies enough of 3 basic elements (resp. 5, 4 and 12), knowing how much of each element every product supplies.
Problem 2 (the dealer): sell the three basic elements at the highest possible price while staying cheaper than each of the doping products.

Linear programming
Duality theorem: $\min\{c.x \mid Ax \ge b,\ x \ge 0\} = \max\{y.b \mid yA \le c,\ y \ge 0\}$.
Complexity. Rational solution: polynomial time (L. Khachiyan). Integer solution: NP-complete, and only an inequality holds between the two optima.
Algorithms: simplex algorithms, basis reduction algorithms in small dimensions (Lenstra), parametric linear programming (parametrized right-hand side).
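The duality theorem can be checked by hand on the doper/dealer example: a feasible primal point and a feasible dual point with equal objective values are both optimal. The sketch below assumes the coefficients of that example are 11 and 10 for the prices and 5, 4, 12 for the requirements (the zeros are lost in the extracted slide); the candidate points `x` and `y` were worked out by hand and the helper names are hypothetical.

```python
from fractions import Fraction

# Primal: min c.x  s.t.  A x >= b, x >= 0
# Dual:   max y.b  s.t.  y A <= c, y >= 0
A = [[2, 3], [3, 2], [5, 1]]   # A[i][j]: amount of element i in product j
b = [5, 4, 12]                 # required amount of each element
c = [11, 10]                   # price of each product (assumed values)

def primal_feasible(x):
    return all(v >= 0 for v in x) and all(
        sum(a * v for a, v in zip(row, x)) >= need
        for row, need in zip(A, b))

def dual_feasible(y):
    return all(v >= 0 for v in y) and all(
        sum(y[i] * A[i][j] for i in range(len(A))) <= c[j]
        for j in range(len(c)))

x = (Fraction(31, 13), Fraction(1, 13))   # quantities of t and u
y = (3, 0, 1)                             # unit prices of the 3 elements

primal_value = sum(ci * xi for ci, xi in zip(c, x))
dual_value = sum(yi * bi for yi, bi in zip(y, b))
print(primal_feasible(x), dual_feasible(y), primal_value, dual_value)
```

Both values equal 27, so neither side can be improved. Note that the dual optimum is integral here while the primal one is not, which illustrates why integer solutions only satisfy an inequality in general.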

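Note on registers (case of circuits)
$S(G_r) = \sum_{e \in E} w_r(e) = \sum_{e=(u,v)} (w(e) + r(v) - r(u)) = S(G) + \sum_{v \in V} r(v)\,(\mathrm{indeg}(v) - \mathrm{outdeg}(v))$, where $S(G_r)$ is the number of registers after retiming by $r$.
$\min\{\sum_{v \in V} r(v)\,c(v) \mid \forall e = (u,v) \in E,\ w(e) + r(v) - r(u) \ge 0\}$
Integer linear program (the constraint matrix is the incidence matrix of the graph), solvable in polynomial time.
Dual, solved by a flow algorithm: $\max\{-\sum_{e \in E} f(e)\,w(e) \mid \forall v \in V,\ \sum_{e \to v} f(e) - \sum_{v \to e} f(e) = c(v);\ \forall e \in E,\ f(e) \ge 0\}$.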

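Primal-dual approach
Retiming: a function $r : V \to \mathbb{Z}$ such that, for every $e = (u,v)$, $r(v) - r(u) + w(e) \ge 0$. The same number of registers is moved between the incoming and the outgoing arcs of a vertex.
Nonnegative flow (union of circuits): a function $f : E \to \mathbb{Z}$, $f(e) \ge 0$, such that for every $v \in V$, $\sum_{e \to v} f(e) = \sum_{v \to e} f(e)$. A vertex is entered as many times as it is left.
Principle: give costs $\Gamma(r)$ and $\Sigma(f)$ to retimings and flows so that $\Gamma(r) \ge \Sigma(f)$ whatever $r$ and $f$, and so that there exist $r$ and $f$ with $\Gamma(r) = \Sigma(f)$.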

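Example: minimizing the number of zero-weight arcs
Note: the maximization version is NP-complete (in the strong sense).
Retiming $r$: let $v_r(e) = 0$ if $w_r(e) > 0$ and $v_r(e) = 1$ if $w_r(e) = 0$. We want to minimize $\Gamma(r) = \sum_{e \in E} v_r(e)$.
Nonnegative flow $f$: let $z_f(e) = 0$ if $f(e) = 0$ and $z_f(e) = 1 - f(e)\,w_r(e)$ otherwise. We want to maximize $\Sigma(f) = \sum_{e \in E} z_f(e)$.
Property 1: $\sum_{e \in E} f(e)\,w(e) = \sum_{e \in E} f(e)\,w_r(e)$, so $\Sigma(f)$ does not depend on $r$.
Property 2: $v_r(e) \ge z_f(e)$. Conformity index: $ic(e) = v_r(e) - z_f(e)$.
Property 3: $r$ and $f$ satisfy $\Gamma(r) = \Sigma(f)$ (and are thus optimal) if and only if, for every $e \in E$, $ic(e) = 0$.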

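Coloring the arcs
Goal: $i_{f,r}(e) = 0$ for every arc $e$. The arcs are colored to indicate in which way $i_{f,r}(e)$ can decrease.
[Figure: conformity diagram in the plane ($w_r(e)$, $f(e)$), whose regions are labeled uncolored, green, black or red, with $i(e) = 0$ or $i(e) = 1$.]
Start from the zero flow and the zero retiming: the initial non-conforming arcs are the zero-weight arcs (black). Then decrease $i_{f,r}(e)$ on every non-conforming arc while keeping $i_{f,r}(e) = 0$ on the others.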

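Minty's lemma
Let $G = (V,E)$ be a directed graph whose arcs are black, green, red or uncolored. Let $e$ be a black arc. One and only one of the following 2 propositions is true:
(a) There exists a cycle containing $e$, without uncolored arcs, in which all black arcs have the same direction as $e$ and all green arcs have the opposite direction.
(b) There exists a co-cycle containing $e$, without red arcs, in which all black arcs have the same direction as $e$ and all green arcs have the opposite direction.
Constructive proof by simple marking: starting from $e$, follow the black arcs in the direction of $e$, the green arcs in the opposite direction, and the red arcs in both directions.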

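Algorithm
Color the arcs according to the conformity diagram for $r = f = 0$.
While there remain non-conforming arcs:
1. Choose a non-conforming arc $e = (u,v)$ and apply Minty's lemma:
(a) cycle: increment (resp. decrement) the flow of the arcs of the cycle that are oriented like $e$ (resp. in the direction opposite to $e$);
(b) co-cycle: increment the retiming of the vertices of the subset containing $v$ (defined by the co-cycle),
which makes $e$ conforming without creating any non-conforming arc.
2. Recolor the arcs according to the conformity diagram and the new $r$ and $f$.
Complexity $O(|E|(|E| + |V|))$. Clock arcs can be added.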

Small example
[Figure: graph on the vertices a, b, c, d, shown after two steps and with the final result; retiming values +1 on some vertices. Legend: unit flow; non-conforming arc (black).]



Lifetimes. Example: Maxlive
Maxlive = maximum number of simultaneously live values.
Dependence graph + fixed allocation.
Maxlive = 3 (after b and before c). [Figure: the kernel a b c repeated along the time axis, with overlapping lifetimes.]
Maxlive = 2, same allocation, b delayed by 1. [Figure: the same kernel with b shifted by one iteration.]
Retiming changes both Maxlive and FIFO sizes (if used).


FIFO sizes: general constraints
More accurate use of memory. For each $e = (u,v) \in E$, define $t_{out}(u)$ and $t_{in}(e)$ such that:
if $t \ge \sigma(u,i) + t_{out}(u)$, the result of $(u,i)$ is available;
if $t \ge \sigma(v,i) + t_{in}(e)$, the value was already read by $(v,i)$.
By definition of the delays $d$, $d(e) > t_{out}(u) - t_{in}(e)$.
Reminder: for a modulo schedule $\sigma(u,i) = \lambda.i + \rho_u$, the dependence constraint is expressed as $\lambda.w(e) + \rho_v - \rho_u \ge d(e)$.
The FIFO needs at least $k + 1$ locations if $\sigma(u,i+k) + t_{out}(u) \le \sigma(v,i+w(e)) + t_{in}(e)$, i.e., $\lambda.k + \rho_u + t_{out}(u) \le \lambda.w(e) + \rho_v + t_{in}(e)$.
FIFO of size at least $1 + w(e) + \lfloor (\rho_v - \rho_u + t_{in}(e) - t_{out}(u))/\lambda \rfloor$.


Linear system
Constraints. With retiming $r(u)$: $\rho_u = \lambda.r(u) + s_u$, with $0 \le s_u < \lambda$ ($s_u$ is fixed).
$r(v) - r(u) \ge \lceil (d(e) + s_u - s_v)/\lambda \rceil - w(e) = d_1(e)$.
$1 + w(e) + \lfloor (\rho_v - \rho_u + t_{in}(e) - t_{out}(u))/\lambda \rfloor = r(v) - r(u) + d_2(e)$.
Linear program: $\min\{\sum_{u \in V} M(u) \mid r(v) - r(u) \ge d_1(e),\ M(u) \ge r(v) - r(u) + d_2(e)\}$.
Variant: add a new vertex $u'$ for each $u$, with $r(u') = M(u) + r(u)$: $\min\{\sum_{u \in V} (r(u') - r(u)) \mid r(v) - r(u) \ge d_1(e),\ r(u') - r(v) \ge d_2(e)\}$.
The constraint (connection) matrix is totally unimodular, so the problem is polynomial.
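The FIFO bound behind $d_2(e)$ can be cross-checked by brute force. The sketch below is an illustration under assumptions: value $i$ is taken to occupy its FIFO slot from $\sigma(u,i) + t_{out}(u)$ to $\sigma(v,i+w(e)) + t_{in}(e)$ inclusive (the reading of the $k+1$ condition with $\le$), the closed form uses $t_{in}(e) - t_{out}(u)$ accordingly, and the function names and numeric parameters are hypothetical.

```python
def fifo_size(lam, w, rho_u, rho_v, t_out, t_in):
    """Closed form: 1 + w(e) + floor((rho_v - rho_u + t_in - t_out) / lam)."""
    return 1 + w + (rho_v - rho_u + t_in - t_out) // lam

def fifo_size_simulated(lam, w, rho_u, rho_v, t_out, t_in, iterations=50):
    """Largest number of values held at once under sigma(x, i) = lam*i + rho_x."""
    spans = [(lam * i + rho_u + t_out,          # value i written
              lam * (i + w) + rho_v + t_in)     # value i read
             for i in range(iterations)]
    # Occupancy can only peak at a write instant, so sampling those suffices.
    return max(sum(1 for a, b in spans if a <= t <= b) for t, _ in spans)

params_1 = dict(lam=4, w=1, rho_u=0, rho_v=3, t_out=2, t_in=0)
params_2 = dict(lam=2, w=2, rho_u=1, rho_v=0, t_out=1, t_in=1)
for params in (params_1, params_2):
    print(fifo_size(**params), fifo_size_simulated(**params))
```

Python's `//` is a floor division, including for negative numerators, which is what the $\lfloor \cdot \rfloor$ in the formula requires.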

With shared storage (e.g., a register bank): Maxlive
Need to consider each time $t$ modulo $\lambda$, actually all $\sigma(v,i) + t_{in}(e)$.
Produced at time $t$: $prod(e,t) = \{i \in \mathbb{N} \mid \sigma(u,i) + t_{out}(u) \le t\}$.
Consumed at $t$: $cons(e,t) = \{i \in \mathbb{N} \mid \sigma(v,i+w(e)) + t_{in}(e) \le t\}$.
Live along $e$ at $t$: $live(e,t) = |prod(e,t) \setminus cons(e,t)|$.
Live at $t$: $live(u,t) = |prod(e,t)| - \min_{e=(u,v)} |cons(e,t)|$.
Note: a) $cons(e,t)$ is minimal when $\sigma(v,i+w(e)) + t_{in}(e)$ is maximal (last reader); b) $v$ can be forced to be the last reader with similar constraints: $\sigma(v,i+w(e)) + t_{in}(e) \ge \sigma(v',i+w(e')) + t_{in}(e')$.


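Inequalities with retiming
Constraints:
$\sigma(u,i) + t_{out}(u) \le t \Leftrightarrow i \le \lfloor (t - t_{out}(u) - s_u)/\lambda \rfloor - r(u)$.
$\sigma(v,i+w(e)) + t_{in}(e) \le t \Leftrightarrow i \le \lfloor (t - t_{in}(e) - s_v)/\lambda \rfloor - w(e) - r(v)$.
$live(e,t) = w_r(e) + s(e,t)$ with $w_r(e) = w(e) + r(v) - r(u)$ and $s(e,t) = \lfloor (t - t_{out}(u) - s_u)/\lambda \rfloor - \lfloor (t - t_{in}(e) - s_v)/\lambda \rfloor$.
Linear program: $Opt_1$ = minimize $\max_{t \in [0..\lambda[} \sum_{u \in V} \max_{e=(u,v)} (w_r(e) + s(e,t))$
$= \min\{M \mid r(v) - r(u) \ge d_1(e)\ \forall e \in E;\ M(u,t) \ge r(v) - r(u) + w(e) + s(e,t)\ \forall u \in V, \forall e = (u,v), \forall t;\ M \ge \sum_{u \in V} M(u,t)\ \forall t\}$.
Strongly NP-complete.
If there is only one last reader (along $e_u$) for each $u$, the problem is polynomial because $\max_t \sum_u (w_r(e_u) + s(e_u,t)) = \sum_u w_r(e_u) + \max_t \sum_u s(e_u,t)$.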


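Approximation up to $|V|$
$live(e,t) = w_r(e) + s(e,t)$, with $s(e,t) = \lfloor (t - t_{out}(u) - s_u)/\lambda \rfloor - \lfloor (t - t_{in}(e) - s_v)/\lambda \rfloor$.
$t_{out}(u) + s_u = \lambda.\alpha_u + \beta_u$ with $0 \le \beta_u < \lambda$.
$t_{in}(e) + s_v = \lambda.\delta_e + \gamma_e$ with $0 \le \gamma_e < \lambda$.
$s(e,t) = -\alpha_u + \delta_e + s'(e,t)$ with $s'(e,t) = \lfloor (t - \beta_u)/\lambda \rfloor - \lfloor (t - \gamma_e)/\lambda \rfloor$.
If $\beta_u = \gamma_e$, $s'(e,t) = 0$. Let $s_{min}(e) = 0$.
If $\beta_u < \gamma_e$, $s'(e,t) = 0$ or $1$. Let $s_{min}(e) = 0$.
If $\beta_u > \gamma_e$, $s'(e,t) = -1$ or $0$. Let $s_{min}(e) = -1$.
This leads to the polynomially solvable linear program:
$Opt_2 = \min\{\sum_{u \in V} M(u) \mid r(v) - r(u) \ge d_1(e)\ \forall e \in E;\ M(u) \ge r(v) - r(u) + w(e) - \alpha_u + \delta_e + s_{min}(e)\ \forall u \in V, \forall e = (u,v)\}$
$Opt_2 \le Opt_1 \le Opt_2 + |V|$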


Loop fusion: interest and problems
Why? Increase the size of basic blocks. Useful for both spatial and temporal locality. Useful for array contraction.
How? When? Be careful of over-fusion. Polynomial algorithms? NP-completeness? Heuristics? It is always difficult to find a cost model... but fusion is a complementary tool for more general transformations.

A few references for the basics
Kennedy and McKinley. Typed Fusion with Applications to Parallel and Sequential Code Generation.
McKinley and Kennedy. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution.
Roth and Kennedy. Loop Fusion in High Performance Fortran.
On the Complexity of Loop Fusion.

Array contraction
Goal: reduce the size of a local array, possibly into a scalar.
Before:
    do i=2,n
      a(i) = d(i) + 1
      b(i) = a(i)/2
      c(i) = b(i) + b(i-1)
    enddo
After contraction of a into a scalar:
    do i=2,n
      a = d(i) + 1
      b(i) = a/2
      c(i) = b(i) + b(i-1)
    enddo
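The contraction is legal here because each a(i) is consumed in the iteration that produces it and nowhere else. A small transliteration makes the equivalence easy to test. It is only a sketch: the function names, the 1-based list layout (slot 0 unused) and the input values are assumptions, and b(1) is supplied as a parameter since the loops start at i = 2.

```python
def with_array(d, b1):
    """Original loop: a is a full array."""
    n = len(d) - 1                       # arrays are indexed 1..n
    a, b, c = [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    b[1] = b1
    for i in range(2, n + 1):
        a[i] = d[i] + 1
        b[i] = a[i] / 2
        c[i] = b[i] + b[i - 1]
    return b, c

def with_scalar(d, b1):
    """Contracted loop: a is a scalar reused by every iteration."""
    n = len(d) - 1
    b, c = [0.0] * (n + 1), [0.0] * (n + 1)
    b[1] = b1
    for i in range(2, n + 1):
        a = d[i] + 1                     # lives only inside iteration i
        b[i] = a / 2
        c[i] = b[i] + b[i - 1]
    return b, c

d = [0, 0, 4, 6, 8]                      # d(2..4) = 4, 6, 8
print(with_array(d, 1.0) == with_scalar(d, 1.0))
```

The array b cannot be contracted the same way, since b(i-1) is read one iteration after it is written.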

Array assignments and forall loops
    A = B + C
    B = D + 1
    A[1:n] = A[0:n-1] + A[2:n+1]
equivalent to:
    forall(i=0,n+1) A(i) = B(i) + C(i) endforall
    forall(i=0,n+1) B(i) = D(i) + 1 endforall
    forall(i=1,n) A(i) = A(i-1) + A(i+1) endforall
and, then, to:
    doall(i=0,n+1) A(i) = B(i) + C(i) enddoall
    doall(i=0,n+1) B(i) = D(i) + 1 enddoall
    doall(i=0,n+1) A'(i) = A(i) enddoall
    doall(i=1,n) A(i) = A'(i-1) + A'(i+1) enddoall

Partial loop distribution
    DO i=1,n
      A(i) = 2*A(i) + 1
      B(i) = C(i-1) + A(i)
      C(i) = C(i-1) + G(i)
      D(i) = D(i-1) + A(i) + C(i-1)
      E(i) = E(i-1) + B(i)
      F(i) = D(i) + B(i-1)
    ENDDO
[Figure: dependence graph on the statements A, B, C, D, E, F; the arcs labeled 1 are loop-carried dependences.]

Full distribution:
    DOPAR i=1,n
      A(i) = 2*A(i) + 1
    ENDDOPAR
    DOSEQ i=1,n
      C(i) = C(i-1) + G(i)
    ENDDOSEQ
    DOPAR i=1,n
      B(i) = C(i-1) + A(i)
    ENDDOPAR
    DOSEQ i=1,n
      E(i) = E(i-1) + B(i)
    ENDDOSEQ
    DOSEQ i=1,n
      D(i) = D(i-1) + A(i) + C(i-1)
    ENDDOSEQ
    DOPAR i=1,n
      F(i) = D(i) + B(i-1)
    ENDDOPAR
How to get the following?
    DOSEQ i=1,n
      C(i) = C(i-1) + G(i)
    ENDDOSEQ
    DOPAR i=1,n
      A(i) = 2*A(i) + 1
      B(i) = C(i-1) + A(i)
    ENDDOPAR
    DOSEQ i=1,n
      D(i) = D(i-1) + A(i) + C(i-1)
      E(i) = E(i-1) + B(i)
    ENDDOSEQ
    DOPAR i=1,n
      F(i) = D(i) + B(i-1)
    ENDDOPAR


Simple loop fusion/distribution
    DO i=2, n
      a(i) = f(i)
      b(i) = g(i)
      c(i) = a(i-1) + b(i)
      d(i) = a(i) + b(i-1)
      e(i) = d(i-1) + d(i)
    ENDDO
becomes:
    DOPAR i=2, n
      a(i) = f(i)
      b(i) = g(i)
    ENDDOPAR
    DOPAR i=2, n
      c(i) = a(i-1) + b(i)
      d(i) = a(i) + b(i-1)
    ENDDOPAR
    DOPAR i=2, n
      e(i) = d(i-1) + d(i)
    ENDDOPAR
[Figure: dependence graph on the statements A, B, C, D, E. An arc labeled 1 prevents fusion; an unlabeled arc is a simple precedence.]
Easy: compute longest dependence paths.
[Figure: the same graph with levels A = B = 1, C = 2 = max(1 + 1, 1 + 0), D = 2 = max(1 + 1, 1 + 0), E = 3 = 2 + 1.]
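The longest-path computation on this example can be sketched as follows. This is an illustration, not the slide's algorithm verbatim: the function name and the encoding of arcs as `(source, target, prevents_fusion)` are assumptions, and the arc list is a reading of the loop body above in which a dependence of distance 1 is taken to prevent fusion into a parallel loop.

```python
from graphlib import TopologicalSorter

def fusion_levels(nodes, edges):
    """Level of each statement = longest path, counting 1 per fusion-preventing arc."""
    preds = {v: [] for v in nodes}
    for u, v, prevents in edges:
        preds[v].append((u, prevents))
    order = TopologicalSorter(
        {v: {u for u, _ in preds[v]} for v in nodes}).static_order()
    level = {}
    for v in order:                      # predecessors are visited first
        level[v] = max((level[u] + (1 if prevents else 0)
                        for u, prevents in preds[v]), default=1)
    return level

nodes = ["A", "B", "C", "D", "E"]
edges = [
    ("A", "C", True),    # c(i) reads a(i-1): prevents fusion
    ("B", "C", False),   # c(i) reads b(i):   simple precedence
    ("A", "D", False),   # d(i) reads a(i)
    ("B", "D", True),    # d(i) reads b(i-1)
    ("D", "E", True),    # e(i) reads d(i-1)
    ("D", "E", False),   # e(i) reads d(i)
]
level = fusion_levels(nodes, edges)
loops = {}
for stmt in nodes:
    loops.setdefault(level[stmt], []).append(stmt)
print(loops)
```

Statements sharing a level end up in the same parallel loop, which reproduces the three DOPAR loops {A, B}, {C, D}, {E}.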


Maximal unordered typed fusion
Constraint: two loops of different types (colors) cannot be fused.
Arbitrary number of colors: NP-complete (McKinley & Kennedy, 1994), even for chains of length 2.
Ordered typed fusion: polynomial, $O(T(V + E))$ (see after).
Fixed number of chains: polynomial, $O(d\,N^d)$ for $d$ chains of length at most $N$. Dynamic programming.
Two colors, no fusion-preventing edges: polynomial, $O(V + E)$.
Three colors, no fusion-preventing edges: NP-complete.
Two colors, fusion-preventing edges: NP-complete.
Two colors, fusion-preventing edges for one color only: NP-complete. Example: partial loop distribution for parallelism detection.

Ordered typed fusion
[Figure: (a) original graph, (b) fuse black then white, (c) fuse white then black.]
Optimization of type $T$:
$W(v)$: minimal number of loops of type $T$ that have to be placed before $v$.
$W(v) = 0$ if $v$ has no predecessor.
For $e = (u,v)$, $w(e) = 1$ if $T(u) = T$ and either $T(v) \ne T$ or $e$ is fusion-preventing. Otherwise $w(e) = 0$.
$W(v)$ is then the weight of a longest path ending at $v$: $W(v) = \max_{e=(u,v)} (W(u) + w(e))$.
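The weights $w(e)$ and the values $W(v)$ for one type can be computed in a single topological sweep. The sketch below is an assumption-laden illustration: the function name, the arc encoding `(u, v, prevents_fusion)` and the two tiny graphs are hypothetical, and $W(v)$ is taken to be the longest-path weight into $v$.

```python
from graphlib import TopologicalSorter

def loops_before(nodes, edges, types, T):
    """W(v): number of loops of type T that must be placed before v."""
    preds = {v: [] for v in nodes}
    for u, v, prevents in edges:
        # w(e) = 1 if u has type T and v cannot share u's loop.
        w = 1 if types[u] == T and (types[v] != T or prevents) else 0
        preds[v].append((u, w))
    order = TopologicalSorter(
        {v: {u for u, _ in preds[v]} for v in nodes}).static_order()
    W = {}
    for v in order:
        W[v] = max((W[u] + w for u, w in preds[v]), default=0)
    return W

types = {"p": "black", "q": "white", "s": "black"}
nodes = ["p", "q", "s"]

# Chain black -> white -> black: the white loop separates the two black ones.
chain = [("p", "q", False), ("q", "s", False)]
print(loops_before(nodes, chain, types, "black"))

# Direct black -> black arc that does not prevent fusion; q is independent.
direct = [("p", "s", False)]
print(loops_before(nodes, direct, types, "black"))
```

Under this reading, statements of type $T$ with the same $W(v)$ go into the same fused loop: p and s are kept apart in the chain and fused in the second graph.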

[Figure: (a) weighted graph for fusing black, (b) fusion of black, (c) weighted graph for fusing white after black, (d) fusion of white.]
[Figure: (a) weighted graph for fusing white, (b) fusion of white, (c) weighted graph for fusing black after white, (d) fusion of black.]