Program Analysis and Transformation: From the Polytope Model to Formal Languages

Albert Cohen

To cite this version: Albert Cohen. Program Analysis and Transformation: From the Polytope Model to Formal Languages. Networking and Internet Architecture [cs.NI]. Université de Versailles-Saint Quentin en Yvelines, 1999. English. <tel-00550829>

HAL Id: tel-00550829
https://tel.archives-ouvertes.fr/tel-00550829
Submitted on 31 Dec 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
DOCTORAL THESIS of the UNIVERSITY OF VERSAILLES

Specialty: Computer Science

presented by Albert COHEN
to obtain the title of Doctor of the University of Versailles

Thesis subject:
Analyse et transformation de programmes : du modèle polyédrique aux langages formels
Program Analysis and Transformation: From the Polytope Model to Formal Languages

Defended on 21 December 1999 before a jury composed of:
Jean Berstel, Luc Bougé, Jean-François Collard, Paul Feautrier, William Jalby, Patrice Quinton, Bernard Vauquelin
(roles listed on the title page: president, advisor, reviewers, examiner)

Thesis prepared at the Université de Versailles Saint-Quentin-en-Yvelines, within the PRiSM laboratory (Parallélisme, Réseaux, Systèmes et Modélisation)
Acknowledgements

This thesis was prepared at the PRiSM laboratory (Parallélisme, Réseaux, Systèmes et Modélisation) of the Université de Versailles Saint-Quentin-en-Yvelines, between September 1996 and December 1999, under the supervision of Jean-François Collard and Paul Feautrier.

I would first like to address Jean-François Collard (research scientist at CNRS), who supervised this thesis and with whom I had the good fortune to take my first steps in scientific research. His advice, his extraordinary availability, his energy in all circumstances and his enlightened ideas did much more than sustain my motivation. I warmly thank Paul Feautrier (professor at PRiSM) for his trust and for his interest in following my results. Through his experience, he showed me how exciting research is, beyond occasional difficulties and successes.

I am very grateful to all the members of my jury; in particular to Jean Berstel (professor at the Université de Marne-la-Vallée), Patrice Quinton (professor at IRISA, Université de Rennes) and Bernard Vauquelin (professor at LaBRI, Université de Bordeaux), for the interest and curiosity they showed in my work and for the care with which they read this thesis, even when its subject matter lay outside their own research areas. Many thanks to Luc Bougé (professor at LIP, École Normale Supérieure de Lyon) for taking part in this jury and for his insightful suggestions and comments. Finally, thanks to William Jalby (professor at PRiSM) for agreeing to chair this jury and for having often advised me with good humor.

I also express all my gratitude to Guy-René Perrin for his encouragement and for access to "his" parallel machine, to Olivier Carton for his invaluable help in a very demanding field, and to Denis Barthou, Ivan Djelic and Vincent Lefebvre for their essential collaboration on the results of this thesis. I also remember fascinating discussions with Pierre Boulet, Philippe Clauss, Christine Eisenbeis and Sanjay Rajopadhye; nor do I forget the efficient help of the laboratory's engineers and secretaries. I think back on the good times spent with all the members of the "monastery" and with the fellow travelers at PRiSM who became my friends.

Finally, thanks to my family for their constant and unconditional support, with a special thought for my parents and for my wife Isabelle.
Dedicated to a Brave GNU World
http://www.gnu.org

Copyright © Albert Cohen 1999.

Verbatim copying and distribution of this document is permitted in any medium, provided this notice is preserved. Copying and distribution of exact copies of this document are permitted, but no modification is allowed.

This document was typeset using LaTeX and the french package. Graphics were designed using xfig, gnuplot and the GasTeX package.

Albert.Cohen@prism.uvsq.fr
Table of Contents

List of Figures
List of Algorithms

Présentation en français (dissertation summary, in French)

1 Introduction
    1.1 Program Analysis
    1.2 Program Transformations for Parallelization
    1.3 Thesis Overview

2 Framework
    2.1 Going Instancewise
    2.2 Program Model
        2.2.1 Control Structures
        2.2.2 Data Structures
    2.3 Abstract Model
        2.3.1 Naming Statement Instances
        2.3.2 Sequential Execution Order
        2.3.3 Addressing Memory Locations
        2.3.4 Loop Nests and Arrays
    2.4 Instancewise Analysis
        2.4.1 Conflicting Accesses and Dependences
        2.4.2 Reaching Definition Analysis
        2.4.3 An Example of Instancewise Reaching Definition Analysis
        2.4.4 More About Approximations
    2.5 Parallelization
        2.5.1 Memory Expansion and Parallelism Extraction
        2.5.2 Computation of a Parallel Execution Order
        2.5.3 General Efficiency Remarks

3 Formal Tools
    3.1 Presburger Arithmetics
        3.1.1 Sets, Relations and Functions
        3.1.2 Transitive Closure
    3.2 Monoids and Formal Languages
        3.2.1 Monoids and Morphisms
        3.2.2 Rational Languages
        3.2.3 Algebraic Languages
        3.2.4 One-Counter Languages
    3.3 Rational Relations
        3.3.1 Recognizable and Rational Relations
        3.3.2 Rational Transductions and Transducers
        3.3.3 Rational Functions and Sequential Transducers
    3.4 Left-Synchronous Relations
        3.4.1 Definitions
        3.4.2 Algebraic Properties
        3.4.3 Functional Properties
        3.4.4 An Undecidability Result
        3.4.5 Studying Synchronizability of Transducers
        3.4.6 Decidability Results
        3.4.7 Further Extensions
    3.5 Beyond Rational Relations
        3.5.1 Algebraic Relations
        3.5.2 One-Counter Relations
    3.6 More about Intersection
        3.6.1 Intersection with Lexicographic Order
        3.6.2 The Case of Algebraic Relations
    3.7 Approximating Relations on Words
        3.7.1 Approximation of Rational Relations by Recognizable Relations
        3.7.2 Approximation of Rational Relations by Left-Synchronous Relations
        3.7.3 Approximation of Algebraic and Multi-Counter Relations

4 Instancewise Analysis for Recursive Programs
    4.1 Motivating Examples
        4.1.1 First Example: Procedure Queens
        4.1.2 Second Example: Procedure BST
        4.1.3 Third Example: Function Count
        4.1.4 What Next?
    4.2 Mapping Instances to Memory Locations
        4.2.1 Induction Variables
        4.2.2 Building Recurrence Equations on Induction Variables
        4.2.3 Solving Recurrence Equations on Induction Variables
        4.2.4 Computing Storage Mappings
        4.2.5 Application to Motivating Examples
    4.3 Dependence and Reaching Definition Analysis
        4.3.1 Building the Conflict Transducer
        4.3.2 Building the Dependence Transducer
        4.3.3 From Dependences to Reaching Definitions
        4.3.4 Practical Approximation of Reaching Definitions
    4.4 The Case of Trees
    4.5 The Case of Arrays
    4.6 The Case of Composite Data Structures
    4.7 Comparison with Other Analyses
    4.8 Conclusion

5 Parallelization via Memory Expansion
    5.1 Motivations and Tradeoffs
        5.1.1 Conversion to Single-Assignment Form
        5.1.2 Run-Time Overhead
        5.1.3 Single-Assignment for Loop Nests
        5.1.4 Optimization of the Run-Time Overhead
        5.1.5 Tradeoff between Parallelism and Overhead
    5.2 Maximal Static Expansion
        5.2.1 Motivation
        5.2.2 Problem Statement
        5.2.3 Formal Solution
        5.2.4 Algorithm
        5.2.5 Detailed Review of the Algorithm
        5.2.6 Application to Real Codes
        5.2.7 Back to the Examples
        5.2.8 Experiments
        5.2.9 Implementation
    5.3 Storage Mapping Optimization
        5.3.1 Motivation
        5.3.2 Problem Statement and Formal Solution
        5.3.3 Optimality of the Expansion Correctness Criterion
        5.3.4 Algorithm
        5.3.5 Array Reshaping and Renaming
        5.3.6 Dealing with Tiled Parallel Programs
        5.3.7 Schedule-Independent Storage Mappings
        5.3.8 Dynamic Restoration of the Data-Flow
        5.3.9 Back to the Examples
        5.3.10 Experiments
    5.4 Constrained Storage Mapping Optimization
        5.4.1 Motivation
        5.4.2 Problem Statement
        5.4.3 Formal Solution
        5.4.4 Algorithm
        5.4.5 Building Expansion Constraints
        5.4.6 Graph-Coloring Algorithm
        5.4.7 Dynamic Restoration of the Data-Flow
        5.4.8 Parallelization after Constrained Expansion
        5.4.9 Back to the Motivating Example
    5.5 Parallelization of Recursive Programs
        5.5.1 Problems Specific to Recursive Structures
        5.5.2 Algorithm
        5.5.3 Generating Code for Read References
        5.5.4 Privatization of Recursive Programs
        5.5.5 Expansion of Recursive Programs: Practical Examples
        5.5.6 Statementwise Parallelization
        5.5.7 Instancewise Parallelization
    5.6 Conclusion

6 Conclusion
    6.1 Contributions
    6.2 Perspectives

Bibliography
Index
List of Figures

1.1 Simple examples of memory expansion
1.2 Run-time restoration of the flow of data
1.3 Exposing parallelism
2.1 About run-time instances and accesses
2.2 Procedure Queens and control tree
2.3 Control automata for program Queens
2.4 Hash-table declaration
2.5 An inode declaration
2.6 Computation of Parikh vectors
2.7 Execution-dependent storage mappings
3.1 Studying the Lukasiewicz language
3.2 One-counter automaton for the Lukasiewicz language
3.3 Sequential and sub-sequential transducers
3.4 Synchronous and -synchronous transducers
3.5 Left-synchronous realization of several order relations
3.6 A left and right synchronizable example
4.1 Procedure Queens and control tree
4.2 Procedure BST and compressed control automaton
4.3 Procedure Count and compressed control automaton
4.4 First example of induction variables
4.5 More examples of induction variables
4.6 Procedure Count and control automaton
4.7 Rational transducer for storage mapping f of program BST
4.8 Rational transducer for conflict relation of program BST
4.9 Rational transducer for dependence relation of program BST
4.10 Rational transducer for storage mapping f of program Queens
4.11 One-counter transducer for conflict relation of program Queens
4.12 Pseudo-left-synchronous transducer for the restriction of to WR
4.13 One-counter transducer for the restriction of dependence relation to flow dependences
4.14 One-counter transducer for reaching definition relation of program Queens
4.15 Simplified one-counter transducer for
5.1 Interaction of reaching definition analysis and run-time overhead
5.2 Basic optimizations of the generated code for φ functions
5.3 Repeated assignments to the same memory location
5.4 Improving the SA algorithm
5.5 Parallelism extraction versus run-time overhead
5.6 First example
5.7 First example, continued
5.8 Expanded version of the first example
5.9 Second example
5.10 Partition of the iteration domain (N = 4)
5.11 Maximal static expansion for the second example
5.12 Third example
5.13 Inserting copy-out code
5.14 Parallelization of the first example
5.15 Experimental results for the first example
5.16 Computation times, in milliseconds
5.17 Convolution example
5.18 Knapsack program
5.19 KP in single-assignment form
5.20 Instancewise reaching definitions, schedule, and tiling for KP
5.21 Partial expansion for KP
5.22 Cases of $f_e^{exp}(v) \neq f_e^{exp}(w)$ in (5.17)
5.23 Motivating examples for each constraint in the definition of the interference relation
5.24 An example of block-regular storage mapping
5.25 Time and space optimization
5.26 Performance results
5.27 Motivating example
5.28 Parallelization of the motivating example
5.29 Performance results for storage mapping optimization
5.30 Maximal static expansion
5.31 Maximal static expansion combined with storage mapping optimization
5.32 What we want to achieve
5.33 Strange interplay of constraint and coloring relations
5.34 How we achieve constrained storage mapping optimization
5.35 Solving the constrained storage mapping optimization problem
5.36 Single-assignment form conversion of program Queens
5.37 Implementation of the read reference in statement r
5.38 Privatization of program Queens
5.39 Parallelization of program BST
5.40 Second motivating example: program Map
5.41 Parallelization of program Queens via privatization
5.42 Parallel resolution of the n-Queens problem
5.43 Instancewise parallelization example
5.44 Automatic instancewise parallelization of procedure P
List of Algorithms

Recurrence-Build(program)
Recurrence-Rewrite(program; system)
Recurrence-Solve(system)
Compute-Storage-Mappings(program)
Dependence-Analysis(program)
Reaching-Definition-Analysis(program)
Abstract-SA(program; W; )
Abstract-Implement-Phi(expanded)
Convert-Quast(quast; ref)
Loop-Nests-SA(program; )
Loop-Nests-Implement-Phi(expanded)
Abstract-ML-SA(program; W; ml)
Loop-Nests-ML-SA(program; ml)
Abstract-Implement-Phi-Not-SA(expanded)
Maximal-Static-Expansion(program; ; )
MSE-Convert-Quast(quast; ref)
Compute-Representatives(equivalence)
Enumerate-Representatives(rel; fun)
Storage-Mapping-Optimization(program; ; ; <par)
SMO-Convert-Quast(quast; ref)
Build-Expansion-Vector(S; ⋈)
Partial-Renaming(program; ⋈)
Constrained-Storage-Mapping-Optimization(program; ; ; ; <par)
CSMO-Convert-Quast(quast; ref)
Cyclic-Coloring()
Near-Block-Cyclic-Coloring(; shape)
CSMO-Implement-Phi(expanded)
CSMO-Efficiently-Implement-Phi(expanded)
Recursive-Programs-SA(program; )
Recursive-Programs-Implement-Phi(expanded)
Recursive-Programs-Online-SA(program; )
Statementwise-Parallelization(program; )
Instancewise-Parallelization(program; )
Presentation in French

After a detailed introduction, this chapter offers a summary, in French, of the following chapters, which are written in English. Its organization mirrors the structure of the thesis: sections and subsections correspond to chapters and to their sections, respectively. Readers wishing to explore one of the topics in more depth can therefore refer to the corresponding part in English, where they will find the details of the algorithms as well as examples.

Contents

I Introduction
    I.1 Program analysis
    I.2 Program transformations for parallelization
    I.3 Organization of this thesis
II Models
    II.1 An instancewise view
    II.2 Program model
    II.3 Formal model
    II.4 Instancewise analysis
    II.5 Parallelization
III Mathematical tools
    III.1 Presburger arithmetic
    III.2 Formal languages and rational relations
    III.3 Left-synchronous relations
    III.4 Beyond rational relations
    III.5 More about approximations
IV Instancewise analysis for recursive programs
    IV.1 Introductory examples
    IV.2 Relating instances and memory locations
    IV.3 Dependence and reaching definition analysis
    IV.4 Results of the analysis
    IV.5 Comparison with other analyses
V Expansion and parallelization
    V.1 Motivations and tradeoffs
    V.2 Maximal static expansion
    V.3 Storage mapping optimization
    V.4 Constrained storage mapping optimization
    V.5 Parallelization of recursive programs
VI Conclusion
    VI.1 Contributions
    VI.2 Perspectives
I Introduction

Progress in processor technology results from several factors: a sharp increase in clock frequency, wider buses, the use of several functional units and possibly several processors, complex memory hierarchies to compensate for access times, and an overall growth in storage capacity. One consequence is that the machine model becomes less and less simple and uniform: despite hardware management of caches, superscalar execution and shared-memory parallel architectures, finding optimal performance for a given program becomes more and more complex. Good optimizations for one particular case can lead to disastrous results on a different architecture. Moreover, hardware management cannot exploit the most complex architectures efficiently: in the presence of deep memory hierarchies, local memories, out-of-core computation, instruction-level parallelism or coarse-grain parallelism, help from the compiler is required to obtain good performance.

The whole architecture and compiler industry is in fact facing what the high-performance computing community discovered years ago. On the one hand, and for most applications, architectures are too disparate to define practical efficiency criteria and to develop optimizations specific to a given machine. On the other hand, programs are written in such a way that traditional optimization and parallelization techniques have the greatest difficulty feeding the number-crunching beast that is about to be installed in an ordinary laptop computer.

To reach high performance with modern microprocessors and parallel computers, a program (or the algorithm it implements) must have a sufficient degree of parallelism. Under these conditions, programmers or compilers must expose this parallelism and apply the transformations needed to adapt the program to the characteristics of the machine. Another requirement is that the program be portable across different architectures, in order to keep up with the rapid evolution of parallel machines. Two options are thus available to programmers.

- First, explicitly parallel languages. Most of them are parallel extensions of sequential languages. These languages may be data-parallel, like HPF, or combine data and task parallelism, like the OpenMP extensions for shared-memory architectures. Some extensions are provided as libraries, PVM and MPI for instance, or as high-level environments such as IML from the University of Illinois [SSP99] or Cilk from MIT [MF98]. All these approaches make it easier to program parallel algorithms. In return, the programmer is in charge of technical operations such as distributing data over the processors, communications and synchronizations. These operations require in-depth knowledge of the architecture and significantly reduce portability.

- Second, automatic parallelization of a high-level sequential language. The obvious advantages of this approach are portability and simplicity of programming. Unfortunately, the task left to the parallelizing compiler becomes overwhelming. Indeed, the program must first be analyzed in order to understand, at least partially, which computations are performed and where the parallelism lies. The compiler must then generate parallel code, taking the specifics of the architecture into account. The usual source language for automatic parallelization is Fortran 77. Indeed, many scientific applications have been written in Fortran, which only allows relatively simple data and control structures. Several studies nevertheless consider the parallelization of C or of functional languages such as Lisp. This research is less advanced than the historical approach but closer to the present work: it considers the most general data and control structures. Many research projects exist: Parafrase-2 and Polaris [BEF+96] from the University of Illinois, PIPS from École des Mines de Paris [IJT90], SUIF from Stanford University [H+96], the McCAT/EARTH-C compiler from McGill University [HTZ+97], LooPo from the University of Passau [GL97], and PAF from the University of Versailles; there is also a growing number of commercial parallelization tools, such as CFT, FORGE, FORESYS or KAP.

We are mainly interested in automatic and semi-automatic parallelization techniques: this thesis addresses both program analysis and program transformation.

I.1 Program analysis

Optimizing or parallelizing a program generally amounts to transforming its source code so as to improve a number of execution parameters. To apply a program transformation at compile time, one must make sure that the implemented algorithm is left untouched by the operation. Since an algorithm can be implemented in many different ways, validating a program transformation requires a reverse-engineering process to establish the most precise information possible about what the program does. This fundamental technique of program analysis attempts to solve the difficult problem of statically (i.e., at compile time) exposing information about dynamic (i.e., run-time) properties.

Static analysis

In program analysis, the first studies focused on properties of the machine state between the execution of two statements. These states are called program points. Such properties are said to be static because they cover all possible executions leading to a given program point. Of course, these properties are computed at compile time, but this is not where the adjective "static" comes from: it would probably be more appropriate to speak of "syntactic" analysis.

Data-flow analysis is the first general framework proposed to formalize the large number of static analyses. Among the many presentations of this formalism [KU77, Muc97, ASU86, JM82, KS92, SRH96], the following common points can be identified. To describe the possible executions, the usual method consists in building the control-flow graph of the program [ASU86]; this graph represents all program points as vertices, and the edges between these vertices are labeled by program statements. The set of all possible executions is then the set of all paths from the initial state to the program point under consideration. Properties at a given point are defined as follows: since each statement may modify a property, one must take into account all paths leading to the program point and gather (meet) all the information along these paths. The formalization of these ideas is often called meet over all paths (MOP). Of course, the meet operation depends on the property of interest and on the mathematical abstraction chosen for it.

However, the potentially infinite number of paths rules out any evaluation of properties directly from the MOP specification. The computation is instead carried out by propagating intermediate results, forward or backward, along the edges of the control-flow graph. The propagation equations are then solved iteratively until a fixed point is reached. This is the so-called maximal fixed-point (MFP) method. In the intraprocedural case, Kam and Ullman [KU77] proved that MFP does compute the result defined by MOP (i.e., MFP coincides with MOP) when a few simple properties of the mathematical abstraction are satisfied; this result was extended to interprocedural analysis by Knoop and Steffen [KS92].

Mathematical abstractions for program properties are very numerous, depending on the application and on the complexity of the analysis. The lattice structure covers most abstractions because it supports the computation of meets (at merge points) and joins (associated with statements). In this framework, Cousot and Cousot [CC77] proposed an approximation scheme based on semi-dual Galois connections between the concrete run-time states and the abstract compile-time properties. This formalism, called abstract interpretation, has two main benefits: first, it allows abstractions of properties to be built systematically by means of lattices; second, it guarantees that any fixed point computed in the abstract lattice is a conservative approximation of a fixed point in the lattice of concrete states. While extending the concept of data-flow analysis, abstract interpretation makes it easier to prove the correctness and optimality of program analyses. Practical applications of abstract interpretation and of the associated iterative methods are presented in [Cou81, CH78, Deu92, Cre96].

Despite undeniable successes, data-flow analyses, whether or not they are based on abstract interpretation, have rarely been the basis of automatic parallelization techniques. Some important reasons are not scientific in nature, but there are also good reasons for this fact:

- MOP/MFP techniques are mainly oriented toward classical optimizations with relatively simple abstractions (lattices often have bounded height); their correctness and their efficiency in a real compiler are the decisive issues, whereas the precision and expressiveness of the mathematical abstraction are the basis of automatic parallelization;

- in industry, parallelization methods have traditionally focused on loop nests and arrays, with high degrees of data parallelism and simple control structures (non-recursive, first-order); proving the correctness of an analysis is easy under these conditions, whereas application to real programs and implementation in a compiler become the major issues;

- abstract interpretation is well suited to functional languages with a clean and simple operational semantics; the problems raised are then orthogonal to the practical questions tied to imperative and low-level languages, which are traditionally better suited to parallel architectures (we will see that this situation is changing).
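As an illustration of the iterative resolution (MFP) outlined above, the following C sketch computes reaching definitions on a four-node control-flow graph. It is only a sketch under stated assumptions: the graph, the bit encoding of the two definitions of x, and the names gen_set, kill_set, pred and reaching_definitions are made up for this example and do not come from the thesis. For brevity, statements are attached to nodes rather than to edges, sets of definitions are bit masks, and the merge at join points is set union.

```c
#include <stdbool.h>

#define N_NODES 4

/*
 * Hypothetical program:
 *   node 0:  x = ...;      (definition d0, bit 0)
 *   node 1:  if (...)
 *   node 2:      x = ...;  (definition d1, bit 1)
 *   node 3:  ... = x;
 * Edges: 0->1, 1->2, 1->3, 2->3.
 */
static const unsigned gen_set[N_NODES]  = {0x1u, 0x0u, 0x2u, 0x0u};
static const unsigned kill_set[N_NODES] = {0x2u, 0x0u, 0x1u, 0x0u};

/* pred[n][p] is true when the graph has an edge p -> n. */
static const bool pred[N_NODES][N_NODES] = {
    {false, false, false, false},
    {true,  false, false, false},
    {false, true,  false, false},
    {false, true,  true,  false},
};

/* Iterate the propagation equations until nothing changes (fixed point). */
void reaching_definitions(unsigned in[N_NODES], unsigned out[N_NODES])
{
    for (int n = 0; n < N_NODES; n++) {
        in[n] = 0;
        out[n] = 0;
    }
    bool changed = true;
    while (changed) {
        changed = false;
        for (int n = 0; n < N_NODES; n++) {
            /* Merge the information coming from all predecessors. */
            unsigned merged = 0;
            for (int p = 0; p < N_NODES; p++)
                if (pred[n][p])
                    merged |= out[p];
            /* Transfer function of node n: out = gen U (in \ kill). */
            unsigned new_out = gen_set[n] | (merged & ~kill_set[n]);
            if (merged != in[n] || new_out != out[n]) {
                in[n] = merged;
                out[n] = new_out;
                changed = true;
            }
        }
    }
}
```

On this graph, both definitions of x reach the final read: in[3] holds bits 0 and 1. Nothing in the result tells which run-time instance of a statement, or which array element, is involved, which is the limitation discussed next.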
As a consequence, existing data-flow analyses are generally static analyses that compute properties of a given program point or of a given statement. Such results are useful for classical verification and optimization techniques [Muc97, ASU86, SKR90, KRS94], but automatic parallelization needs additional information.

- What about the different run-time instances of a program point or statement? Since statements are generally executed several times, we are interested in the loop iteration or procedure call that leads to the execution of a given statement.

- What about the different elements of a data structure? Since arrays and dynamically allocated data structures are not atomic, we are interested in the array element or tree node accessed by a given instance of a statement.

Instancewise analysis

Program analyses for automatic parallelization form a rather narrow field, compared with the vast range of properties and techniques studied in static analysis. The program model considered is also more restricted, most of the time, since the traditional applications of parallelizers are numerical codes with loop nests and arrays.

From the very beginning, with the work of Banerjee [Ban88], Brandes [Bra88] and Feautrier [Fea88a], analyses have been able to identify properties at the level of instances and elements. While the only control structure was the for/do loop, iterative methods with solid semantic foundations seemed needlessly complex. To focus on the crucial problems of abstracting loop iterations and effects on array elements, designing simple and specialized models was certainly preferable. The first analyses were dependence tests [Ban88] and dependence analyses, which gather information about instances of statements accessing the same memory location, one of the accesses being a write. More precise methods were designed to compute, for each array element read in an expression, the statement instance that produced the value. They are often called array data-flow analyses [Fea91, MAL93], but we prefer the term instancewise reaching definition analysis, to ease the comparison with a particular static data-flow analysis technique called reaching definition analysis [ASU86, Muc97]. Such precise information significantly improves the quality of transformation techniques, and hence the performance of parallel programs.

Instancewise analyses have long suffered from severe restrictions on their program model: programs were initially required to consist only of loops without conditional statements, with affine bounds and array subscripts, and without procedure calls. This limited model already covers a good number of numerical codes, and it has the great benefit of allowing exact computation of dependences and reaching definitions [Fea88a, Fea91]. When one tries to lift these restrictions, one of the difficulties comes from the impossibility of establishing exact results: only approximate information on dependences is available at compile time, which induces overly coarse approximations of reaching definitions. A direct computation of these reaching definitions is therefore needed. Such techniques were recently developed by Barthou, Collard and Feautrier [CBF95, BCF97, Bar98] and by Pugh and Wonnacott [WP95, Won95], with extremely precise results in the intraprocedural case. In the following, and in the case of unrestricted arrays and loop nests, our instancewise reaching definition analysis will be the fuzzy array dataflow analysis (FADA) of Barthou, Collard and Feautrier [Bar98].

There are many extensions of these analyses that can handle procedure calls [TFJ86, HBCM94, CI96], but they are not fully instancewise analyses because they do not distinguish the multiple executions of a statement associated with different calls of the enclosing procedure. Indeed, this thesis presents the first analysis that is fully instancewise for programs with procedure calls, possibly recursive ones.

I.2 Program transformations for parallelization

It is well known that dependences limit the parallelization of programs written in an imperative language, as well as their efficient compilation on modern processors and supercomputers. A general method to reduce the number of dependences consists in reducing memory reuse by assigning distinct memory locations to independent writes, that is, in expanding data structures.

There are many techniques to compute memory expansions, that is, to transform the memory accesses in programs. Classical methods include: renaming variables; splitting or unifying data structures of the same type; reshaping arrays, in particular adding new dimensions; converting arrays into trees; changing the degree of a tree; turning a global variable into a local one.

Read references are expanded as well, using reaching definitions to implement the expanded reference [Fea91]. Figure 1 shows three programs for which no parallel execution is possible, because of output dependences (some details of the code are omitted). The expanded versions are shown on the right-hand side of the figure, to illustrate the benefit of memory expansion for parallelism extraction.

Unfortunately, when the control flow cannot be predicted at compile time, additional work is required at run time to preserve the original data flow: φ functions may be needed to "merge" the definitions coming from several incoming control paths. These φ functions are similar, but not identical, to those of the static single-assignment (SSA) framework of Cytron et al. [CFR+91], and Collard and Griebl were the first to extend them to instancewise expansion methods [GC95, Col98]. The argument of a φ function is the set of possible reaching definitions for the associated read reference (this interpretation is very different from the usual semantics of φ functions in the SSA framework). Figure 2 shows two programs with conditional expressions and unknown array subscripts. Expanded versions with φ functions are given on the right-hand side of the figure.

Expansion is not a mandatory step of parallelization; it nevertheless remains a very general technique to expose more parallelism in programs. As far as the implementation of parallel programs is concerned, two different views are possible, depending on the language and the architecture.
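The run-time restoration of the data flow described above can be sketched in C for a scalar assigned under an unpredictable condition. This is only one possible encoding, given as a minimal sketch: the flag s2_executed, the function names and the constants 10 and 20 (standing for the elided right-hand sides) are hypothetical and are not taken from the thesis.

```c
#include <stdbool.h>

/* Original program: x is written by s1 and, conditionally, by s2. */
int run_original(bool cond)
{
    int x;
    x = 10;                 /* s1 (hypothetical right-hand side) */
    if (cond)
        x = 20;             /* s2 (hypothetical right-hand side) */
    return x;               /* r: reads x */
}

/* Expanded program: x is renamed into x1 and x2, and the read in r is
 * implemented by a phi function that checks at run time whether s2
 * has been executed. */
int run_expanded(bool cond)
{
    int x1;
    int x2 = 0;
    bool s2_executed = false;   /* run-time bookkeeping for phi */

    x1 = 10;                    /* s1 */
    if (cond) {
        x2 = 20;                /* s2 */
        s2_executed = true;
    }
    /* r: ... = phi({s1, s2}) */
    return s2_executed ? x2 : x1;
}
```

Both versions return the same value for either outcome of the condition; the price of the expansion is the extra flag that has to be maintained and tested at run time.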
Original program:

    int x;
    x = ...; ... = x;
    x = ...; ... = x;

Expanded program:

    int x1, x2;
    x1 = ...; ... = x1;
    x2 = ...; ... = x2;

After expansion, i.e., after renaming x into x1 and x2, the first two statements can be executed in parallel with the other two.

Original program:

    int A[10];
    for (i=0; i<10; i++) {
    s1  A[0] = ...;
        for (j=1; j<10; j++) {
    s2    A[j] = A[j-1] + ...;
        }
    }

Expanded program:

    int A1[10], A2[10][10];
    for (i=0; i<10; i++) {
    s1  A1[i] = ...;
        for (j=1; j<10; j++) {
    s2    A2[i][j] = { if (j==1) A1[i]; else A2[i][j-1]; } + ...;
        }
    }

After expansion, i.e., after renaming array A into A1 and A2 and then adding a dimension to array A2, the outer for loop is parallel. The instancewise reaching definition of reference A[j-1] depends on the values of i and j, as shown by the implementation with a conditional statement.

Original program:

    int A[10];
    void Proc(int i) {
      A[i] = ...;
      ... = A[i];
      if (...) Proc(i+1);
      if (...) Proc(i-1);
    }

Expanded program:

    struct Tree {
      int value;
      struct Tree *left, *right;
    } *p;
    void Proc(struct Tree *p, int i) {
      p->value = ...;
      ... = p->value;
      if (...) Proc(p->left, i+1);
      if (...) Proc(p->right, i-1);
    }

After expansion, the two procedure calls can be executed in parallel. The dynamic allocation of the Tree structure is omitted.

Figure 1. Some examples of expansion

The first view exploits control parallelism, that is, parallelism between different statements of the same program block. The goal is to replace as many sequential executions of statements as possible by parallel executions. Depending on the language, there are several different syntaxes to express this kind of parallelism, and they may not all have the same expressive power. We prefer the spawn/sync syntax of Cilk [MF98] (close to that of OpenMP) to the parallel blocks of Algol 68 and of the EARTH-C compiler [HTZ+97]. As in [MF98], synchronizations apply to all asynchronous activities started in the enclosing block, and implicit synchronizations are added at procedure return points. In the example of Figure 3, the parallel execution of A, B and C, followed sequentially by D and then E, is written in a Cilk-like syntax. In practice, each statement in this example would probably be a procedure call.
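The second program of Figure 1 can be made executable by choosing arbitrary right-hand sides. In the C sketch below, init_value and increment are hypothetical stand-ins for the elided expressions, and the array last is added only to observe the value computed by each outer iteration; none of these names come from the thesis. The expanded version deliberately runs the outer loop backwards, as a simple way to suggest that its iterations no longer depend on one another.

```c
#define N 10

/* Hypothetical stand-ins for the elided right-hand sides. */
static int init_value(int i) { return 3 * i; }
static int increment(int i, int j) { return i + j; }

/* Original program: every outer iteration reuses the same array A,
 * which creates output dependences between iterations of i. */
void original(int last[N])
{
    int A[N];
    for (int i = 0; i < N; i++) {
        A[0] = init_value(i);                        /* s1 */
        for (int j = 1; j < N; j++)
            A[j] = A[j - 1] + increment(i, j);       /* s2 */
        last[i] = A[N - 1];     /* observation added for the comparison */
    }
}

/* Expanded program: A is renamed into A1 and A2, and A2 gets one more
 * dimension, so each outer iteration writes its own memory cells. */
void expanded(int last[N])
{
    int A1[N];
    int A2[N][N];
    for (int i = N - 1; i >= 0; i--) {               /* reversed on purpose */
        A1[i] = init_value(i);                       /* s1 */
        for (int j = 1; j < N; j++)
            /* The reaching definition of the read depends on j. */
            A2[i][j] = (j == 1 ? A1[i] : A2[i][j - 1]) + increment(i, j);
        last[i] = A2[i][N - 1];
    }
}
```

With these choices both functions fill last with the same values, whatever the order in which the outer iterations of the expanded version are run.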
First example, original program:

    int x;
    s1  x = ...;
    s2  if (...) x = ...;
    r   ... = x;

Expanded program:

    int x1, x2;
    s1  x1 = ...;
    s2  if (...) x2 = ...;
    r   ... = φ({s1, s2});

After expansion, one cannot decide at compile time which value is read by statement r. One only knows that it can come from s1 or s2, and the computation of this value is hidden in the expression φ({s1, s2}). This expression checks whether s2 has been executed: if so, it returns the value of x2, otherwise that of x1.

Second example, original program:

    int A[10];
    s1  A[i] = ...;
    s2  A[...] = ...;
    r   ... = A[i];

Expanded program:

    int A1[10], A2[10];
    s1  A1[i] = ...;
    s2  A2[...] = ...;
    r   ... = φ({s1, s2});

After expansion, one does not know at compile time which value is read by statement r, since the element of array A written by statement s2 is unknown.

Figure 2. Restoring the data flow at run time

    spawn A;
    spawn B;
    spawn C;
    sync;   // wait for the termination of A, B and C
    D;
    E;

Figure 3. Syntax for control parallelism

The second view exploits data parallelism, that is, parallelism between different instances of the same statement or of the same block. The data-parallel model has been studied at length in the case of loop nests [PD96], because it fits well with efficient parallelization techniques for numerical algorithms and for repetitive operations on large data sets. We will use a syntax similar to the declaration of parallel loops in OpenMP, where all variables are assumed to be shared by default and an implicit synchronization is added at each loop exit.

To generate data-parallel code, many algorithms use intuitive loop transformations such as loop fission, fusion, interchange, reversal, skewing, reindexing, and statement reordering. Data parallelism is also well suited to expressing a parallel execution order as a schedule, that is, by assigning an execution date to each instance of a statement. The program scheme of Figure 4 gives an idea of the general method for implementing such a schedule [PD96]. The concept of execution front $F(t)$ is fundamental, since it gathers all the instances $\iota$ that execute at date $t$.
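The notion of execution front can be made concrete with a small sketch. Everything in it is an assumption chosen for illustration and does not come from the thesis: the loop nest `B[i][j] = B[i-1][j] + B[i][j-1]`, the schedule $\theta(i,j) = i + j - 2$, and the names `run_scheduled` and `LATENCY`. The "parallel for" over a front is simulated by a sequential loop that visits the front in an arbitrary (here decreasing) order.

```c
#include <assert.h>

#define N 6
#define LATENCY (2 * N - 2)   /* latency of the schedule theta(i, j) = i + j - 2 */

static void init(int B[N + 1][N + 1]) {
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++)
            B[i][j] = (i == 0 || j == 0) ? 1 : 0;
}

/* Sequential execution order of the loop nest. */
void run_sequential(int B[N + 1][N + 1]) {
    init(B);
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++)
            B[i][j] = B[i - 1][j] + B[i][j - 1];
}

/* Execution by fronts: F(t) = { (i, j) | i + j - 2 == t }. */
void run_scheduled(int B[N + 1][N + 1]) {
    init(B);
    for (int t = 0; t <= LATENCY; t++) {
        /* "parallel for" over F(t), simulated by a sequential loop that
           visits the instances of the front in decreasing i */
        for (int i = N; i >= 1; i--) {
            int j = t + 2 - i;
            if (j >= 1 && j <= N)
                B[i][j] = B[i - 1][j] + B[i][j - 1];
        }
        /* implicit synchronization between two consecutive fronts */
    }
}

/* Self-check: the scheduled order computes the same values. */
void check_schedule(void) {
    int s[N + 1][N + 1], p[N + 1][N + 1];
    run_sequential(s);
    run_scheduled(p);
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++)
            assert(s[i][j] == p[i][j]);
}
```

The schedule is valid because each instance (i, j) only reads values produced by instances (i-1, j) and (i, j-1), which belong to the previous front.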
    for (t=0; t<=L; t++) {   // L is the latency of the schedule
      parallel for (ι ∈ F(t))
        execute instance ι
      // implicit synchronization
    }

Figure 4. Classical implementation of a schedule in the data-parallel model

The first scheduling algorithm is due to Kennedy and Allen [AK87], and it has inspired many methods. They all rely on rather coarse abstractions of dependences, such as dependence levels, vectors and cones. Reasonable complexity and ease of implementation in an industrial compiler are the main advantages of these methods; the work of Banerjee [Ban92] and, more recently, of Darte and Vivien [DV97] gives a global view of these algorithms. A general solution was proposed by Feautrier [Fea92]. The proposed algorithm is very useful, but its weak point is the lack of support for deciding which parameter of the schedule should be optimized: is it the latency L, the number of communications (on a distributed-memory machine), or the width of the fronts?

Finally, it is well known that control parallelism is more general than data parallelism, in the sense that any data-parallel program can be rewritten in a control-parallel model without loss of parallelism. This is all the more true for recursive programs, where the distinction between the two paradigms is not very clear [Fea98]. On the other hand, for real programs and architectures, data parallelism has long been much better suited to massively parallel computing, mainly because of the overhead associated with managing activities. Recent advances in hardware and software have nevertheless shown that the situation is changing: for example, excellent results for recursive parallel programs (simulations of games such as chess, and sorting algorithms) have been obtained with Cilk [MF98].

I.3 Organization of this thesis

Four chapters structure this thesis before the final conclusion, and they are mirrored in the following sections. Section II, which summarizes Chapter 2, describes a general framework for program analysis and transformation, and presents the definitions used in the following chapters. The goal is to be able to study a large class of programs, from loop nests with arrays to recursive programs and data structures.

Mathematical results are gathered in Section III, which summarizes Chapter 3. Some are well known, such as Presburger arithmetic and formal language theory; some are rather uncommon in the fields of parallelism and compilation, such as rational and algebraic transductions; and the others are mainly contributions, such as left-synchronous transductions and approximation techniques for rational and algebraic transductions.
Section IV, which summarizes Chapter 4, addresses the instance-wise analysis of recursive programs. It is based on an extension of the notion of induction variable to recursive programs and on new results in formal language theory. Two algorithms are proposed, for dependence analysis and for reaching definition analysis. They are tested on examples.

Parallelization techniques based on memory expansion are the subject of Section V, which summarizes Chapter 5. The first three subsections present techniques for expanding loop nests without restrictions on conditional expressions, loop bounds and array subscripts; the fourth subsection is a contribution to the simultaneous optimization of expansion and parallelization parameters; and the fifth subsection presents our results on the expansion and parallelization of recursive programs.

II Models

In order to keep a consistent formalism and vocabulary throughout this thesis, we present a general framework for describing program analyses and transformations. We put the emphasis on representing program properties at the level of instances, while maintaining some continuity with other work in the field. We do not try to compete with any existing formalism [KU77, CC77, JM82, KS92]: the main objective is to establish convincing results on the relevance and effectiveness of our techniques.

After a formal presentation of statement instances and program executions, we define a program model for the rest of this study. We then describe the associated mathematical abstractions, before formalizing the notions of code analysis and transformation.

II.1 An instance-wise view

During execution, each statement may be executed a number of times, because of the enclosing control structures. To describe data-flow properties as precisely as possible, our techniques must be able to distinguish between these different executions of the same statement. For a statement s, a run-time instance of s is one particular execution of s during the execution of the program. In the case of loop nests, loop counters are often used to name instances, but this technique is not always applicable: a general naming scheme is studied in Section II.3.

Programs sometimes depend on the initial state of memory and interact with their environment, so several executions of the same code are associated with different sets of instances and with incompatible data-flow properties. We do not need a high degree of formalization here: an execution $e$ of a program $P$ is given by an execution trace of $P$, that is, a finite or infinite (when the program does not terminate) sequence of configurations (machine states). The set of all possible executions is denoted by $E$. For a given program, $I_e$ denotes the set of instances associated with execution $e \in E$. Besides representing the execution, the subscript $e$ recalls that the set $I_e$ is "exact": it is not an approximation.

Of course, each statement may contain several (possibly zero) memory references, one of which may be a write (i.e. on the left-hand side). A pair $(\iota, r)$ made of a statement instance and a reference in the statement is called an access. For a given execution $e \in E$ of a program, the set of all accesses is denoted by $A_e$. It is partitioned into $R_e$, the set of all reads, i.e. accesses performing a memory read, and $W_e$, the set of all writes, i.e. accesses performing a memory write. When a statement has a memory reference on its left-hand side, the associated write accesses are often identified with the instances of the statement.

II.2 Program model

Our programs are written in an imperative style, with a C-like syntax (with some syntactic extensions from C++). Pointers are allowed, and multi-dimensional arrays are accessed with the syntax [i1, ..., in] (which is not C syntax) to improve readability. This study is mainly concerned with first-order structures, but approximation techniques make it possible to handle function pointers as well [Cou81, Deu90, Har89, AFL95]. Recursive calls, loops, conditionals and exception mechanisms are allowed; we assume, however, that gotos have been removed beforehand by code restructuring algorithms [ASU86, Bak77, Amm92].

We only consider the following data structures: scalars (booleans, integers, floating-point numbers, pointers...), records of non-recursive scalars, arrays of scalars or records, trees of scalars or records, trees of arrays and arrays of trees (even recursively nested). For simplicity, we assume that arrays are always accessed with their specific syntax (the [] operator), so pointer arithmetic is forbidden. Tree structures are accessed through explicit pointers (with the * and -> operators).

The "shape" of data structures is not explicit in C programs: it is not obvious whether a given structure is a list or a tree rather than an arbitrary graph. Additional information provided by the programmer can solve the problem [KS93, FM97, Mic95, HHN92], as can compile-time shape analyses of data structures [GH96, SRW96]. Associating pointers with a given instance of a tree structure is not obvious either: this is a special case of alias analysis [Deu94, CBC93, GH95, LRZ93, EGH94, Ste96]. In the following, we assume that such techniques have been applied by the compiler.

An important question about data structures is how they are built, modified and destroyed. The shape of arrays is often known statically, but one sometimes resorts to dynamic arrays whose size grows at each out-of-bounds access (this is the case in Section V); pointer-based structures, on the other hand, are allocated dynamically with explicit statements. Feautrier studied the problem in [Fea98] and we take the same view: all data structures are assumed to be built up to their maximal, possibly infinite, extent. The correctness of such an abstraction is guaranteed when all insertions and deletions are forbidden at run time. This very strict rule nevertheless admits two exceptions, which we study after introducing the mathematical abstraction for data structures. Unfortunately, many programs still do not obey this rule.
II.3 Formal model

We first present a naming method for statement instances, then we propose a mathematical abstraction of memory cells.

Naming statement instances

From now on, we assume that each statement carries a label; the alphabet of labels is denoted by $\Sigma_{ctrl}$. Loops deserve special attention: they have three labels, the first representing entry into the loop, the second corresponding to the evaluation of the condition, and the third representing the iteration (1). Likewise, conditionals have two labels: one for the condition and the then branch, another for the else branch. We study the example of Figure 5; this procedure computes all solutions of the n-queens problem.

1. In C, the condition is checked just after entering the loop and before each iteration.

            int A[n];
    P       void Queens(int n, int k) {
    I         if (k < n) {
    A=A=a       for (int i=0; i<n; i++) {
    B=B=b         for (int j=0; j<k; j++)
    r               ... = ... A[j] ...;
    J             if (...) {
    s               A[k] = ...;
    Q               Queens(n, k+1);
                  }
                }
              }
            }

            int main() {
    F         Queens(n, 0);
            }

    [Control tree (partial): branches labelled FPIAAaAaAJs and FPIAAaAaAJQPIAABBr]

Figure 5. The Queens procedure and a (partial) control tree

Execution traces are often used to name instances at run time. They are generally defined as a path from the entry of the control-flow graph to a given statement (2). Each execution of a statement is recorded, including function returns. In our case, execution traces have a number of drawbacks, the most serious being that a given instance may have several different execution traces depending on the execution of the program. This forbids using traces to give a unique name to each instance. Our solution uses another representation of program execution [CC98, Coh99a, Coh97, Fea98]. For a given execution, each instance of a statement lies at the end of a unique (ordered) list of block entries, loop iterations and procedure calls. Each list corresponds to a word: the concatenation of the labels of the statements. These concepts are illustrated by the tree of Figure 5, whose definition is given later.

2. Without taking conditional expressions and loop bounds into account.

Definition 1 The control automaton of a program is a finite automaton whose states are the statements, and where a transition from a state $q$ to a state $q'$ expresses that statement $q'$ appears in block $q$. Such a transition is labelled by $q'$. The initial state is the first executed statement, and all states are final.

The words accepted by the control automaton are called control words. By construction, they describe a rational language $L_{ctrl}$ included in $\Sigma_{ctrl}^*$.

If $I$ is the union of all instance sets $I_e$ over all executions $e \in E$, there is a natural injection from $I$ into the language $L_{ctrl}$ of control words. This result allows us to speak of "the control word of an instance". In general, the sets $E$ and $I_e$ (for a given execution $e$) are not known at compile time. We often consider the set of all instances that may be executed, regardless of conditionals and loop bounds. This set is in bijection with the set of control words. We therefore also speak of "the instance $w$", meaning "the instance whose control word is $w$".

Notice that some states have only one incoming and one outgoing transition. In practice, one often considers a compressed control automaton where all these states are removed. This transformation has no effect on control words. The automata for the Queens program are shown in Figure 6.

    [Figure 6.a. Control automaton]
    [Figure 6.b. Compressed control automaton for Queens]

Figure 6. Control automata

The sequential execution order of a program defines a total order on instances, denoted by $<_{seq}$. In addition, one can define a partial textual order $<_{txt}$ on the statements of the program: statements of the same block are ordered according to their order of appearance, and statements appearing in different blocks are incomparable. In the case of loops, the iteration label executes after all statements of the loop body. For the Queens procedure, we have $B <_{txt} J <_{txt} a$, $r <_{txt} b$ and $s <_{txt} Q$. This textual order induces a lexicographic (dictionary) order on control words, denoted by $<_{lex}$. This order is partial on $\Sigma_{ctrl}^*$ and on $L_{ctrl}$ (notably because of conditionals). By construction of the textual order, an instance $\iota'$ executes before an instance $\iota$ if and only if their respective control words $w'$ and $w$ satisfy $w' <_{lex} w$.

Finally, the language of control words is easily interpreted as an infinite tree, whose root is named $\varepsilon$ and whose edges are each labelled by a statement. Each node then corresponds to the control word equal to the concatenation of the labels along the branch from the root. Such a tree is called a control tree. A partial tree of this kind for the Queens program is given in Figure 5.

Addressing memory cells

We generalize here a number of formalisms that we proposed previously [CC98, Coh99a, Coh97, Fea98, CCG96]. This generalization also draws on quite diverse approaches [Ala94, Mic95, Deu92, LH88].

Unsurprisingly, array elements are indexed by integers or integer vectors. Trees are addressed by concatenating the labels of the edges starting from the root. The address of the root is thus $\varepsilon$, and that of the node root->l->r in a binary tree is lr. The set of edge names is denoted by $\Sigma_{data}$; the layout of trees in memory is therefore described by a rational language $L_{data} \subseteq \Sigma_{data}^*$.

To work on both trees and arrays, we observe that these two structures share the same mathematical abstraction: the monoid (see Section III.2). Indeed, rational languages (tree addressing) are subsets of free monoids with word concatenation, and sets of integer vectors (array indexing) are free commutative monoids with vector addition. The abstraction of a data structure by a monoid is denoted by $M_{data}$, and the subset of this monoid associated with the valid elements of the structure is denoted by $L_{data}$.

The case of nested trees and arrays is a little more complex, but it reveals the expressiveness of monoid abstractions. However, we do not discuss these hybrid structures any further in this French summary. In the following, the abstraction of any data structure of our program model is a subset $L_{data}$ of the monoid $M_{data}$ with operation $\cdot$.

It is time to come back to the ban on insertions and deletions stated in the previous section. Our formalism is actually able to handle the following two exceptions: since the data flow does not depend on whether a node is inserted at the beginning of the program or during execution, insertions at the tail of lists and at the leaves of trees are allowed; when deletions are performed at the tail of lists or at the leaves of trees, the mathematical abstraction is still correct but may lead to overly conservative approximations.

Loop nests and arrays

Many numerical applications are implemented as loop nests over arrays, especially in signal processing and in scientific or multimedia codes. A great many analysis and transformation results have been obtained for these programs. Our formalism describes this kind of code without difficulty, but it seems more natural and simpler to return to more classical notions for naming instances and addressing memory. Indeed, integer vectors are better suited than control words, because $\mathbb{Z}$-modules have a much richer structure than mere commutative monoids.

Using Parikh mappings [Par66], we have shown that iteration vectors (the classical formalism for naming instances in loop nests) are a particular interpretation of control words, and that the two notions are equivalent in the absence of procedure calls. Finally, statement instances do not reduce to iteration vectors alone, and we introduce the following notations (which generalize the intuitive notations of Section II.1): $\langle S, x \rangle$ denotes the instance of statement $S$ whose iteration vector is $x$; $\langle S, x, \mathrm{ref} \rangle$ denotes the access built from instance $\langle S, x \rangle$ and reference $\mathrm{ref}$.

Further comparisons between iteration vectors and control words are presented in Section IV.5.

II.4 Instance-wise analysis

The definition of program executions is not very practical, since our model uses control words rather than execution traces. We prefer an equivalent view where the sequential execution $e \in E$ of a program is a pair $(<_{seq}, f_e)$, where $<_{seq}$ is the sequential execution order over all possible instances and $f_e$ maps each access to the memory cell it reads or writes. Notice that $<_{seq}$ does not depend on the execution, since the sequential order is deterministic. By contrast, the domain of $f_e$ is exactly the set $A_e$ of accesses associated with execution $e$. The function $f_e$ is called the access function for execution $e$ of the program [CC98, Fea98, CFH95, Coh99b, CL99]. For simplicity, when speaking of "the program $(<_{seq}, f_e)$", we mean the set of executions $(<_{seq}, f_e)$ of the program for $e \in E$.

Access conflicts and dependences

Analyses and transformations often require information about "conflicts" between memory accesses. Two accesses $a$ and $a'$ are in conflict if they access (for reading or writing) the same memory cell: $f_e(a) = f_e(a')$.

Conflict analysis is very similar to alias analysis [Deu94, CBC93] and also applies to cache analyses [TD95]. The conflict relation, i.e. the relation between conflicting accesses, is denoted by $\kappa_e$ for a given execution $e$. Since $f_e$ and $\kappa_e$ cannot generally be known exactly, access conflict analysis consists in computing a conservative approximation $\kappa$ of the conflict relation that is compatible with every execution of the program:

$$\forall e \in E,\ \forall v, w \in A_e:\quad f_e(v) = f_e(w) \implies v \mathrel{\kappa} w.$$

To parallelize, one needs sufficient conditions allowing two accesses to execute in an arbitrary order. These conditions are expressed in terms of dependences: an access $a$ depends on another access $a'$ if one of them is a write, if they are in conflict ($f_e(a) = f_e(a')$), and if $a'$ executes before $a$ ($a' <_{seq} a$). The dependence relation for an execution $e$ is denoted by $\delta_e$: "$a$ depends on $a'$" is written $a' \mathrel{\delta_e} a$.

$$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \mathrel{\delta_e} a \ \overset{\mathrm{def}}{\iff}\ (a \in W_e \lor a' \in W_e) \land a' <_{seq} a \land f_e(a) = f_e(a').$$
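The definition of the dependence relation can be transcribed directly for one known execution. The sketch below is only an illustration under simplifying assumptions that are not in the thesis: an execution is given as an array of accesses already sorted by the sequential order, memory cells are plain integers, and the type `Access` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One access of a given execution e. The position of an access in the
   trace array encodes the sequential order <seq. */
typedef struct {
    bool is_write;   /* true if the access belongs to W_e, false for R_e */
    int  cell;       /* f_e(a): the memory cell read or written */
} Access;

/* Conflict: both accesses touch the same memory cell. */
bool in_conflict(const Access *trace, size_t a, size_t b) {
    return trace[a].cell == trace[b].cell;
}

/* Dependence "a depends on a_prime": one of the two accesses is a write,
   a_prime executes before a, and they are in conflict. */
bool depends(const Access *trace, size_t a_prime, size_t a) {
    return (trace[a].is_write || trace[a_prime].is_write)
        && a_prime < a
        && in_conflict(trace, a, a_prime);
}

/* Number of dependent pairs among the first n accesses of the trace. */
size_t count_dependences(const Access *trace, size_t n) {
    assert(trace != NULL || n == 0);
    size_t count = 0;
    for (size_t a = 0; a < n; a++)
        for (size_t a_prime = 0; a_prime < a; a_prime++)
            if (depends(trace, a_prime, a))
                count++;
    return count;
}
```

A compile-time analysis cannot enumerate accesses this way, since $f_e$ is unknown; it has to compute a relation that contains every pair reported by `depends` for every execution.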
Dependence analysis again settles for an approximate result $\delta$, such that

$$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \mathrel{\delta_e} a \implies a' \mathrel{\delta} a.$$

Reaching definition analysis

In some cases, one looks for more precise information than dependences: given a memory read, one wants to know the instance that produced the value. The read access is called a use, and the instance that produced the value is called the reaching definition. It is in fact the last instance, according to the execution order, on which the use depends. The function mapping each read access to its unique reaching definition is denoted by $\sigma_e$:

$$\forall e \in E,\ \forall u \in R_e:\quad \sigma_e(u) = \max_{<_{seq}} \{ v \in W_e : v \mathrel{\delta_e} u \}.$$

A read instance may in fact have no reaching definition in the program under consideration. We therefore add a virtual instance $\bot$ that executes before all instances of the program and initializes all memory cells.

A reaching definition analysis computes a relation $\sigma$ that conservatively approximates the functions $\sigma_e$:

$$\forall e \in E,\ \forall u \in R_e,\ v \in W_e:\quad v = \sigma_e(u) \implies v \mathrel{\sigma} u.$$

One may also see $\sigma$ as a function computing sets of possible reaching definitions. When $\bot$ appears in a set of instances, an uninitialized value may be read. This information can be used for program checking.

In the following, we need to consider approximate sets of instances and accesses. We have already met the notation $I$, which denotes the set of all possible instances for any execution of a given program:

$$\forall e \in E:\quad \iota \in I_e \implies \iota \in I.$$

Likewise, we use the conservative approximations $A$, $R$ and $W$ of the sets $A_e$, $R_e$ and $W_e$.

II.5 Parallelization

With the model introduced in Section II.4, parallelizing a program $(<_{seq}, f_e)$ means building a program $(<_{par}, f_e^{exp})$, where $<_{par}$ is a parallel execution order, that is, a partial order and a sub-order of $<_{seq}$. Building a new access function $f_e^{exp}$ from $f_e$ is called memory expansion. Of course, a number of properties must be satisfied by $<_{par}$ and $f_e^{exp}$ in order to preserve the semantics of the sequential execution.

The purpose of memory expansion is to reduce the number of spurious dependences caused by the reuse of the same memory cells. Indirectly, expansion thus exposes more parallelism. We indeed consider a dependence relation $\delta_e^{exp}$ for an execution $e$ of the expanded program:

$$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \mathrel{\delta_e^{exp}} a \ \overset{\mathrm{def}}{\iff}\ (a \in W_e \lor a' \in W_e) \land a' <_{seq} a \land f_e^{exp}(a) = f_e^{exp}(a').$$
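For a single known execution, reaching definitions and one possible expanded access function can be computed by brute force. The sketch below rests on assumptions that are not in the thesis: accesses are stored in sequential order in an array, cells are integers, the virtual instance $\bot$ is encoded as `BOTTOM`, and the expansion chosen gives one fresh cell per write (a single-assignment layout). The names `Access`, `reaching_definition` and `expand` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BOTTOM (-1)   /* encodes the virtual instance that precedes all others */

/* One access; its index in the trace encodes the sequential order. */
typedef struct {
    bool is_write;
    int  cell;        /* value of the access function for this access */
} Access;

/* Reaching definition of access u: the last write before u, in sequential
   order, that touches the same cell, or BOTTOM if there is none. */
int reaching_definition(const Access *trace, int u) {
    assert(u >= 0);
    for (int v = u - 1; v >= 0; v--)
        if (trace[v].is_write && trace[v].cell == trace[u].cell)
            return v;
    return BOTTOM;
}

/* Expanded access function: each write gets a fresh cell (numbered by the
   index of the write), and each read is redirected to the cell of its
   reaching definition. Reads of uninitialized data go to cell BOTTOM. */
void expand(const Access *trace, int n, Access *expanded) {
    for (int a = 0; a < n; a++) {
        expanded[a].is_write = trace[a].is_write;
        expanded[a].cell = trace[a].is_write ? a
                                             : reaching_definition(trace, a);
    }
}
```

In the expanded trace two writes never share a cell, so output dependences disappear, and the only remaining conflicts relate a read to the write that produced its value.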
To define a parallel order compatible with every execution of the program, one must consider a conservative approximation $\delta^{exp}$. This approximation is generally induced by the expansion strategy (see Section V.4 for example).

Theorem 1 (correctness of a parallel order) The following condition guarantees that the parallel execution order is correct for the expanded program (it preserves the semantics of the original program).

$$\forall (\iota_1, r_1), (\iota_2, r_2) \in A:\quad (\iota_1, r_1) \mathrel{\delta^{exp}} (\iota_2, r_2) \implies \iota_1 <_{par} \iota_2.$$

Notice that $\delta_e^{exp}$ coincides with $\sigma_e$ when the program is converted to single-assignment form. We therefore assume that $\delta^{exp} = \sigma$ when parallelizing such programs.

Finally, we do not come back here to the techniques used to actually compute a parallel execution order and to generate the corresponding code. Parallelization techniques for recursive programs are relatively recent and are studied in Section 5.5. As for methods associated with loop nests, many scheduling and partitioning (or tiling) algorithms have been proposed, but their description does not seem essential to a good understanding of the techniques studied in the following.

III Mathematical tools

This section gathers the reminders and contributions about the mathematical abstractions we use. Readers interested in analysis and transformation techniques may simply take note of the main definitions and theorems.

III.1 Presburger arithmetic

We need to manipulate sets, functions and relations over integer vectors. Presburger arithmetic suits us particularly well, since most interesting questions are decidable in this theory. It is defined from logical formulas built on $\forall$, $\exists$, $\lnot$, $\lor$, $\land$, and equalities and inequalities of integer affine constraints. Satisfiability of a Presburger formula is at the heart of most symbolic computations with affine constraints: it is an NP-complete integer linear programming problem [Sch86]. The algorithms used are super-exponential in the worst case [Pug92, Fea88b, Fea91], but very efficient in practice on medium-sized problems.

We mainly use Omega [Pug92] in our experiments and prototype implementations; its syntax for sets, relations and functions is very close to the usual mathematical notations. PIP [Fea88b], the parametric integer linear programming tool, uses another representation for affine relations: the quasi-affine selection tree, more simply called quast.

Definition 2 (quast) A quast representing an affine relation is a multi-level conditional expression, in which predicates are tests on the sign of quasi-affine forms (3) and leaves are sets of vectors described in Presburger arithmetic extended with $\bot$, which precedes any other vector in the lexicographic order.

3. Quasi-affine forms extend affine forms with integer divisions by constants and remainders of such divisions.

When empty sets appear in the leaves, they differ from the singleton $\{\bot\}$ and describe the vectors that are not in the domain of the relation. Examples are given in Section V.

A classical operation on relations is to compute the transitive closure. Classical algorithms only consider finite graphs. Unfortunately, in the case of affine relations, the closure of an affine relation is generally not an affine relation.

We therefore use approximation techniques developed by Kelly et al. and implemented in Omega [KPRS96]. The general idea is to reduce the problem to a subclass by approximation, and then to compute the closure exactly.

III.2 Formal languages and rational relations

Some concepts belong to the common background of theoretical computer science, such as monoids, rational and algebraic languages, finite automata and pushdown automata. The reference books are [HU79] and [RS97a], but there are also many introductions in French. We therefore only fix the notations used in the following, with the help of a classical example. Then we study more original mathematical objects: we present the essential results on the class of rational relations between finitely generated monoids.

Formal languages: example and notations

The Łukasiewicz language Ł over an alphabet $\{a, b\}$ is generated by the axiom $\Delta$ and the grammar whose productions are

$$\Delta \rightarrow a \Delta \Delta \mid b.$$

The Łukasiewicz language is a simple example of a one-counter language (i.e. a language recognized by a one-counter automaton), a subclass of algebraic languages. This language is related to Dyck languages [Ber79], and its first words are

    b, abb, aabbb, ababb, aaabbbb, aababbb, ...

A counter is encoded on a stack as follows: three symbols are used, $Z$ is the bottom-of-stack symbol, $I$ encodes positive numbers, and $D$ encodes negative numbers; $Z I^n$ thus represents the integer $n$, $Z D^n$ represents $-n$, and $Z$ encodes the value 0 of the counter. Figure 7 describes a pushdown automaton accepting the language Ł together with its interpretation in terms of a counter.

A natural generalization of one-counter languages is to use several counters: this yields a Minsky machine [Min67]. However, two-counter automata already have the same expressive power as Turing machines, so most interesting questions become undecidable. Nevertheless, by imposing some restrictions on the family of multi-counter languages, recent decidability results have been obtained. The study of these objects seems rich in applications, in particular in the work of Comon and Jurski [CJ98].
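The one-counter reading of the Łukasiewicz language can be tried out with a few lines of code. The sketch below is one possible encoding and is not the automaton of Figure 7: as an assumption, the counter starts at 1 and holds the number of occurrences of $\Delta$ that remain to be derived, so that `a` increments it (one $\Delta$ replaced by two) and `b` decrements it. The function name `in_lukasiewicz` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when w belongs to the language generated by
   Delta -> a Delta Delta | b over the alphabet {a, b}. */
bool in_lukasiewicz(const char *w) {
    assert(w != NULL);
    int counter = 1;                  /* one Delta still to be derived */
    for (; *w != '\0'; w++) {
        if (counter == 0)
            return false;             /* letters remain after a complete word */
        if (*w == 'a')
            counter++;                /* Delta -> a Delta Delta */
        else if (*w == 'b')
            counter--;                /* Delta -> b */
        else
            return false;             /* letter outside the alphabet */
    }
    return counter == 0;              /* every pending Delta has been derived */
}
```

The test `counter == 0` inside the loop plays the role of the zero test of a one-counter automaton: no proper prefix of an accepted word is itself accepted.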
    [Figure 7.a. Pushdown automaton (states 1 and 2; transition labels: a, Z → ZI; a, I → II; b, I → ε; ε, Z → Z)]
    [Figure 7.b. Associated one-counter automaton (states 1 and 2; transition labels: a, +1; b, >0, −1; ε, =0)]

Figure 7. Examples of automata

Rational relations

We only give a few reminders; see [AB88, Eil74, Ber79] for more details. Let $M$ be a monoid. A subset $R$ of $M$ is a recognizable set if there exist a finite monoid $N$, a morphism $\alpha$ from $M$ to $N$ and a subset $P$ of $N$ such that $R = \alpha^{-1}(P)$.

These sets generalize rational languages while keeping the Boolean algebra structure: indeed, the class of recognizable sets is closed under union, intersection and complementation. Recognizable sets are also closed under concatenation, but not under the star operation. The class of rational sets, on the other hand, is closed under star; its definition extends that of rational languages: given a monoid $M$, the class of rational sets of $M$ is the smallest family of subsets of $M$ containing $\emptyset$ and the singletons $\{m\} \subseteq M$, and closed under union, concatenation and the star operation.

In general, rational sets are not closed under complementation and intersection. If $M$ is of the form $M_1 \times M_2$, where $M_1$ and $M_2$ are two monoids, a recognizable subset of $M$ is called a recognizable relation, and a rational subset of $M$ is called a rational relation. The following result describes the "structure" of recognizable relations.

Theorem 2 (Mezei) A recognizable relation $R \subseteq M_1 \times M_2$ is a finite union of sets of the form $K \times L$, where $K$ and $L$ are rational sets of $M_1$ and $M_2$.

In the following, we only consider recognizable and rational sets that are relations between finitely generated monoids.

Transductions give a "more functional" view of recognizable and rational relations. From a relation $R$ between monoids $M_1$ and $M_2$, one defines a transduction $\tau$ from $M_1$ into $M_2$ as a function from $M_1$ into the set $\mathcal{P}(M_2)$ of subsets of $M_2$, such that $v \in \tau(u)$ iff $u \mathrel{R} v$. A transduction is recognizable (resp. rational) iff its graph is a recognizable (resp. rational) relation. Both classes are closed under inversion, and the class of recognizable transductions is also closed under composition.

The class of rational transductions is also closed under composition in the case of free monoids: this is the theorem of Elgot and Mezei [EM65, Ber79], which is fundamental for dependence analysis (Section IV).

Theorem 3 (Elgot and Mezei) If $A$, $B$ and $C$ are alphabets, and $\tau_1 : A^* \rightarrow B^*$ and $\tau_2 : B^* \rightarrow C^*$ are rational transductions, then $\tau_2 \circ \tau_1 : A^* \rightarrow C^*$ is a rational transduction.

The "mechanical" representation of rational relations and transductions is called a rational transducer; rational transducers naturally extend finite automata by adding an "output tape":

Definition 3 (rational transducer) For an "input" monoid $M_1$ and an "output" monoid $M_2$ (4), a rational transducer $T = (M_1, M_2, Q, I, F, E)$ is defined with a finite set of states $Q$, a set of initial states $I \subseteq Q$, a set of final states $F \subseteq Q$, and a finite set of transitions (or edges) $E \subseteq Q \times M_1 \times M_2 \times Q$.

4. The monoids $M_1$ and $M_2$ are often omitted from the definition.

Kleene's theorem ensures that the rational relations of $M_1 \times M_2$ are exactly the relations recognized by a rational transducer. We write $|T|$ for the transduction recognized by the transducer $T$: $T$ is said to realize the transduction $|T|$. When the monoids $M_1$ and $M_2$ are free, the neutral element is the empty word, denoted by $\varepsilon$.

Theorem 4 The following problems are decidable for rational relations: whether two words are in relation (in linear time), emptiness, and finiteness.
Let $R$ and $R'$ be two rational relations over alphabets $A$ and $B$ with at least two letters. It is undecidable whether $R \cap R' = \emptyset$, $R \subseteq R'$, $R = R'$, $R = A^* \times B^*$, $(A^* \times B^*) \setminus R$ is finite, or $R$ is recognizable.

Some interesting results concern transductions that are partial functions. A rational function $\tau : M_1 \rightarrow M_2$ is a rational transduction that is a partial function, i.e. such that $\mathrm{Card}(\tau(u)) \le 1$ for all $u \in M_1$. Given two alphabets $A$ and $B$, it is decidable whether a rational transduction from $A^*$ into $B^*$ is a partial function (in $O(\mathrm{Card}(Q)^4)$ [Ber79, BH77]). One can also decide whether a rational function is included in another one and whether they are equal.

Among the transducers realizing rational functions, we are especially interested in those that can be "computed on the fly" while reading their input. Let $A$ and $B$ be two alphabets. A transducer is sequential when it is labelled over $A \times B^*$ and its input automaton (obtained by omitting outputs) is deterministic. A sequential transducer realizes a rational function. This notion of "on-the-fly computation" is a little too restrictive, so one rather considers the following extension:

Definition 4 (subsequential transducer) For two alphabets $A$ and $B$, a subsequential transducer $(T, \rho)$ over $A^* \times B^*$ is a pair where $T$ is a sequential transducer with $F$ as its set of final states, and where $\rho : F \rightarrow B^*$ is a function. The function $\tau$ realized by $(T, \rho)$ is defined as follows: if $u \in A^*$, the value $\tau(u)$ is defined if there is a path in $T$ accepting $(u|v)$ and ending in a final state $q$; in this case $\tau(u) = v \rho(q)$.

In other words, $\rho$ appends a word to the end of the output of a sequential transducer.

Starting from a proof by Choffrut [Cho77], Béal and Carton [BC99b] proposed a polynomial algorithm to decide whether a rational function is subsequential, and another one to decide whether a subsequential function is sequential. They also proposed a polynomial algorithm to find a subsequential realization of a rational function, when one exists.
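The difference between sequential and subsequential transducers can be seen on a toy function. The example below is an assumption chosen for illustration and does not appear in the thesis: over $A = B = \{a, b\}$, the transducer copies its input and its state records the parity of the number of a's read; the terminal function is $\rho(\text{even}) = \varepsilon$ and $\rho(\text{odd}) = a$. The function name `pad_even` and the buffer interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Runs the subsequential transducer on u and writes tau(u) into out
   (a buffer of cap bytes). Returns the length of tau(u), or -1 when
   tau(u) is undefined or the buffer is too small. */
int pad_even(const char *u, char *out, size_t cap) {
    assert(u != NULL && out != NULL);
    size_t n = 0;
    int q = 0;                        /* state: parity of the a's read so far */
    for (; *u != '\0'; u++) {
        if (*u != 'a' && *u != 'b')
            return -1;                /* no transition: tau(u) is undefined */
        if (n + 3 > cap)
            return -1;                /* keep room for rho(q) and the terminator */
        out[n++] = *u;                /* transition labelled x | x */
        if (*u == 'a')
            q = 1 - q;
    }
    if (n + 2 > cap)
        return -1;
    if (q == 1)
        out[n++] = 'a';               /* rho(odd) = "a", rho(even) = empty word */
    out[n] = '\0';
    return (int)n;
}
```

This function is subsequential but not sequential: the image of "a" is "aa" while the image of "ab" is "aba", so the output for a prefix is not always a prefix of the output for the whole word, which a sequential transducer would enforce.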
III.3 Left-synchronous relations

Rational relations are not closed under intersection, but this operation is indispensable for dependence analysis. Feautrier [Fea98] proposed a "semi-algorithm" to answer the undecidable question of the emptiness of an intersection of rational relations: the algorithm is guaranteed to terminate only when the intersection is not empty. Since we want to compute this intersection, we adopt a different approach: by means of conservative approximations, we fall back on a class of rational relations with a Boolean algebra structure (i.e., with union, intersection and complement).

Recognizable relations do form a Boolean algebra, but we have built a more general class: left-synchronous relations. This class has been studied independently by Frougny and Sakarovitch [FS93], but our representation is different, the proofs are new and new results have been obtained. This work is the result of a collaboration with Olivier Carton (Université de Marne-la-Vallée).

We recall a classical definition, equivalent to the length-preservation property for input and output words: a rational transducer over alphabets $A$ and $B$ is synchronous if it is labeled over $A \times B$. We extend this notion as follows.

Definition 5 (left-synchronism) A rational transducer over alphabets $A$ and $B$ is left-synchronous if it is labeled over $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ and if only transitions labeled over $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) may follow transitions labeled over $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$).
A rational relation or transduction is left-synchronous if it can be realized by a left-synchronous transducer. A rational transducer is left-synchronizable if it realizes a left-synchronous relation.

Figure 8 shows left-synchronous transducers over an alphabet $A$ which realize the prefix order and the lexicographic order ($<_{txt}$ is a particular order on $A$). In these transducers, $x$ and $y$ stand for $\forall x \in A$ and $\forall y \in A$ respectively.
[Figure 8. Examples of left-synchronous transducers. Figure 8.a: prefix order. Figure 8.b: lexicographic order.]
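The labeling discipline of Definition 5 can be checked along a single path with a few lines of C. This is only a sketch under simplifying assumptions: it looks at one sequence of transition labels rather than at a whole transducer, and the names `label_kind` and `path_is_left_synchronous` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Kind of a transition label: x|y, x|epsilon, or epsilon|y. */
enum label_kind { BOTH, INPUT_ONLY, OUTPUT_ONLY };

/*
 * Returns 1 if the sequence of labels along one path respects the
 * left-synchronous discipline: once a label of the form x|epsilon
 * (resp. epsilon|y) has been used, only labels of the same form follow.
 * Returns 0 otherwise.
 */
int path_is_left_synchronous(const enum label_kind *path, size_t len)
{
    assert(path != NULL || len == 0);
    for (size_t i = 1; i < len; ++i) {
        enum label_kind prev = path[i - 1];
        if (prev != BOTH && path[i] != prev)
            return 0;   /* a one-sided label is followed by another kind */
    }
    return 1;
}
```

For instance, a path of the prefix-order transducer reads a common prefix with labels of kind `BOTH`, then only emits output letters, which passes the check.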
It is well known that synchronous transducers form a Boolean algebra (footnote 5: all the properties studied in this section have constructive proofs).

Theorem 5 The class of left-synchronous relations forms a Boolean algebra: it is closed under union, intersection and complement. Moreover, recognizable relations are left-synchronous; if $S$ is synchronous and $T$ is left-synchronous, then $ST$ is left-synchronous; if $T$ is left-synchronous and $R$ is recognizable, then $TR$ is left-synchronous. Finally, the class of left-synchronous relations is closed under composition.

Synchronous relations are decidable among rational relations [Eil74], but this is not the case for recognizable relations [Ber79], and we have shown that the same holds for left-synchronous relations.

We are nevertheless interested in some particular cases for which a rational relation can be proven to be left-synchronous. To this end, we recall the notion of transmission rate of a path labeled by $(u, v)$: it is the ratio $|v|/|u| \in \mathbb{Q}^+ \cup \{+\infty\}$.

If $T$ is a left-synchronous transducer, the cycles of $T$ can only have three possible transmission rates: $0$, $1$ and $+\infty$. All cycles of the same strongly connected component must have the same transmission rate, only components of rate $0$ may follow components of rate $0$, and only components of rate $+\infty$ may follow components of rate $+\infty$. There is a partial converse:

Theorem 6 If the transmission rate of each cycle of a rational transducer is $0$, $1$ or $+\infty$, and if no cycle of rate $1$ follows a cycle of rate different from $1$, then the transducer is left-synchronizable.

We can thus "resynchronize" a certain class of left-synchronizable transducers, namely the transducers satisfying the hypotheses of Theorem 6. Building on an algorithm by Béal and Carton [BC99a], one can write a resynchronization algorithm to compute left-synchronous approximations of rational relations. This technique will be used in Section III.5.

We end with decidability properties, which are essential for dependence and reaching definition analysis.

Lemma 1 Let $R$ and $R'$ be left-synchronous relations over alphabets $A$ and $B$. It is decidable whether $R \cap R' = \emptyset$, $R \subseteq R'$, $R = R'$, $R = A^* \times B^*$, or $(A^* \times B^*) - R$ is finite.

We are still working on the decidability of recognizable relations among left-synchronous ones.

III.4 Beyond rational relations

We sometimes need more expressive power than that of rational relations. We will thus use the notion of algebraic (or context-free) relation, which naturally extends that of algebraic language. These relations are defined from push-down transducers:

Definition 6 (push-down transducer) Given two alphabets $A$ and $B$, a push-down transducer $T = (A^*, B^*, \Gamma, \gamma_0, Q, I, F, E)$ (footnote 6: the alphabets $A$ and $B$ are often omitted from the definition) consists of a stack alphabet $\Gamma$, a non-empty word $\gamma_0$ in $\Gamma^+$ called the initial stack word, a finite set of states $Q$, a
set $I \subseteq Q$ of initial states, a set $F \subseteq Q$ of final states, and a finite set of transitions (or edges) $E \subseteq Q \times A^* \times B^* \times \Gamma \times \Gamma^* \times Q$.

The notion of a push-down transducer realizing a relation is defined in the same way as that of a push-down automaton realizing a language.

Definition 7 (algebraic relation) The class of relations realized by push-down transducers is called the class of algebraic relations.

Of course, algebraic transductions are the functional view of algebraic relations.

Theorem 7 Algebraic relations are closed under union, concatenation and the star operation. They are also closed under composition with rational transductions. The image of a rational language by an algebraic transduction is an algebraic language.

The following questions are decidable for algebraic relations: whether two words are related (in linear time), emptiness, finiteness.

There are very few results on algebraic transductions that are partial functions, called algebraic functions. In particular, we do not know of any subclass of these functions that is "computable on the fly" in the sense of subsequential functions.

Nevertheless, an interesting subclass of algebraic relations is that of one-counter relations, realized by a one-counter transducer (with a definition similar to that of a one-counter automaton). One may also consider more than one counter, but one then gets the same expressive power as Turing machines. This class is of interest when we have to compose rational transductions between non-free monoids (the theorem of Elgot and Mezei no longer applies).

Theorem 8 Let $A$ and $B$ be two alphabets and $n$ a positive integer. If $\tau_1 : A^* \to \mathbb{Z}^n$ and $\tau_2 : \mathbb{Z}^n \to B^*$ are rational transductions, then $\tau_2 \circ \tau_1 : A^* \to B^*$ is an $n$-counter transduction.

This theorem will be used for dependence analysis, mainly with $n = 1$. Moreover, an important result can be deduced from the proof of the theorem:

Proposition 1 Let $A$ and $B$ be two alphabets and $n$ a positive integer. Let $\tau_1 : A^* \to \mathbb{Z}^n$ and $\tau_2 : \mathbb{Z}^n \to B^*$ be rational transductions and $T$ an $n$-counter transducer realizing $\tau_2 \circ \tau_1 : A^* \to B^*$ (computed with Theorem 8). Then the rational transducer underlying $T$, obtained by omitting the stack manipulations, is recognizable.

This result guarantees closure under intersection with any rational transduction, according to the following result:

Proposition 2 Let $R_1$ be an algebraic relation realized by a push-down transducer whose underlying rational transducer is left-synchronous, and let $R_2$ be a left-synchronous relation. Then $R_1 \cap R_2$ is an algebraic relation, and one can build a push-down transducer realizing it whose underlying rational transducer is left-synchronous.

Finally, Theorem 8 extends to the free partially commutative monoids associated with nestings of trees and arrays, which we do not address in this summary.
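The statement of Theorem 8 with $n = 1$ can be illustrated on a toy composition. In the sketch below, the two transductions are hypothetical choices made for illustration only: $\tau_1$ maps a word over $\{a, b\}$ to the integer (number of `a`) minus (number of `b`), and $\tau_2$ maps $n \ge 0$ to $c^n$ and is undefined for $n < 0$. The function `compose_with_counter` (an assumed name) behaves like a one-counter transducer: the counter moves while the input is read, then one output letter is emitted per decrement until the counter reaches zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * One-counter realization of tau2 o tau1 for the illustrative choice
 *   tau1 : {a,b}* -> Z   u |-> (#a in u) - (#b in u)
 *   tau2 : Z -> {c}*     n |-> c^n if n >= 0, undefined otherwise.
 * Returns the length of the output written in `out`, or -1 when the
 * image is undefined (invalid letter, negative counter) or when `out`
 * (of capacity `cap` bytes) is too small.
 */
long compose_with_counter(const char *u, char *out, size_t cap)
{
    assert(u != NULL && out != NULL);
    long counter = 0;
    for (; *u != '\0'; ++u) {       /* input phase: only the counter moves */
        if (*u == 'a')
            ++counter;
        else if (*u == 'b')
            --counter;
        else
            return -1;
    }
    if (counter < 0 || (size_t)counter + 1 > cap)
        return -1;
    long n = 0;
    while (counter > 0) {           /* output phase: emit c, decrement */
        out[n++] = 'c';
        --counter;
    }
    out[n] = '\0';                  /* accept when the counter is zero */
    return n;
}
```

Forgetting the counter leaves a control structure that reads any word over $\{a, b\}$ and then emits any word in $c^*$, which gives an intuition for the recognizable underlying transducer mentioned in Proposition 1.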
III.5 More about approximations

Intersection is heavily used in our program analysis and transformation techniques. Rational and algebraic relations are not closed under this operation, but we have identified subclasses that are. We show here how to fall back on these subclasses by applying conservative approximations.

Several methods allow rational relations to be approximated by recognizable relations. The general idea is to take the Cartesian product of the input and of the output. More precise techniques perform this operation for each pair of an initial state and a final state, and for each strongly connected component. The result is always a recognizable relation, thanks to Theorem 2.

Approximation by left-synchronous relations is based on the resynchronization algorithm, and hence on Theorem 6. When the algorithm fails, a strongly connected component is replaced by a recognizable approximation and the process starts again. Some optimizations make it possible to apply the resynchronization algorithm only once.

Approximation of algebraic relations (or multi-counter relations) can be done in two ways: either the stack (or the counters) is approximated by additional states, or the underlying rational transducer is approximated by a left-synchronous transducer. Both techniques will be used in the sequel.

IV Instancewise analysis for recursive programs

After a number of works on the instancewise analysis of recursive programs [CCG96, Coh97, Coh99a, Fea98, CC98], we present a major evolution with a more general formalism and a complete automation of the process. Beyond the theoretical goal of reaching the highest possible precision, we will see in Section V.5 how this information improves automatic parallelization techniques for recursive programs.

Starting from real examples, we discuss the computation of induction variables, then we present the dependence and reaching definition analyses themselves. The section ends with a comparison with static analyses and with recent works on the instancewise analysis of loop nests.

IV.1 Introductory examples

We study two examples to give an intuitive overview of our instancewise analysis for recursive structures. A third example is presented in the thesis, but it uses a hybrid structure between trees and arrays which we do not discuss here.

First example: the Queens program

We consider again the Queens procedure presented in Section II.3. The program is reproduced in Figure 9 with a partial control tree.

We study dependences between run-time instances of statements. Consider for example the instance FPIAAaAaAJQPIAABBr of statement r, depicted by a star in Figure 9.b. Variable j is initialized to 0 by statement B and incremented by statement b, so we know that the value of j at FPIAAaAaAJQPIAABBr is 0;
           int A[n];
    P      void Queens (int n, int k) {
    I        if (k < n) {
    A/A/a      for (int i=0; i<n; i++) {
    B/B/b        for (int j=0; j<k; j++)
    r              ... = ... A[j] ...;
    J            if (...) {
    s              A[k] = ...;
    Q              Queens (n, k+1);
                 }
               }
             }
           }
           int main () {
    F        Queens (n, 0);
           }

[Figure 9. Procedure Queens and a control tree. Figure 9.a: procedure Queens. Figure 9.b: control tree (compressed), where the instances FPIAAJs, FPIAAaAJs and FPIAAaAaAJs write A[0] and the instance FPIAAaAaAJQPIAABBr reads A[0].]

hence FPIAAaAaAJQPIAABBr reads A[0]. Consider now the instances of s, depicted by squares. Variable k is initialized to 0 at the first call to Queens, then it is incremented by the recursive call Q. The instances FPIAAJs, FPIAAaAJs and FPIAAaAaAJs thus write into A[0], and are therefore in dependence with FPIAAaAaAJQPIAABBr.

Which of these definitions reaches FPIAAaAaAJQPIAABBr? Looking at the figure again, we notice that the instance FPIAAaAaAJs (the black square) executes last. Moreover, we can guarantee that this instance is executed whenever the read FPIAAaAaAJQPIAABBr executes. The other writes are thus overwritten by FPIAAaAaAJs, which is therefore the reaching definition of FPIAAaAaAJQPIAABBr. We will show later how to generalize this intuitive approach.

Second example: the BST program

Consider now the procedure BST in Figure 10. This procedure swaps node values to convert a binary tree into a binary search tree (BST). Tree nodes are referenced by pointers, and p->value holds the integer value of the node. There are few dependences in this program: the only ones are anti-dependences between some instances of statements inside blocks I1 or I2. Consequently, reaching definition analysis gives a very simple result: the only reaching definition of any read access is $\bot$.

IV.2 Mapping instances to memory locations

We defined the notion of access function in Section II.4. It maps accesses to the memory locations they read or write. We now need to make these functions explicit, and we introduce the notion of induction variable for that purpose. In the presence of
    P     void BST (tree *p) {
    I1      if (p->l != NULL) {
    L         BST (p->l);
    I2        if (p->value < p->l->value) {
    a           t = p->value;
    b           p->value = p->l->value;
    c           p->l->value = t;
              }
            }
    J1      if (p->r != NULL) {
    R         BST (p->r);
    J2        if (p->value > p->r->value) {
    d           t = p->value;
    e           p->value = p->r->value;
    f           p->r->value = t;
              }
            }
          }
          int main () {
    F       if (root != NULL) BST (root);
          }

[Figure 10. Procedure BST and control automaton (compressed)]

recursive procedures, this notion, which is historically tied to loop nests [Wol92], must be redefined. To simplify the exposition, we assume that each variable has a unique distinctive name; one can thus speak of "variable i" without ambiguity. Our definition of induction variables is the following:
- integer arguments of a function that are initialized by a constant or by an integer induction variable plus a constant, at each recursive call;
- integer loop counters that are translated by a constant at each iteration;
- pointer arguments that are initialized by a constant or by a pointer induction variable, possibly dereferenced.

The analysis requires a number of additional hypotheses on the program model of Section II.2: the analyzed data structures must be declared global; array subscripts must be affine functions of integer induction variables and of symbolic constants; and tree accesses must dereference a pointer induction variable or a constant.

Prior to dependence analysis, we must compute the access functions in order to describe possible conflicts. Let $\alpha$ be a statement and $w$ an instance of $\alpha$. The value of variable i at instance $w$ is defined as the value of i immediately after the execution of instance $w$ of statement $\alpha$. This value is denoted $[i](w)$.

In general, the value of a variable at a given control word depends on the execution. However, thanks to the restrictions we have imposed on the program model,
induction variables are completely determined by control words. One shows that, for two different executions $e$ and $e'$, the values of an induction variable at a given control word are identical. The access functions for different executions therefore coincide, and in the sequel we consider an access function $f$ independent of the execution.

The following result shows that induction variables are described by recurrence equations:

Lemma 2 Consider the monoid $(M_{data}, \cdot)$ which abstracts the data structure under consideration, a statement $\alpha$, and an induction variable i. The effect of statement $\alpha$ on the value of i is described by one of the following equations:
    either $\exists \beta \in M_{data}, j \in induc$: $\forall u\alpha \in L_{ctrl}$: $[i](u\alpha) = [j](u) \cdot \beta$
    or $\exists \beta \in M_{data}$: $\forall u\alpha \in L_{ctrl}$: $[i](u\alpha) = \beta$
where $induc$ is the set of induction variables of the program, including i.

The result on procedure Queens is the following. We only consider the induction variables j and k, the only ones that are useful for dependence analysis.
    From the main call F: $[Arg(Queens, 2)](F) = 0$
    From procedure P: $\forall uP \in L_{ctrl}$: $[k](uP) = [Arg(Queens, 2)](u)$
    From the recursive call Q: $\forall uQ \in L_{ctrl}$: $[Arg(Queens, 2)](uQ) = [k](u) + 1$
    From the loop entry B: $\forall uB \in L_{ctrl}$: $[j](uB) = 0$
    From the loop iteration b: $\forall ub \in L_{ctrl}$: $[j](ub) = [j](u) + 1$
$Arg(proc, num)$ stands for the $num$-th actual argument of a procedure $proc$, and all other statements leave the variables unchanged.

We designed an algorithm to automatically build such a system describing the evolution of induction variables in a program. Combined with the following result, this algorithm makes it possible to build the access function automatically.

Theorem 9 The access function $f$, which maps each possible access in $A$ to the memory location it reads or writes, is a rational function from $\Sigma_{ctrl}^*$ to $M_{data}$.
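The result for program Queens is the following:
    $(us \mid f(us, A[k])) = (FPIAA \mid 0)\,\bigl((JQPIAA \mid 1) + (aA \mid 0)\bigr)^{*}\,(Js \mid 0)$
    $(ur \mid f(ur, A[j])) = (FPIAA \mid 0)\,\bigl((JQPIAA \mid 0) + (aA \mid 0)\bigr)^{*}\,(BB \mid 0)\,(bB \mid 1)^{*}\,(r \mid 0)$

The same technique has been applied to program BST:
    $\forall \alpha \in \{I_2, a, b\}$: $(u\alpha \mid f(u\alpha, \text{p->value})) = (FP \mid \varepsilon)\,\bigl((I_1LP \mid l) + (J_1RP \mid r)\bigr)^{*}\,(I_1I_2 \mid \varepsilon)$
    $\forall \alpha \in \{I_2, b, c\}$: $(u\alpha \mid f(u\alpha, \text{p->l->value})) = (FP \mid \varepsilon)\,\bigl((I_1LP \mid l) + (J_1RP \mid r)\bigr)^{*}\,(I_1I_2 \mid l)$
    $\forall \alpha \in \{J_2, d, e\}$: $(u\alpha \mid f(u\alpha, \text{p->value})) = (FP \mid \varepsilon)\,\bigl((I_1LP \mid l) + (J_1RP \mid r)\bigr)^{*}\,(J_1J_2 \mid \varepsilon)$
    $\forall \alpha \in \{J_2, e, f\}$: $(u\alpha \mid f(u\alpha, \text{p->r->value})) = (FP \mid \varepsilon)\,\bigl((I_1LP \mid l) + (J_1RP \mid r)\bigr)^{*}\,(J_1J_2 \mid r)$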
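The Queens recurrences on the induction variables k and j can be evaluated directly by scanning a control word from left to right, which yields the array index accessed by an instance of s or r. The following C function is a minimal sketch under stated assumptions: every label is a single character, the function name `queens_access` is hypothetical, and since the loop-entry and loop-body labels of the j loop are both printed as `B` in this text, the sketch tells them apart by position (a `B` that directly follows `b` is taken to be the loop body and does not reset j).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Evaluate the access function of Queens on a control word whose last
 * letter is s (write to A[k]) or r (read of A[j]).
 * Returns the accessed index of A, or -1 if the word does not end with
 * s or r. The word is assumed to be a valid control word.
 */
int queens_access(const char *word)
{
    assert(word != NULL);
    int arg = 0;        /* value of Arg(Queens, 2), the second actual argument */
    int k = 0, j = 0;   /* induction variables */
    char prev = '\0';
    for (; *word != '\0'; ++word) {
        switch (*word) {
        case 'F': arg = 0;     break;   /* main call Queens(n, 0) */
        case 'P': k = arg;     break;   /* procedure entry binds k */
        case 'Q': arg = k + 1; break;   /* recursive call Queens(n, k+1) */
        case 'B':                       /* loop entry resets j ...        */
            if (prev != 'b')            /* ... but not the body after b   */
                j = 0;
            break;
        case 'b': j = j + 1;   break;   /* one iteration of the j loop */
        default:               break;   /* other statements change nothing */
        }
        prev = *word;
    }
    if (prev == 's')
        return k;
    if (prev == 'r')
        return j;
    return -1;
}
```

On the instances discussed in Section IV.1, the sketch returns index 0 both for the write FPIAAaAaAJs and for the read FPIAAaAaAJQPIAABBr, in agreement with the text.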
IV.3 Dependence and reaching definition analysis

Using access functions, our first goal is to compute the relation between conflicting memory accesses. We cannot hope for an exact result in general, but we can take advantage of the fact that the access function $f$ does not depend on the execution. The approximate conflict relation $\kappa$ that we compute is the following:
    $\forall u, v \in L_{ctrl}$: $u \mathrel{\kappa} v \overset{def}{\iff} v \in f^{-1}(f(u))$.

By the theorem of Elgot and Mezei (Section III.2) and Theorem 8, the composition of $f^{-1}$ and $f$ is either a rational transduction or a multi-counter transduction. The number of counters corresponds to the dimension of the accessed array, and one can fall back on a single counter through a conservative approximation.

Note that testing the emptiness of $\kappa$ is equivalent to alias analysis between pointers [Deu94, Ste96], and that the emptiness of a rational or algebraic relation is decidable.

To build the transducer describing dependences, one must first restrict the relation $\kappa$ to pairs of accesses with at least one write, then intersect it with the lexicographic order. Using the techniques of Sections III.3, III.4 and III.5, one can always compute a conservative approximation. It is realized by a one-counter transducer in the case of arrays, and by a rational transducer in the case of trees. Moreover, thanks to Proposition 1, the intersection with the lexicographic order is not approximate in the case of arrays.

If one tries to compute reaching definitions from the approximate information on dependences, it is very difficult to get a precise result. After the first step, which restricts the relation to flow dependences only, one must use additional properties of the data flow. The main technique we use is based on a structural property of programs:

Definition 8 (ancestor) We define $unco$ as the subset of $\Sigma_{ctrl}$ made of all block labels which are not conditional statements or loop bodies, and of all (unguarded) procedure calls, i.e., the blocks whose execution is unconditional.
Let $r$ and $s$ be two statements in $\Sigma_{ctrl}$, and let $u$ be a strict prefix of a control word $wr \in L_{ctrl}$ (an instance of $r$). If $v \in unco^*$ is such that $uvs \in L_{ctrl}$, then $uvs$ is called an ancestor of $wr$.

This definition is easily understood on a control tree such as the one of Figure 9.b: the black square FPIAAaAaAJs is an ancestor of FPIAAaAaAJQPIAABBr, but the adjacent grey squares are not. Ancestors have the following two properties:
1. the execution of $wr$ implies that of $u$, which is on the path from the root to node $wr$;
2. the execution of $u$ implies that of $uvs$, since $v \in unco^*$.

Thus, if an instance executes, all its ancestors execute as well. To apply this result to reaching definition analysis, one first identifies the instances whose execution is guaranteed by the ancestor property, then one applies transition elimination rules on the flow dependence transducer. One gets a transducer which realizes an approximation of reaching definitions.

Since the integration of these ideas in the reaching definition analysis algorithm is rather technical, we stop here in this summary.
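The ancestor property of Definition 8 can be tested on a pair of control words with a short C function. This is a sketch under simplifying assumptions: labels are single characters, both words are taken to be valid control words (membership in $L_{ctrl}$ is not checked), and the set of unconditional labels is passed as a string. The name `is_ancestor` and the set `"FPQ"` used below for Queens are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * `cand` is a control word u v s and `read` a control word w r.
 * Returns 1 if cand is an ancestor of read, i.e. if there is a strict
 * prefix u of `read` such that cand = u v s with every letter of v in
 * `unco`; returns 0 otherwise.
 */
int is_ancestor(const char *cand, const char *read, const char *unco)
{
    assert(cand != NULL && read != NULL && unco != NULL);
    size_t clen = strlen(cand), rlen = strlen(read);
    if (clen == 0 || rlen == 0)
        return 0;
    /* Longest possible u: common prefix of cand without its last letter s
       and of read without its last letter r (so u is a strict prefix). */
    size_t u = 0;
    while (u < clen - 1 && u < rlen - 1 && cand[u] == read[u])
        ++u;
    /* The longest u leaves the shortest v, the easiest one to satisfy. */
    for (size_t i = u; i < clen - 1; ++i)
        if (strchr(unco, cand[i]) == NULL)
            return 0;
    return 1;
}
```

With the assumed set `"FPQ"`, the black square FPIAAaAaAJs is accepted as an ancestor of FPIAAaAaAJQPIAABBr (here $v$ is empty), whereas the grey squares FPIAAJs and FPIAAaAJs are rejected because $v$ would contain the conditional label J.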
IV.4 Results of the analysis

Let us first come back to the case of tree structures. The access function for program BST is a rational transducer described by Figure 11.

[Figure 11. Rational transducer for the access function $f$ of program BST]

The conflict transducer realizing $\kappa$ is always rational in the case of trees. When the result is a left-synchronous transducer, dependences can be computed without approximation; otherwise an approximation of $\kappa$ by means of a left-synchronous transducer is needed. The result for BST is described by Figure 12.

[Figure 12. Rational transducer for the dependence relation of program BST]

This result confirms that dependences lie between instances of statements of the same block I1 or J1. We will see that this result makes it possible to parallelize the program.
Let us now study the case of arrays. The access function for program Queens is described by a rational transducer from $\Sigma_{ctrl}^*$ to $M_{data} = \mathbb{Z}$, given in Figure 13.

[Figure 13. Rational transducer for the access function $f$ of program Queens]

Theorem 8 is used to compute a one-counter transducer realizing the conflict relation. To get the dependence relation, the resynchronization algorithm is applied to the underlying rational transducer (which is recognizable), so the computation is always exact. The result for Queens is given by Figure 14.

[Figure 14. One-counter transducer for flow dependences]

We can now perform reaching definition analysis: using additional information on the conditional statements of program Queens, one proves that only ancestors of an instance of r can be reaching definitions. This very strong property allows the elimination of all transitions that do not lead to
an ancestor in the dependence transducer. The result is given by Figure 15. One can easily show that the result is exact: a unique reaching definition is computed for each read access.

[Figure 15. One-counter transducer for ...]

IV.5 Comparison with other analyses

Among the restrictions of the program model, some can be removed by means of preliminary transformations. In addition, many restrictions seem to be removable in future versions of the analysis, by means of appropriate approximations. There remains however a very important restriction which is deeply rooted in our formalism, and we see no general method to get rid of it: insertions and deletions in trees are only allowed at the leaves.

Static dependence and reaching definition analyses generally obtain similar results, whether they are based on abstract interpretation [Cou81, JM82, Har89, Deu94] or on other data-flow analysis formalisms [LRZ93, BE95, HHN94, KSV96]. An interesting study of the static analyses that are useful for parallelization is proposed in [RR99]. It is easy to compare our technique with these analyses: none of them works at the instance level. None reaches the precision required to identify which instance of which statement is in conflict, in dependence, or is a possible reaching definition. These analyses are nevertheless useful to lift a number of restrictions of our program model, and to compute properties that are useful to instancewise reaching definition analysis. It is more interesting to compare these analyses in terms of applications to parallelization, see Section V.5.

Let us now compare with instancewise analyses for loop nests, for instance with FADA [BCF97, Bar98]. On the common intersection of their program models, the general result is not surprising: the results of FADA are much more precise. Indeed, we only use information on conditional statements through external analyses, additional approximations are needed in the case of multi-dimensional arrays, rational and algebraic transducers do not have enough expressive power to handle integer parameters (only one counter can be described), and fundamental operations such as intersection sometimes require approximations. One can nevertheless note some positive points: the exactness of the result can be decided in polynomial time on rational transducers; emptiness is always decidable, which allows the automatic detection of uninitialized variables; in the case of trees, dependence tests are performed on rational languages of control words, which is very useful for parallelization;
finally, in the case of arrays, dependence tests are equivalent to the intersection of a rational language with an algebraic language.

V Expansion and parallelization

Research on memory expansion mainly deals with affine loop nests. The most common techniques are conversion to single-assignment form [Fea91, GC95, Col98], privatization [MAL93, TP93, Cre96, Li92] and numerous optimizations for efficient memory management [LF98, CFH95, CDRV97, QR99]. When the control flow cannot be predicted at compile time or when array subscripts are not affine, the problem of restoring the data flow becomes critical, and the convergence of interests with the SSA (static single-assignment) framework [CFR+91] is very clear. Starting from simple examples, we study the problems that are specific to non-affine loop nests, and we propose algorithms for conversion to single-assignment form. New techniques for expansion and for the optimization of memory usage are then proposed for the automatic parallelization of irregular codes.

The principles of parallel computation in the presence of recursive procedures are very different from those of loop nests, and existing parallelization methods generally rely on dependence tests at the statement level, whereas our analysis describes the dependence relation at the instance level. We show that this very precise information significantly improves classical parallelization techniques. We also study the possibility of expanding memory in recursive programs, and this study ends with experimental results.

V.1 Motivations and tradeoffs

Conversion to single-assignment (SA) form is one of the most classical expansion methods. It corresponds to the extreme case where each memory location is written at most once during the execution. It therefore differs from static single-assignment (SSA) form [CFR+91, KS98], where expansion is limited to variable renaming.

The idea is to replace each assignment to a data structure $D$ by an assignment to a new structure $D_{exp}$ whose elements have the same type as those of $D$, and are in one-to-one correspondence with the set $\mathbf{W}$ of all possible write accesses during the execution. In a second step, read references must be updated accordingly: this is called the restoration of the data flow. Instancewise reaching definitions are used for that purpose: for a given execution $e \in E$, the read reference to $D$ in $\langle \iota, ref \rangle$ must be replaced by an access to the element of $D_{exp}$ associated with $\sigma_e(\langle \iota, ref \rangle)$. Since only an approximation $\sigma$ of the reaching definitions is available, this technique is applicable only when $\sigma(\langle \iota, ref \rangle)$ is a singleton. If this is not the case, one must generate dynamic data-flow restoration code. This code is generally represented by a $\phi$ function, whose argument is the set $\sigma(\langle \iota, ref \rangle)$ of possible reaching definitions.

To generate the dynamic restoration code associated with $\phi$ functions, an additional data structure in one-to-one correspondence with $D_{exp}$ is used: this structure is denoted $@D_{exp}$. Two pieces of information must be stored in $@D_{exp}$: the address of the memory location written in the original program, and the identity of the last instance that wrote a value into this location. Since the program is in single-assignment form, the instance is already
described by the element of $@D_{exp}$ itself: $@D_{exp}$ must therefore hold addresses of memory locations. The structure is used as follows: $@D_{exp}$ is initialized to NULL; then, at each assignment to $D_{exp}$, the address of the memory location written in the original program is stored into $@D_{exp}$; finally, a reference $\phi(set)$ is implemented as the computation of the maximum, according to the sequential order, of all $\iota \in set$ such that $@D_{exp}[\iota]$ is equal to the address of the memory location read in the original program.

Instancewise reaching definition analysis is the basis of data-flow restoration [Col98]: precise results not only reduce the number of $\phi$ functions, but also simplify their arguments, and hence optimize the maximum computations at run time. Note also that the computation of $\sigma$ at run time can itself be costly, even in the absence of a $\phi$ function. In the case of loop nests, the overhead is however only due to the implementation of the quast associated with $\sigma$; polyhedron scanning techniques [AI91] allow the generated code to be optimized. The example of Figure 16 illustrates these remarks. In the case of recursive programs, we will see that the problem of computing $\sigma$ is more delicate.

Figure 16.a. Original program

          double A[N];
    T     A[0] = 0;
          for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
    S         A[i+j] = ...;
    R         A[i] = A[i+j-1];
            }

Figure 16.b. SA without reaching definition analysis

          double A[N], AT, AS[N,N], AR[N,N];
    T     AT = 0;
          for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
    S         AS[i,j] = ...;
    R         AR[i,j] = phi({<T>} U {<S,i',j'> : (i',j') <lex (i,j)});
            }

Figure 16.c. SA with a precise reaching definition analysis

          double A[N], AT;
          double AS[N,N], AR[N,N];
    T     AT = 0;
          for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
    S         AS[i,j] = ...;
    R         AR[i,j] = if (j==0)
                          if (i==0) AT
                          else AS[i-1,j]
                        else AS[i,j-1];
            }

Figure 16.d. Precise analysis and loop peeling

          double A[N], AT;
          double AS[N,N], AR[N,N];
          AT = 0;
          AS[1,1] = ...;
          AR[1,1] = AT;
          for (i=0; i<N; i++) {
            AS[i,1] = ...;
            AR[i,1] = AS[i-1,1];
            for (j=0; j<N; j++) {
              AS[i,j] = ...;
              AR[i,j] = AS[i,j-1];
            }
          }

Figure 16. Interactions between reaching definition analysis and run-time overhead

The actual implementation of these techniques depends on control and
data structures. In the case of loops and arrays, we propose algorithms for conversion to single-assignment form that extend existing results to arbitrary nests. The conversion of recursive programs to single-assignment form is a new field, which we study in Section V.5.

We have also developed three techniques to optimize the computation of $\phi$ functions. The first one applies simple optimizations to the $@D_{exp}$ structures; the second one reduces the sets of possible reaching definitions (the arguments of $\phi$ functions) by means of a new kind of data-flow information called reaching definitions of a memory location; and the third one eliminates redundancies in the computation of the maximum by performing the computations on the fly. Strictly speaking, this last technique does not generate a program in single-assignment form, which may sometimes hinder its use in automatic parallelization. With a different view of expansion (not necessarily in single-assignment form), Section V.4 proposes an improved version of the redundancy elimination method (also called "optimized placement of $\phi$ functions") which does not hinder parallelization.

V.2 Maximal static expansion

The goal of maximal static expansion is to expand memory as much as possible, and thus to remove as many dependences as possible, without resorting to $\phi$ functions to restore the data flow.
Consider two writes $v$ and $w$ belonging to the set of possible reaching definitions of a read $u$, and assume that they assign the same memory location. If $v$ and $w$ write into two different memory locations after expansion, a $\phi$ function is needed to choose which of the two writes defines the value read by $u$. We thus introduce the relation $\mathcal{R}$ between writes that are possible reaching definitions of the same read:
    $\forall v, w \in \mathbf{W}$: $v \mathrel{\mathcal{R}} w \iff \exists u \in \mathbf{R}$: $v \mathrel{\sigma} u \wedge w \mathrel{\sigma} u$.

When two possible reaching definitions of the same read write the same memory location in the original program, they must do the same in the expanded program. Since "writing into the same memory location" is an equivalence relation, we actually consider the transitive closure $\mathcal{R}^*$ of the relation $\mathcal{R}$. Restricting ourselves to expanded access functions $f_e^{exp}$ of the form $(f_e, \nu)$, where $\nu$ is some function on write accesses, one shows the following result:

Proposition 3 An expanded access function $f_e^{exp} = (f_e, \nu)$ is a maximal static expansion for every execution $e$ iff
    $\forall v, w \in \mathbf{W}_e, f_e(v) = f_e(w)$: $v \mathrel{\mathcal{R}^*} w \iff \nu(v) = \nu(w)$.

From this result, a function $\nu$ can be computed by enumerating the equivalence classes of a certain relation. The formalism is thus very general, but the algorithm we propose is limited to arbitrary loop nests on arrays. A number of technical points, notably the transitive closure of affine relations, require special attention, but they are not addressed in this summary.

In the general case, conversion to single-assignment form exposes more parallelism than static expansion; the choice is thus a tradeoff between run-time overhead and extracted parallelism. We also present three examples, on which we apply the expansion algorithm semi-automatically (with Omega [Pug92]). However, only one example is studied in this summary, see Section V.4.
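On a finite set of writes, the enumeration of equivalence classes suggested by Proposition 3 can be sketched with a union-find structure. This is only an illustration under strong assumptions: the thesis works on affine relations over unbounded iteration domains, whereas this sketch takes an explicit list of pairs $v \mathrel{\mathcal{R}} w$, all assumed to write the same memory location in the original program. The names `rpair`, `expansion_classes` and `MAX_WRITES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WRITES 64

/* One pair (v, w) of writes that may reach a common read. */
struct rpair { int v, w; };

static int parent[MAX_WRITES];

static int find(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];  /* path halving */
        x = parent[x];
    }
    return x;
}

/*
 * Computes nu[v], a representative of the class of write v in the
 * transitive closure of the relation given by `pairs`. Two writes with
 * different nu values may be mapped to different expanded cells.
 */
void expansion_classes(int n_writes, const struct rpair *pairs,
                       size_t n_pairs, int *nu)
{
    assert(n_writes >= 0 && n_writes <= MAX_WRITES && nu != NULL);
    for (int v = 0; v < n_writes; ++v)
        parent[v] = v;
    for (size_t i = 0; i < n_pairs; ++i) {
        int a = find(pairs[i].v), b = find(pairs[i].w);
        if (a != b)
            parent[a] = b;              /* merge the two classes */
    }
    for (int v = 0; v < n_writes; ++v)
        nu[v] = find(v);
}
```

Writes that end up in the same class must keep sharing a memory location, so no $\phi$ function is needed to choose among them; writes in different classes can be separated by the expansion.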
V.3 Memory Usage Optimization

We now present a technique to reduce the memory usage of an expanded program without losing parallelism. We thus assume that a parallel execution order $<_{par}$ has already been determined for the original program $(<_{seq}, f_e)$, probably from the approximate reaching definition relation. It is worth noting that this parallel order may be obtained by any technique, scheduling or partitioning for instance, as long as the result can be described by an affine relation. At the cost of a transitive closure computation, it is even possible to start from the "data-flow" order, that is, the (most parallel possible) order induced by the reaching definition relation. One then obtains an expanded program that (generally) requires less memory than the single-assignment form, but that is compatible with any legal parallel execution.

Our first task in formalizing the problem is to determine which expansions are correct with respect to this parallel order, i.e. which expanded storage mappings $f_e^{exp}$ guarantee that the parallel execution order preserves the semantics of the original program. Using the notation

$$
\forall v, w \in \mathbf{W}: \quad v \bowtie w \;\overset{\mathrm{def}}{\iff}\;
\begin{array}[t]{l}
\bigl(\exists u \in \mathbf{R}: v \mathrel{\sigma} u \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge (u <_{seq} w \vee w <_{seq} v \vee v \not\equiv w)\bigr) \\
\vee\ \bigl(\exists u \in \mathbf{R}: w \mathrel{\sigma} u \wedge v \not<_{par} w \wedge u \not<_{par} v \wedge (u <_{seq} v \vee v <_{seq} w \vee w \not\equiv v)\bigr),
\end{array}
$$

where $v \not\equiv w$ means that v and w write distinct memory cells in the original program, we have shown the following result:

Theorem 10 (correctness of storage mappings) If the following condition holds, the expansion is correct, that is, it guarantees that the parallel execution order preserves the semantics of the original program.

$$\forall e \in \mathbf{E},\ \forall v, w \in \mathbf{W}_e: \quad v \bowtie w \implies f_e^{exp}(v) \neq f_e^{exp}(w).$$

Intuitively, a reaching definition v of a read u and another write w must write into distinct memory cells when: w may execute between v and u in the parallel program, and either w does not execute between v and u, or w assigns a different memory cell than v, in the original program. Moreover, we have shown that this correctness criterion is optimal for a given approximation of the reaching definitions and of the storage mapping of the original program.

With this criterion, generating the expanded code requires coloring an unbounded graph described by an affine relation. The method is the same as for affine loop nests; it is detailed in French in Lefebvre's thesis [Lef98].

V.4 Optimized Constrained Expansion

We now show that the two previous expansion techniques can be combined, and we propose a general framework to optimize simultaneously the expansion overhead and the extracted parallelism: optimized constrained expansion. The formalism and the algorithms are too technical to be part of this summary, so we only give an example illustrating constrained expansion (which generalizes static expansion) combined with memory usage optimization.
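The graph-coloring step mentioned at the end of Section V.3 can be pictured on a finite instance. The sketch below is a simplification under stated assumptions: the interference relation of Theorem 10 is given as an explicit finite graph (the thesis colors an unbounded graph described by an affine relation), and a plain greedy heuristic is used, which is not claimed to be the method of [Lef98]. The names `interference_add` and `color_writes` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_WRITES 32

/* interferes[v][w] != 0 means v and w must not share a memory cell. */
static unsigned char interferes[MAX_WRITES][MAX_WRITES];

void interference_clear(void) {
    memset(interferes, 0, sizeof interferes);
}

/* Record that writes v and w interfere (symmetric relation). */
void interference_add(int v, int w) {
    assert(v >= 0 && v < MAX_WRITES && w >= 0 && w < MAX_WRITES && v != w);
    interferes[v][w] = 1;
    interferes[w][v] = 1;
}

/* Greedy coloring: cell[v] is the smallest color unused by the
   already-colored neighbors of v. Returns the number of colors used. */
int color_writes(int n, int cell[]) {
    assert(n >= 0 && n <= MAX_WRITES);
    int used = 0;
    for (int v = 0; v < n; v++) {
        unsigned char taken[MAX_WRITES] = {0};
        for (int w = 0; w < v; w++)
            if (interferes[v][w]) taken[cell[w]] = 1;
        int c = 0;
        while (taken[c]) c++;   /* c <= v < MAX_WRITES, so this stays in bounds */
        cell[v] = c;
        if (c + 1 > used) used = c + 1;
    }
    return used;
}
```

Any two interfering writes end up with different colors, so using the color as an extra component of the storage mapping satisfies the condition of Theorem 10 on this finite instance; the greedy order gives no guarantee of a minimal number of cells.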
      double x;
      for (i=1; i<=M; i++) {
        for (j=1; j<=M; j++)
          if (P(i,j)) {
    T       x = 0;
            for (k=1; k<=N; k++)
    S         x = ... x ...;
          }
    R   ... = ... x ...;
      }

    Figure 17.a. Original program

      double xT[M+1, M+1], xS[M+1, M+1, N+1];
      parallel for (i=1; i<=M; i++) {
        parallel for (j=1; j<=M; j++)
          if (P(i,j)) {
    T       xT[i,j] = 0;
            for (k=1; k<=N; k++)
    S         xS[i,j,k] = if (k==1) xT[i,j]; else xS[i,j,k-1]; ...
          }
    R   ... = φ({⟨S,i,1,N⟩, ..., ⟨S,i,M,N⟩}) ...;
      }

    Figure 17.b. Single-assignment form

    Figure 17. Parallelization example

We study the pseudo-code of Figure 17.a. We assume that N is strictly positive and that the predicate P(i,j) is true at least once for each iteration of the outer loop. The dependences on x forbid any parallel execution, so we convert the program to single-assignment form. The result of the reaching definition analysis is exact for the instances of S, but not for those of R: a φ function is needed. The two outer loops then become parallel, as shown in Figure 17.b.

Because of this φ function and of the use of a three-dimensional array, we observe that the parallel execution of this program is about five times slower than the sequential execution (on an SGI Origin 2000 with 32 processors). It is therefore necessary to reduce memory usage. Applying the algorithm of Section V.3 shows that expansion along the innermost loop is not needed, nor is the renaming of x into xS and xT. We obtain the code of Figure 18.a. The φ function is implemented with an optimized on-the-fly computation technique (see Section V.1), and the computation of the max hides a synchronization. Performance is therefore acceptable for a small number of processors, but degrades very quickly beyond four.

Applying the maximal static expansion algorithm gets rid of the φ function by forbidding expansion along the intermediate loop, see Figure 18.b; only the outer loop remains parallel. The parallel program on one processor is about twice as slow as the sequential program (probably because of the accesses to the two-dimensional array), but the speed-up is excellent. We observe that variable x has again been expanded along the inner loop, although this brings no additional parallelism: the two expansion techniques must therefore be combined. The result is very close to maximal static expansion, with one dimension less for array x: x[i] instead of x[i,·]. Of course, performance is excellent: the speed-up is 31.5 on 32 processors (M = 64 and N = 2048).

      double x[M+1, M+1];
      int @x[M+1];
      parallel for (i=1; i<=M; i++) {
        @x[i] = ⊥;
        parallel for (j=1; j<=M; j++)
          if (P(i,j)) {
    T       x[i,j] = 0;
            for (k=1; k<=N; k++)
    S         x[i,j] = ... x[i,j] ...;
            @x[i] = max(@x[i], j);
          }
    R   ... = ... x[i, @x[i]] ...;
      }

    Figure 18.a. Memory usage optimization

      double x[M+1, N+1];
      parallel for (i=1; i<=M; i++) {
        for (j=1; j<=M; j++)
          if (P(i,j)) {
    T       x[i,0] = 0;
            for (k=1; k<=N; k++)
    S         x[i,k] = ... x[i,k-1] ...;
          }
    R   ... = ... x[i,N] ...;
      }

    Figure 18.b. Maximal static expansion

    Figure 18. Two different parallelizations

V.5 Parallelization of Recursive Programs

Automatic parallelization techniques for recursive programs are beginning to emerge, thanks to environments and tools, such as Cilk [MF98], that ease the efficient implementation of control-parallel programs [RR99]. We propose a single-assignment conversion technique and a privatization technique for recursive programs, then we present two methods for parallel code generation.

Expansion of recursive programs

In a recursive program in single-assignment form, expanded data structures generally have a tree structure: their elements are in one-to-one correspondence with control words. Dynamic allocation of, and access to, these structures is therefore more delicate than for loop nests. The general idea is to build each expanded structure $D_{exp}$ on the fly, propagating a pointer to the current node. Direct access to $D_{exp}$ is nevertheless required to update read references: one must first compute the possible reaching definitions with the transducer provided by the analysis, then retrieve the associated memory cells in $D_{exp}$. Even without any φ function, restoring the data flow may thus be very costly.

If reaching definitions are known exactly, σ can be seen as a partial function from $\mathbf{R}$ to $\mathbf{W}$. When this function can be computed on the fly, efficient code can be generated for the read references of the expanded program: it suffices to implement the step-by-step computation of the transducer. This is notably the case for sub-sequential transducers (see Section III.2), when the recursive program manipulates a tree structure. In the presence of arrays, it is harder to know whether the one-counter transducer of reaching definitions can be computed on the fly. We have nevertheless proposed a single-assignment conversion algorithm for recursive programs, which includes the on-the-fly computation of reaching definitions whenever possible.

We have extended the notion of privatization to recursive programs: it consists in turning global data structures into local variables. In the general case, data must be copied at each procedure call and at each procedure return. This may prove costly when copying local structures back into the structures of the calling procedure (the copy-out), notably because of the synchronizations that are unavoidable in a parallel execution. However, when reaching definitions are necessarily ancestors, only the first copy phase (the copy-in) is needed; this is the case for the Queens program, for most sorting algorithms, and more generally for divide-and-conquer and dynamic-programming execution schemes. We therefore propose a privatization algorithm for recursive programs in which φ functions are replaced by copies of data structures.

        int A[n];
    P   void Queens (int A[n], int n, int k) {
          int B[n];
    I     if (k < n) {
            memcpy (B, A, k * sizeof (int));
    A=a     for (int i=0; i<n; i++) {
    B=b       for (int j=0; j<k; j++) {
    r           ... = ... B[j] ...;
              }
    J         if (...) {
    s           B[k] = ...;
    Q           spawn Queens (B, n, k+1);
              }
            }
          }
        }
        int main () {
    F     Queens (A, n, 0);
        }

    [Plot: speed-up (parallel / original) versus number of processors, from 1 to 32, for 13-Queens, compared with the optimal speed-up.]

    Figure 19. Privatization and parallelization of the Queens program

Parallel code generation

We show that the decidability properties of rational and algebraic transducers enable efficient dependence tests. From these we derive a statement-level parallelization algorithm that lets some statements execute asynchronously and inserts synchronizations where dependences require them. This algorithm is applied to the BST program, and to the Queens program after privatization, see Figure 19. The experiment was run on an SGI Origin 2000 for n = 13. The slow-down on one processor is due to array copies and, to a lesser extent, to the Cilk scheduler [MF98].

We also show that our parallelization algorithm gives better results than existing techniques when discovering parallelism requires instance-level information. Finally, we study the instancewise parallelization of recursive programs, where synchronizations are guarded by the precise conditions (on the control word) under which a dependence is possible. The algorithm we propose fully exploits the result of the instancewise dependence analysis, together with the ability to test efficiently whether a pair of words is recognized by a transducer. A concrete example validates this new technique.
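The copy-in privatization scheme of Figure 19 can be exercised with a small sequential sketch. Everything beyond the structure of the figure is an assumption made for illustration: the conflict test `is_safe`, the solution counter, the fixed bound `MAX_N`, and the replacement of `spawn` by an ordinary recursive call (so no Cilk runtime is needed). It only shows that each call works on its private copy B of the prefix A[0..k-1] and never copies anything back.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 16

/* Nonzero if a queen in row k, column c attacks none of B[0..k-1]. */
static int is_safe(const int B[], int k, int c) {
    for (int j = 0; j < k; j++) {
        int d = B[j] - c;
        if (d == 0 || d == k - j || d == j - k) return 0;
    }
    return 1;
}

/* Counts the solutions that extend the partial placement A[0..k-1].
   B is the privatized copy: only a copy-in is performed, because every
   value read from B[0..k-1] was written by an ancestor call. */
long queens(const int A[], int n, int k) {
    assert(n >= 0 && n <= MAX_N && k >= 0 && k <= n);
    if (k == n) return 1;
    int B[MAX_N];
    memcpy(B, A, (size_t)k * sizeof(int));   /* copy-in */
    long count = 0;
    for (int i = 0; i < n; i++) {
        if (is_safe(B, k, i)) {
            B[k] = i;
            count += queens(B, n, k + 1);    /* `spawn` in the parallel version */
        }
    }
    return count;
}

/* Convenience entry point starting from an empty board. */
long count_queens(int n) {
    int A[MAX_N] = {0};
    return queens(A, n, 0);
}
```

Calling `count_queens(8)` explores the same call tree as the figure would for n = 8; a parallel version would additionally need the synchronizations discussed in the text, which this sketch does not model.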
VI Conclusion

This thesis concludes with a recapitulation of the main results, followed by a discussion of future developments.

VI.1 Contributions

Our contributions fall into four strongly interdependent categories. The first three concern automatic parallelization and are summarized in the following table; the fourth category concerns rational and algebraic transductions.

    Instancewise dependence analysis
      Affine nests on arrays:                 [Bra88, Ban88], [Fea88a, Fea91, Pug92]
      General nests on arrays:                [BCF97, Bar98], [WP95, Won95]
      Recursive programs on trees and arrays: [Fea98] (1), Section IV, published in [CC98] (2)

    Instancewise reaching definition analysis
      Affine nests on arrays:                 [Fea88a, Fea91, Pug92], [MAL93]
      General nests on arrays:                [CBF95, BCF97, Bar98], [WP95, Won95]
      Recursive programs on trees and arrays: Section IV, published in [CC98] (2)

    Single-assignment form
      Affine nests on arrays:                 [Fea88a, Fea91]
      General nests on arrays:                [Col98], Sections V.1 and V.4
      Recursive programs on trees and arrays: Section V.5

    Maximal static expansion
      Affine and general nests on arrays:     Sections V.2 and V.4, published in [BCC98, Coh99b, BCC00]
      Recursive programs on trees and arrays: open problem

    Memory usage optimization
      Affine nests on arrays:                 [LF98, Lef98], [SCFS98, CDRV97]
      General nests on arrays:                Sections V.3 and V.4, published in [CL99, Coh99b]
      Recursive programs on trees and arrays: open problem

    Instancewise parallelization
      Affine nests on arrays:                 [Fea92, CFH95], [DV97]
      General nests on arrays:                [GC95, CBF95], [Col95b]
      Recursive programs on trees and arrays: Section V.5

    (1) This is a dependence test for trees only.
    (2) For arrays only.

Let us now review each contribution.

Control and data structures: beyond the polytope model. In Section II, we defined a program model and mathematical abstractions for statement instances and data structure elements. This general framework was used throughout this work to formalize the presentation of our techniques, in particular for recursive structures.

New dependence and reaching definition analyses were proposed in Section IV. They rely on a formalism from formal language theory, more precisely rational and algebraic transductions. A new definition of induction variables, suited to recursive programs, made it possible to describe the effect of each instance by means of a rational or algebraic transduction. A comparison with other analyses concludes this work.

On the other hand, when we designed algorithms for loop nests on arrays (a special case of our model), we remained faithful to iteration vectors and took advantage of the wealth of algorithms for manipulating affine relations in Presburger arithmetic.

Memory expansion: new techniques to solve new problems. Applying memory expansion to parallelization is an old technique, but instancewise reaching definition analyses have recently been extended to programs with conditional expressions, with complex references to data structures (for instance non-affine array subscripts) or with recursive calls, and this raises new questions. The first is to guarantee that read accesses in the expanded program refer to the right memory cell; the second lies in the suitability of expansion techniques to the new program models.

Both questions are addressed in Sections V.1, V.2, V.3 and V.4, for (unrestricted) loop nests on arrays. We presented a new technique to reduce the run-time overhead of expansion, and we extended a memory usage reduction method to unrestricted loop nests. The combination of the two was studied, and we designed algorithms to optimize the run-time restoration of the data flow. Some experimental results are presented for a shared-memory architecture.

Memory expansion for recursive programs is an entirely new research area, and we discovered that the mathematical abstraction for reaching definitions (rational or algebraic transductions) may incur significant overheads. We nevertheless developed algorithms that expand particular recursive programs with a low run-time overhead.

Parallelism: extending classical techniques. Our dependence analysis was put to use to parallelize recursive programs. We were able to demonstrate practical applications of rational and algebraic transductions, using their decidable properties. Our first algorithm resembles existing methods, but it benefits from the more precise information gathered by the analysis and generally achieves better results. Another algorithm enables the instancewise parallelization of recursive programs: this new technique is made possible by the use of rational and algebraic transductions. Some experimental results are described, combining expansion and parallelization on a well-known recursive program.

Formal language theory: some contributions and applications. The last results of this work do not belong to the field of compilation. They are found mainly in Section III.3 and in the following sections. We defined a subclass of rational transductions that admits a Boolean algebra structure and many other interesting properties. We showed that this class is not decidable among rational transductions, but conservative approximation techniques make these properties usable in the whole class of rational transductions. We also presented some new results on the composition of rational transductions over non-free monoids, before studying the approximation of algebraic transductions.
VI.2 Perspectives

Many questions arose throughout this thesis, and our results suggest more interesting research than they solve problems. We first address the questions related to recursive programs, then we discuss future work in the polytope model.

First of all, the search for a mathematical abstraction able to describe instance-level properties again appears as a crucial issue. Rational and algebraic transductions often gave good results, but their limited expressiveness also restricted their field of application. Reaching definition analysis suffered the most from this, together with the integration of conditional expressions and loop bounds into dependence analysis. In this context, we would need more than one counter in transducers, while keeping the ability to decide whether a set is empty and to decide other interesting properties. We are therefore strongly interested in the work of Comon and Jurski [CJ98] on deciding emptiness in a subclass of multi-counter languages, and more generally we would like to follow more closely the studies on system verification based on restricted classes of Minsky machines, such as timed automata. Using several counters would also make it possible to extend one of the major ideas of fuzzy data-flow analysis [CBF95]: inserting new parameters to improve precision by describing the properties of non-affine expressions.
Moreover, we believe that decidability properties are not necessarily the most important point when choosing a mathematical abstraction: good approximations of the results are often sufficient. In particular, we discovered while studying left-synchronous relations and deterministic relations that a subclass with good decision properties cannot be used in our general analysis framework without an efficient approximation method. Improving our resynchronization and approximation methods for rational transducers is therefore an important issue. We also hope that this demonstrates the mutual benefit of cooperation between theoreticians and compilation researchers.

Beyond these formalism issues, another research direction consists in relaxing as much as possible the restrictions imposed on the program model. As proposed earlier, the best method is to look for a graceful degradation of the results by means of approximation techniques. This idea has been studied in a similar context [CBF95], and its application to recursive programs promises interesting future work. Another idea would be to compute induction variables from execution traces (instead of control words), so as to allow modifications in any statement, and then to deduce approximate information on control words; abstract interpretation techniques [CC77] would probably be a valuable help in proving the correctness of our approximations.

We did not work on the scheduling problem for recursive programs, because we know of no method for assigning sets of instances to execution dates. Building a rational transducer from dates to instances may be a good idea, but generating code to enumerate the sets of instances becomes rather difficult. These technical reasons should not hide the fact that most of the parallelism in recursive programs can already be exploited by control-parallel techniques, and the need for a data-parallel execution model is not obvious.
Besides their impact on our study of recursive programs, techniques from the polytope model cover an important part of this thesis. A major goal throughout this work was to keep some distance from the mathematical representation of affine relations. This point of view has the drawback of not easing the writing of optimized algorithms ready for use in a compiler, but above all it has the advantage of presenting our approach in its full generality. Among the technical problems that should be improved, both for maximal static expansion and for memory usage optimization, the most important are the following.

We presented many algorithms for the dynamic restoration of the data flow, but we have very little practical experience of parallelizing loop nests with unpredictable control flow and non-affine array subscripts. Since the SSA formalism [CFR+91] is mainly used as an intermediate representation, φ functions are rarely implemented in practice. Generating efficient restoration code is therefore a rather recent problem.

No parallelizer for unrestricted loop nests has ever been written. As a result, no large-scale experiment could ever be conducted. To apply precise analyses and transformations to real programs, a significant optimization effort remains to be carried out. The main ideas would be to partition the code [Ber93] and to extend our techniques to hierarchical dependence graphs, to array regions [Cre96] or to hierarchical schedules [CW99].

A parallelizing compiler must be able to tune a large number of parameters automatically: run-time overhead, parallelism extraction, memory usage, placement of computations and communications... We have seen that the optimization problem is even more complex for non-affine loop nests. The constrained expansion formalism makes it possible to optimize simultaneously a number of parameters related to memory expansion, but this is only a first step.
Chapter 1

Introduction

Performance increase in computer architecture technology is the combined result of several factors: fast increase of processor frequency, broader bus widths, increased number of functional units, increased number of processors, complex memory hierarchies to deal with high latencies, and global increase of storage capacities. New improvements and architectural designs are proposed every day. The result is that the machine model is becoming less and less uniform and simple: despite the hardware support for caches, superscalar execution and shared memory multiprocessing, tuning a given program for performance becomes more and more complex. Good optimizations for some particular case can lead to disastrous results on a different machine. Moreover, hardware support is generally not sufficient when the complexity of the system becomes too high: dealing with deep memory hierarchies, local memories, out-of-core computations, instruction-level parallelism and coarse-grain parallelism requires additional support from the compiler to translate raw computation power into sustained performance. The recent shift of microprocessor technology from superscalar models to explicit instruction-level parallelism is one of the most concrete signs of this trend.

Indeed, the whole computer architecture and compiler industry is now facing what the high performance computing community has known for years. On the one hand, and for most applications, architectures are too diverse to define practical efficiency criteria and to develop specific optimizations for a particular machine. On the other hand, programs are written in such a way that traditional optimization and parallelization techniques struggle to feed the huge computation monster everybody will have tomorrow in their laptop.

In order to achieve high performance on modern microprocessors and parallel computers, a program (or at least the algorithm it implements) must contain a significant degree of parallelism. Even then, the programmer and/or the compiler has to expose this parallelism and apply the necessary optimizations to adapt it to the particular characteristics of the target machine. Moreover, the program should be portable in order to cope with the fast obsolescence of parallel machines. The following two possibilities are offered to the programmer to meet these requirements.

- First, explicitly parallel languages. Most of these are parallel extensions of sequential languages. This includes well-known data-parallel languages such as HPF, and recent mixed data- and control-parallel approaches such as OpenMP extensions for shared memory architectures. Some extensions also appear in the form of libraries: PVM and MPI for instance, or higher-level multi-threaded environments such as IML from the University of Illinois [SSP99] or Cilk from MIT [MF98].
  These approaches make the programming of high performance parallel algorithms possible. However, besides parallel algorithmics, the programmer is also in charge of more technical and machine-dependent operations, such as the distribution of data on the processors depending on their memory capacities, communications and synchronizations. This requires a deep knowledge of the target architecture and reduces portability. Several efforts have been made in HPF to let the compiler take care of some parts of this job, but it seems that the programmer still needs a precise knowledge of what the compiler does.

- Second, automatic parallelization of a high-level sequential language. The obvious advantages of this approach are portability, simplicity of programming, and the fact that even old undocumented sequential codes may be automatically parallelized (in theory). However, the task allotted to the compiler-parallelizer is overwhelming. Indeed, the program first has to be analyzed in order to understand, at least partially, what is performed and where the parallelism lies. The compiler then has to make decisions about how to generate parallel code that takes into account the specificities of the target architecture. Even for short programs and a simplified model of parallel machine, "optimality" in both steps is out of reach for decidability reasons. As a matter of fact, a wide range of parallelization techniques exists, and the difficulty often lies in choosing the most appropriate one.

  The usual source language for automatic parallelization is Fortran 77. Indeed, many scientific applications have been written in Fortran, which allows only relatively simple data structures (scalars and arrays) and control flow. Several studies however deal with the parallelization of C or of functional languages such as Lisp. These studies are less advanced than the historical approach, but also more closely related to the present work: they handle programs with general control and data structures. Many research projects already exist, among others: Parafrase-2 and Polaris [BEF+96] from the University of Illinois, PIPS from Ecole des Mines [IJT90], SUIF from Stanford University [H+96], the McCAT/EARTH-C compiler from McGill University [HTZ+97], LooPo from the University of Passau [GL97], and PAF from the University of Versailles; there is also an increasing number of commercial parallelizing tools, such as CFT, FORGE, FORESYS or KAP.

We are mostly interested in automatic and semi-automatic parallelization techniques: this thesis addresses both program analysis and source-to-source program transformation.

1.1 Program Analysis

Optimizations and parallelizations are usually seen as source-to-source code transformations which improve one or several run-time parameters. To apply a program transformation at compile-time, one must check that the algorithm implemented by the program is unharmed during the process. Because an algorithm can be implemented in many different ways, applying a program transformation requires "reverse engineering" the most precise information about what the program does. This fundamental program analysis technique addresses the difficult problem of gathering compile-time (a.k.a. static) information about run-time (a.k.a. dynamic) properties.
1.1.PROGRAMANALYSIS StaticAnalysis 55 arecalledstaticbecausetheycovereverypossiblerun-timeexecutionleadingtoagiven twoinstructions.thesemachinestatesareknownasprogrampoints.suchproperties programpoint.ofcoursethesepropertiesarecomputedatcompile-time,butthisisnot Programanalysesoftencomputepropertiesofthemachinestatebetweenexecutionof Muc97,ASU86,JM82,KS92,SRH96],onemayexposethefollowingcommonissues.To themeaningofthe\static"adjective:\syntactic"wouldprobablybemoreappropriate... formallystatethepossiblerun-timeexecutions,theusualmethodistobuildthecontrol analyses.amongthevariouswordingsandformalpresentationsofthisframework[ku77, Data-owanalysisistherstproposedframeworktounifythelargenumberofstatic owgraphoftheprogram[asu86];indeed,thisgraphrepresentsallprogrampointsas allpossibleexecutionsisthenthesetofallpathsfromtheinitialstatetotheconsidered nodes,andedgesbetweenthesenodesarelabeledwithprogramstatements.thesetof programpointandmeetallinformationsalongthesepaths.theformalstatementofthese eachstatementmaymodifysomeproperty,onemustconsidereverypathleadingtothe ideasisusuallycalledmeetoverallpaths(mop)[ks92].ofcourse,themeetoperation programpoint.propertiesatagivenprogrampointaredenedasfollows:because dependsonthepropertytobeevaluatedandonitsmathematicalabstraction. oftheproblemcannotbeusedforpracticalevaluationofstaticproperties.practical alongedgesofthecontrolowgraph.aniterativeresolutionofthepropagationequations computationisdoneby forwardorbackward propagationoftheintermediateresults However,becauseofthepossiblyunboundednumberofpaths,theMOPspecication (MFP).Intheintra-proceduralcase,KamandUllman[KU77]haveproventhatMFP isperformed,untilax-pointisreached.thismethodisknownasmaximalxedpoint somesimplepropertiesofthemathematicalabstractionaresatised;andthisresulthas beenextendedtointer-proceduralanalysisbyknoopandsteen[ks92]. 
eectivelycomputestheresultdenedbymop i.e.mfpcoincideswithmop when theapplicationandcomplexityoftheanalysis.thelatticestructureencompassesmostabstractionsbecauseitsupportscomputationofbothmeet atmergepoints andjoin at Mathematicalabstractionsforprogrampropertiesareverynumerous,dependingon computationalstatements operations.inthiscontext,cousotandcousot[cc77]have ematicalformulationcalledabstractinterpretationhastwomaininterests:rstitallows systematicapproachestotheconstructionofalatticeabstractionforprogramproperties, concreterun-timestatesofaprogramandabstractcompile-timeproperties.thismath- proposedanapproximationframeworkbasedonsemi-dualgaloisconnectionsbetween Whileextendingtheconceptofdata-owanalysis,abstractinterpretationhelpsproving thecorrectnessandoptimalityofprogramanalyses.practicalapplicationsofabstractinterpretationandrelatediterativemethodscanbefoundin[cou81,ch78,deu92,cre96]works,theautomaticparallelizationcommunityhasveryrarelybaseditsanalysistechniquesononeoftheseframeworks.beyondtheimportantreasonswhicharenotofa scienticnature,wewilldiscussthegoodreasons: Despitetheundisputablesuccessesofdata-owandabstractinterpretationframe- toaconservativeapproximationofanactualx-pointinthelatticeofconcretestates. andsecond,itensuresthatanycomputedx-pointintheabstractlatticecorresponds MOP/MFPtechniquesfocusonclassicaloptimizationstechniques,withrathersimpleabstractions(latticesoftenhaveaboundedheight);correctnessandeciencyin aproductioncompilerarethemainmotivations,whereasprecisionandexpressive-
56 nessofthemathematicalabstractionarethemainissuesforparallelization; CHAPTER1.INTRODUCTION intheindustry,parallelizationhastraditionallyaddressednestsofloopsandarrays, issuesofcriticalinterest; applicationstorealprogramsandpracticalimplementationinacompilerbecome withhighdegreesofdataparallelismandsimple(nonrecursive,rstorder)control structures;provingthecorrectnessofananalysisiseasyinthiscontext,whereas abstractinterpretationiswellsuitedtofunctionallanguageswithcleanandsimple issuesofimperativeandlow-levellanguagessuchasfortranorc,traditionallymore operationalsemantics;problemsraisedinthiscontextareorthogonalwithpractical staticanalysistechniques,whichcomputepropertiesatagivenprogrampointorstatement.suchresultsarewellsuitedtomostclassicaltechniquesforprogramcheckingand oneneedsmoreinformation. Whataboutdistinctrun-timeinstancesofprogrampointsandstatements?Because Asaresult,data-owandabstractinterpretationframeworkshavemostlyfocusedon suitableforparallelarchitectures(butwewillseethatthispointisevolving). optimization[muc97,asu86,skr90,krs94],butforautomaticparallelizationpurposes, Whataboutdistinctelementsinadatastructure?Becausearraysanddynamically statementsarelikelytoexecuteseveraltimes,weareinterestedinwhichiteration ofalooporwhichcalltoaprocedureinducedexecutionofsomeprogramstatement. allocatedstructuresarenotatomic,weareinterestedinwhicharrayelementor lelizationcommunities,itisnotsurprisingthatresultsoftheonescouldnotbeappliedby Becauseoforthogonalinterestsinthedata-owanalysisandtheautomaticparal- whichgraphnodeisaccessedbysomerun-timeinstanceofastatement. theothers.indeed,averysmallnumberofdata-owanalyses[dgs93,tzo97]addressed InstancewiseAnalysis bothinstancewiseandelementwiseissues,butresultsareveryfarfromtherequirements ofacompilerintermsofprecisionandapplicability. Theprogrammodelconsideredisalsomorerestricted mostofthetime sincetraditional applicationsofparallelizingcompilersarenumericalcodeswithloopnestsandarrays. 
tothebroadrangeofpropertiesandtechniquesstudiedindata-owanalysisframeworks. Programanalysesforautomaticparallelizationarearatherrestricteddomain,compared Feautrier[Fea88a] analysesareorientedtowardsinstancewiseandelementwisepropertiesofprograms.whentheonlycontrolstructurewasthefor/doloop,iterativemethods withahighsemanticalbackgroundseemedoverlycomplex.tofocusonsolvingcritical Sincetheverybeginning withworksbybanerjee[ban88],brandes[bra88]and problemssuchasabstractingloopiterationsandeectsofstatementinstancesonarray elements,designingsimpleandad-hocframeworkswasobviouslymoreprotablethan statementinstanceswhichaccessthesamememorylocation,oneoftheaccessesbeinga write.moreprecisemethodshavebeendesignedtocompute,foreveryarrayelementread tests[ban88]anddependenceanalyses[bra88,pug92]whichcollectedinformationabout tryingtobuildonunpracticaldata-owframeworks.therstanalysesweredependence inanexpression,theverystatementinstancewhichproducedthevalue.theyareusually calledarraydata-owanalyses[fea91,mal93],butweprefertocalltheminstancewise
reaching definition analyses for better comparison with a specific static data-flow analysis technique called reaching definition analysis [ASU86, Muc97]. Such accurate information significantly improves the quality of program transformation techniques, hence the performance of parallel programs.

Instancewise analyses have long suffered strong program model restrictions: programs used to be nested loops without conditional statements, with affine bounds and array subscripts, and without procedure calls. This very limited model is already sufficient to address many numerical codes, and has the major interest of allowing computation of exact dependence and reaching definition information [Fea88a, Fea91]. One of the difficulties in removing the restrictions is that exact results cannot be hoped for anymore, and only approximate dependences are available at compile-time: this induces overly conservative approximations of reaching definition information. A direct computation of reaching definitions is thus needed. Recently, such direct computations have been crafted, and extremely precise intra-procedural techniques have been designed by Barthou, Collard and Feautrier [CBF95, BCF97, Bar98] and by Pugh and Wonnacott [WP95, Won95]. In the following, fuzzy array dataflow analysis (FADA) by Barthou, Collard and Feautrier [Bar98] will be our preferred instancewise reaching definition analysis for programs with unrestricted nested loops and arrays.

Many extensions to handle procedure calls have been proposed [TFJ86, HBCM94, CI96], but they are not fully instancewise in the sense that they do not distinguish between multiple executions of a statement associated with distinct calls of the surrounding procedure. Indeed, the first fully instancewise analysis for programs with (possibly recursive) procedure calls is presented in this thesis.

The next section introduces program transformations useful to parallelization. Most of these transformations will be studied in more detail in the rest of this thesis. Of course, they are based on instancewise and elementwise analysis of program properties.

1.2 Program Transformations for Parallelization

Dependences are known to hamper parallelization of imperative programs and their efficient compilation on modern processors or supercomputers. A general method to reduce the number of memory-based dependences is to disambiguate memory accesses by assigning distinct memory locations to independent writes, i.e. to expand data structures.

There are many ways to compute memory expansions, i.e. to transform memory accesses in programs. Classical ways include renaming scalars, arrays and pointers, splitting or merging data structures of the same type, reshaping array dimensions, including adding new dimensions, converting arrays into trees, changing the degree of a tree, and changing a global variable into a local one.

Read references are also expanded, using instancewise reaching definition information to implement the expanded reference [Fea91]. Figure 1.1 shows three programs with no possible parallel execution because of output dependences (details of the code are omitted when not useful for presentation). Expanded versions are given in the right-hand side of the figure, to illustrate the benefit of memory expansion for parallelism extraction.

Unfortunately, when the control-flow cannot be predicted at compile-time, some run-time computation is needed to preserve the original data flow: $\phi$ functions may be needed to "merge" data definitions due to several incoming control paths. These $\phi$ functions are similar (but not identical) to those of the static single-assignment (SSA) framework by Cytron et al. [CFR+91], and have been first extended for instancewise expansion schemes
by Collard and Griebl [GC95, Col98]. The argument of a $\phi$ function is the set of possible reaching definitions for the associated read reference.¹ Figure 1.2 shows two programs with some unknown conditional expressions and array subscripts. Expanded versions with $\phi$ functions are given in the right side of the figure.

¹ This interpretation of $\phi$ functions is very different from their usual semantics in the SSA framework.

        int x;                          int x1, x2;
        x = ...;  ... = x;              x1 = ...;  ... = x1;
        x = ...;  ... = x;              x2 = ...;  ... = x2;

After expansion, i.e. renaming x in x1 and x2, the first two statements can be executed in parallel with the two others.

        int A[10];                      int A1[10], A2[10][10];
        for (i=0; i<10; i++) {          for (i=0; i<10; i++) {
    s1    A[0] = ...;               s1    A1[i] = ...;
          for (j=1; j<10; j++) {          for (j=1; j<10; j++) {
    s2      A[j] = A[j-1] + ...;    s2      A2[i][j] = { if (j==1) A1[i];
          }                                              else A2[i][j-1]; } + ...;
        }                                 }
                                        }

After expansion, i.e. renaming array A in A1 and A2 then adding a dimension to array A2, the for loop is parallel. The instancewise reaching definition of the A[j-1] reference depends on the values of i and j, as implemented with a conditional expression.

        int A[10];                      struct tree {
        void proc (int i) {               int value; tree *left, *right;
          A[i] = ...;                   } *p;
          ... = A[i];                   void proc (tree *p, int i) {
          if (...) proc (i+1);            p->value = ...;
          if (...) proc (i-1);            ... = p->value;
        }                                 if (...) proc (p->left, i+1);
                                          if (...) proc (p->right, i-1);
                                        }

After expansion, the two procedure calls can be executed in parallel. Memory allocation for the tree structure is not shown.

Figure 1.1. Simple examples of memory expansion

        int x;                          int x1, x2;
    s1  x = ...;                    s1  x1 = ...;
    s2  if (...) x = ...;           s2  if (...) x2 = ...;
    r   ... = x;                    r   ... = phi({s1, s2});

After expansion, one may not decide at compile-time what value is read by statement r. One only knows that it may either come from s1 or from s2, and the effective value retrieval code is hidden in the $\phi(\{s_1, s_2\})$ function. It checks whether s2 executed or not, then if it did, it returns the value of x2, else it returns the value of x1.

        int A[10];                      int A1[10], A2[10];
    s1  A[i] = ...;                 s1  A1[i] = ...;
    s2  A[...] = ...;               s2  A2[...] = ...;
    r   ... = A[i];                 r   ... = phi({s1, s2});

After expansion, one may not decide at compile-time what value is read by statement r, because one does not know which element of array A is assigned by statement s2.

Figure 1.2. Run-time restoration of the flow of data

Notice that memory expansion is not a mandatory step for parallelization; it is yet a general technique to expose parallelism in programs. Now, implementation of a parallel program depends on the target language and architecture. Two main techniques are used.

The first technique takes benefit of control parallelism, i.e. parallelism between different statements in the same program block. Its goal is to replace as many sequential executions of statements (denoted with ; in C) by parallel executions. Depending on the language, there are many different syntaxes to code this kind of parallelism, and all these syntaxes may not have the same expressive power. We will prefer the Cilk [MF98] spawn/sync syntax (similar to OpenMP's syntax) to the parallel block notation from Algol 68 or the EARTH-C compiler [HTZ+97]. As in [MF98], synchronizations involve every asynchronous computation started in the surrounding program block, and implicit synchronizations are assumed at return points in procedures. For the example in Figure 1.3.a, execution of A, B, C in parallel followed sequentially by D and E has been written in a Cilk-like syntax (each statement would probably be a procedure call).

    spawn A;
    spawn B;
    spawn C;
    sync;    // wait for A, B and C to complete
    D;
    E;

Figure 1.3.a. Control parallelism

    // L is the latency of the schedule
    for (t=0; t<=L; t++) {
      parallel for (instance in F(t))
        execute instance
      // implicit synchronization
    }

Figure 1.3.b. Data parallel implementation for schedules

Figure 1.3. Exposing parallelism

The second technique is based on data parallelism, i.e. parallelism between different instances of the same statement or block. The data parallel programming model has been extensively studied in the case of loop nests [PD96], because it is very well suited to efficient parallelization of numerical algorithms and repetitive operations on large data sets. We will consider a syntax similar to OpenMP parallel loop declaration, where all variables are supposed to be shared by default, and an implicit synchronization takes place at each parallel loop termination.
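The array expansion of Figure 1.1 can be checked on a concrete case. The sketch below is only an illustration: the right-hand sides elided in the figure are replaced by two hypothetical functions `f` and `g`, and the helper names (`run_original`, `run_expanded`, `same_values`) are assumptions, not part of the thesis. It records the value written by every instance of s1 and s2 in the original program, then verifies that the expanded program produces the same values, even when the iterations of the outer loop are run in reverse order.

```c
#include <assert.h>

#define N 10

/* Hypothetical stand-ins for the right-hand sides elided in Figure 1.1. */
static int f(int i) { return 100 * i; }
static int g(int i, int j) { return i + j; }

/* Original program: all iterations of the i loop share array A.
   written[i][j] records the value produced by each statement instance. */
void run_original(int written[N][N]) {
    int A[N];
    for (int i = 0; i < N; i++) {
        A[0] = f(i);                        /* s1 */
        written[i][0] = A[0];
        for (int j = 1; j < N; j++) {
            A[j] = A[j - 1] + g(i, j);      /* s2 */
            written[i][j] = A[j];
        }
    }
}

/* Expanded program: s1 writes A1[i] and s2 writes A2[i][j]. Iterations of
   the i loop touch disjoint memory, so they may run in any order; a nonzero
   `reversed` selects the descending order to illustrate this. */
void run_expanded(int reversed, int A1[N], int A2[N][N]) {
    for (int n = 0; n < N; n++) {
        int i = reversed ? N - 1 - n : n;
        A1[i] = f(i);                       /* s1 */
        for (int j = 1; j < N; j++) {
            /* reaching definition of A[j-1]: s1 when j == 1, else s2 */
            A2[i][j] = (j == 1 ? A1[i] : A2[i][j - 1]) + g(i, j);  /* s2 */
        }
    }
}

/* Returns 1 when every instance produced the same value in both programs. */
int same_values(int reversed) {
    int written[N][N], A1[N], A2[N][N];
    run_original(written);
    run_expanded(reversed, A1, A2);
    for (int i = 0; i < N; i++) {
        if (written[i][0] != A1[i]) return 0;
        for (int j = 1; j < N; j++)
            if (written[i][j] != A2[i][j]) return 0;
    }
    return 1;
}
```

Running the outer loop backwards is a cheap sequential way to exercise the independence of its iterations; an actual parallel version would mark that loop as parallel in the target syntax.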
The first algorithms to generate data parallel code were based on intuitive loop transformations such as loop fission, loop fusion, loop interchange, loop reversal, loop skewing, loop reindexing and statement reordering. Moreover, dependence abstractions were much less expressive than affine relations. But data parallelism is also appropriate when describing a parallel order with a schedule, i.e. giving an execution date for every statement instance. The program pattern in Figure 1.3.b shows the general implementation of such a schedule [PD96]. It is based on the concept of execution front $\mathcal{F}(t)$ which gathers all instances $\imath$ executing at date $t$.

The first scheduling algorithm was designed by Allen and Kennedy [AK87], from which many other methods have been designed. These are all based on rather approximate abstractions of dependences, like dependence levels, vectors and cones. Despite the lack of generality, the benefit of such methods is the low complexity and easy implementation in an industrial parallelizing compiler; see the work by Banerjee [Ban92] or more recently by Darte and Vivien [DV97] for a survey of these algorithms.

The first general solution to the scheduling problem was proposed by Feautrier [Fea92]. The proposed algorithm is very useful, but its weak point is the lack of help to decide what parameter of the schedule to optimize: is it the latency $L$, the number of communications (on a distributed memory machine), the width of the fronts?

Finally, it is well known that control parallelism is more general than data parallelism, meaning that every data parallel program can be rewritten in a control parallel model, without losing any parallelism. This is especially true for recursive programs, for which the distinction between the two paradigms becomes very unclear, as shown in [Fea98]. However, for practical programs and architectures, it has long been the case that architectures for massively parallel computations were much more suited to data parallelism, and that getting good speed-ups on such architectures was difficult with control parallelism, mainly due to asynchronous task management overhead. But recent advances in hardware and software systems are showing an evolution in this situation: excellent results for parallel recursive programs (game simulations like chess, and sorting algorithms) have been shown with Cilk for example [MF98].

1.3 Thesis Overview

This thesis is organized in four chapters and a final conclusion. Chapter 2 describes a general framework for program analysis and transformation, and presents the formal definitions useful to the following chapters. The main interest of this chapter is to encompass a very large class of programs, from nests of loops with arrays to recursive programs and data structures.

A collection of mathematical results is gathered in Chapter 3; some are rather well known, such as Presburger arithmetic and formal language theory; some are very uncommon in compiler and parallelism fields, such as rational and algebraic transductions; and the others are mostly contributions, such as left-synchronous transductions and approximation techniques for rational and algebraic transductions.

Chapter 4 addresses instancewise analysis of recursive programs. Based on an extension of the induction variable concept to recursive programs and on new results in formal language theory, it presents two algorithms for dependence and reaching definition analysis. These algorithms are applied to several practical examples.

Parallelization techniques based on memory expansion are studied in Chapter 5. The first three sections present new techniques to expand nested loops with unrestricted conditionals, bounds and array subscripts; the fourth section is a contribution to simultaneous optimization of expansion and parallelization parameters; and the fifth section presents our results about parallelization of recursive programs.
Chapter 2

Framework

The previous introduction and motivation has covered several very different concepts and approaches. Each one has been studied by many authors who have defined their own vocabulary and abstractions. Of course, we would like to keep the same formalism along the whole presentation. This chapter presents a framework for describing program analysis and transformation techniques and for proving their correctness or theoretical properties. The design of this framework has been governed by three major goals:

1. build on well defined concepts and vocabulary, while keeping the continuity with related works;
2. focus on instancewise properties of programs, and take benefit of this additional information to design new transformation techniques;
3. head for both generality and high precision, minimizing the necessary number of tradeoffs.

This presentation does not compete with other formalisms, some of which are firmly rooted in semantically and mathematically sound theories [KU77, CC77, JM82, KS92]. Because we advocate for instancewise analysis and transformations, we primarily focused on establishing convincing results about effectiveness and feasibility. This required leaving for further studies the necessary integration of our techniques in a more traditional analysis theory. We are sure that instancewise analysis can be modeled in a formal framework such as abstract interpretation, even if very few works have addressed this important issue.

We start with a formal presentation of run-time statement instances and program executions in Section 2.1, then the program model we will consider throughout this study is exposed and motivated in Section 2.2. Section 2.3 proposes mathematical abstractions for these instance and program models. Program analysis and transformation frameworks are addressed in Sections 2.4 and 2.5 respectively.

2.1 Going Instancewise

During program execution, each statement can be executed several times, depending on the surrounding control structures (loops, procedure calls and conditional expressions). To capture data-flow information as precisely as possible, our analysis and transformation techniques should be able to distinguish between the distinct executions of a statement.

Definition 2.1 (instance) For a statement $s$, a run-time instance of $s$ is some particular execution of $s$ during execution of the program.
For short, a run-time instance of a statement is called an instance. If the program terminates, each statement has a finite number of instances.

Consider the two example programs in Figure 2.1. They both display the sum of an array A with an unknown number N of elements; one is implemented with a loop and the other with a recursive procedure. Statements B and C are executed N times during execution of each program, but statements A and D are executed only once. The value of variable i can be used to "name" each instance of B and C and to distinguish at compile-time between the 2N + 2 run-time instances of statements A, B, C and D: the unique instances of statements A and D are denoted respectively by $\langle A \rangle$ and $\langle D \rangle$, and the N instances of statement B (resp. statement C) associated with some value $i$ of variable i are denoted by $\langle B, i \rangle$ (resp. by $\langle C, i \rangle$), $0 \le i < N$. Such an "iteration variable" notation is not always possible, and a general naming scheme will be studied in Section 2.3.

        int A[N];                       int A[N];
        int c;                          int sum (int i) {
    A   c = 0;                            if (i < N)
        for (i=0; i<N; i++) {       C       return A[i] + sum (i+1);
    B     c = c + A[i];                   else
        }                           D       return 0;
        printf ("%d", c);               }
                                        printf ("%d", sum (0));

Figure 2.1. About run-time instances and accesses

Because of the state of memory and possible interactions with its environment, several executions of the same program may yield different sets of run-time statement instances and incompatible results. We will not formally define this concept of program execution in operational semantics: a very clean framework has indeed been defined by Cousot and Cousot [Cou81] for abstract interpretation, but the correctness of our analysis and transformation techniques does not require so many details.

Definition 2.2 (program execution) Let $P$ be a program. A program execution $e$ is given by an execution trace of $P$, which is a finite or infinite (when the program does not terminate) sequence of configurations, i.e. machine states. The set of all possible program executions is denoted by $E$.
Now, the set of all run-time instances for a given program execution $e \in E$ is denoted by $I_e$. Subscript $e$ denotes a given program execution, but it also recalls that set $I_e$ is "exact": it is the effective unapproximate set of statement instances executed during program execution $e$. This formalism will be used in every further definition of an execution-dependent concept.

Considering again the two programs in Figure 2.1, the execution of statements B and C is governed by a comparison of variable i with the constant N. Without any information on the possible values of N, it is impossible to decide at compile-time whether some instance of B or C executes. In the extreme case of an execution $e$ where N is equal to zero, both statements are never executed, and the set $I_e$ is equal to $\{\langle A \rangle, \langle D \rangle\}$. In general, $I_e$ is equal to $\{\langle A \rangle, \langle D \rangle\} \cup \{\langle B, i \rangle, \langle C, i \rangle : 0 \le i < N\}$, the value of N being part of the definition of $e$.
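The count of 2N + 2 instances can be observed by instrumenting the two programs of Figure 2.1. The following sketch is an illustration under stated assumptions: the `instance_log` type, the `ST_*` constants and the function names are hypothetical, and N is passed as a parameter instead of being a global constant. Each labeled statement increments a counter when it executes, so the counters give the size of $I_e$ for the execution at hand.

```c
#include <assert.h>

/* One counter per statement label of Figure 2.1 (names are hypothetical). */
enum { ST_A, ST_B, ST_C, ST_D, ST_COUNT };

typedef struct { long count[ST_COUNT]; } instance_log;

/* Loop version: one instance <A>, then one instance <B, i> per iteration. */
int sum_loop(const int *A, int N, instance_log *log) {
    int c;
    log->count[ST_A]++;  c = 0;                  /* A */
    for (int i = 0; i < N; i++) {
        log->count[ST_B]++;  c = c + A[i];       /* B */
    }
    return c;
}

/* Recursive version: instances <C, i> for 0 <= i < N, then one instance <D>. */
int sum_rec(const int *A, int N, int i, instance_log *log) {
    if (i < N) {
        log->count[ST_C]++;                      /* C */
        return A[i] + sum_rec(A, N, i + 1, log);
    } else {
        log->count[ST_D]++;                      /* D */
        return 0;
    }
}

/* Runs both programs on the same input and returns the number of
   statement instances executed, i.e. the size of I_e. */
long count_instances(const int *A, int N, instance_log *log) {
    for (int s = 0; s < ST_COUNT; s++) log->count[s] = 0;
    int c1 = sum_loop(A, N, log);
    int c2 = sum_rec(A, N, 0, log);
    assert(c1 == c2);                            /* both compute the same sum */
    long total = 0;
    for (int s = 0; s < ST_COUNT; s++) total += log->count[s];
    return total;
}
```

With N equal to zero only the instances of A and D remain, which matches the extreme case discussed above.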
Of course, each statement can involve several (including zero) memory references, at most one of these being a write (i.e. in left-hand side).

Definition 2.3 (access) A pair $(\imath, r)$ of a statement instance and a reference in the statement is called an access.

For a given execution $e \in E$ of a program, the set of all accesses is denoted by $A_e$. It can be decomposed into:

- $R_e$, the set of all reads, i.e. accesses performing some load operation from memory;
- and $W_e$, the set of all writes, i.e. accesses performing some store operation into memory.

Due to our syntactical restrictions, no access may be simultaneously a read and a write. Since a statement performing some write in memory involves exactly one reference in left-hand side, its instances are often used in place of its write accesses (this sometimes simplifies the exposition).

Looking again at our two programs in Figure 2.1:

- statement A has one write reference to variable c, the single associated access is denoted by $\langle A, c \rangle$;
- statement B has one write and one read reference to variable c, since both references are identical, the associated accesses are both denoted by $\langle B, i, c \rangle$, $0 \le i < N$;
- statement B has one read reference to array A, the associated accesses are denoted by $\langle B, i, A[i] \rangle$, $0 \le i < N$;
- statement C has one read reference to array A, the associated accesses are denoted by $\langle C, i, A[i] \rangle$, $0 \le i < N$;
- statement D has no memory reference, thus no associated access.

2.2 Program Model

Our framework focuses on imperative programs. This section describes the control and data structure syntax we consider. In a preliminary work [CCG96], we defined a toy language (called LEGS) which allowed explicit declaration of complex data structure shapes fitting our program model. Most of the program model restrictions we enumerate in this section were also enforced by the language semantics. We chose yet to define our program model with a C-like syntax (with C++ syntactic sugar facilities): despite the lack of formal semantics available in C, we hope this choice will ease the understanding of practical examples and the communication of our new ideas.

Procedures are seen as functions returning the void type and explicit (typed) pointers are allowed. Multi-dimensional arrays are accessed with syntax [i1, ..., in] (not C syntax) for better understanding.

2.2.1 Control Structures

Definition 2.4 (statement and block) A program statement is any C expression ended with ";" or "}". A program block is a special kind of statement that starts with "{", a function declaration, a loop or a conditional expression, and surrounding one or more sub-statements.

To simplify the exposition, the only control structures that may appear in the right-hand side of an assignment, in a function call or in a loop declaration are conditional statements. Moreover, multiple expressions separated by , are not allowed, and loops are supposed to follow some minimal "code of ethics": each loop variable is assigned by a single loop and its value is not used outside of this loop; as a consequence, each loop variable must be initialized.

This framework is primarily designed for first-order control structures: any function call should be fully specified at compile-time, and "computed" gotos are forbidden. But higher-order structures can be handled conservatively, by approximating the possible function calls using external analysis techniques [Cou81, Deu90, Har89, AFL95]. Calls to input/output functions are allowed as well, but completely ignored by analysis and transformation techniques, possibly yielding incorrect parallelizations.

Recursive calls, loops with unrestricted bounds, and conditional statements with unrestricted predicates are allowed. Classical exception mechanisms, breaks, and continues are supported as well. However, we suppose that gotos are removed by well known algorithms for structuring programs [Bak77, Amm92], at the cost of some code duplication in the rare cases where the control flow graph is not reducible [ASU86].

2.2.2 Data Structures

We only consider

- scalars (boolean, integer, floating-point, pointer...);
- records (non-recursive and non-array structure with scalar and record fields);
- arrays of scalars or records;
- trees of scalars or records;
- arrays of trees;
- and trees of arrays.

Records are seen as compound scalars with unaliased named fields. Moreover, unrestricted array values in trees and tree elements in arrays are allowed, including recursive nestings of arrays and trees.

Arrays are accessed through the classical syntax, and other data structures are accessed through the use of explicit pointers. However, to simplify the exposition, we suppose that no variable is simultaneously used as a pointer (through operators * and ->) and as an array (through operator []): in particular, explicit array subscripts must be preferred to pointer arithmetic.

By convention, edge names in trees are identical to the label of pointer fields in the tree declaration.
In practical implementations, recursive data structures are not made explicit. More precisely, two main problems arise when trying to build an abstract view of data structure definition and usage in C programs.

1. Multiple structure declarations may be relative to the same data structure, without explicit declaration of the shape of the whole object. Moreover, even a single recursive struct declaration can describe several very different objects, such as lists, doubly-linked lists, trees, acyclic graphs, general graphs, etc. Building a compile-time abstraction of data structures used in a program is thus a difficult problem, but it is essential to our analysis and transformation framework. It can be achieved in two opposite ways: either "decorating" the C code with shape descriptions which guide the compiler when building its abstract view of data structures [KS93, FM97, Mic95, HHN92] or running a compile-time shape analysis of pointer-based structures [GH96, SRW96].

2. Two pointer variables may be aliased, i.e. they may be two different names for the same memory location. The goal of alias analysis [Deu94, CBC93, GH95] (store-less) and points-to analysis [LRZ93, EGH94, Ste96] (store-based) techniques is precisely to disambiguate pointer accesses, when pointer updates are not too complex to be analyzed. In practice, one may expect good results for strongly typed programs without pointer arithmetic, especially if the goal of the alias analysis is to check whether two pointers refer to the same structure or not. Element-wise alias analysis is very costly and still a largely open problem: indeed, no instancewise alias analysis for pointers has been proposed so far, and it could be an interesting future development of our framework.

In the following, we thus suppose that the shape of each data structure has been identified as one of the supported data types, and that each pointer reference has been associated with the data structure instance it refers to.
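The first problem above can be made concrete with a few lines of C. In this sketch (all names are hypothetical and chosen for illustration only), one recursive declaration `struct node` is used once as a binary tree and once as a doubly-linked list. Nothing in the declaration tells the two shapes apart, and a traversal written for one shape gives a different answer, or does not terminate, on the other.

```c
#include <assert.h>
#include <stddef.h>

/* A single recursive declaration with two pointer fields. */
struct node {
    int value;
    struct node *left, *right;
};

/* Traversal assuming a tree shape. It only terminates when no node can be
   reached twice, which does not hold for a doubly-linked list. */
int tree_size(const struct node *p) {
    return p == NULL ? 0 : 1 + tree_size(p->left) + tree_size(p->right);
}

/* Traversal assuming a list shape (left = previous, right = next). */
int list_length(const struct node *p) {
    int n = 0;
    for (; p != NULL; p = p->right) n++;
    return n;
}

/* Shape 1: n[0] is the root of a three-node binary tree. */
void build_tree(struct node n[3]) {
    for (int k = 0; k < 3; k++) { n[k].value = k; n[k].left = n[k].right = NULL; }
    n[0].left = &n[1];
    n[0].right = &n[2];
}

/* Shape 2: n[0] is the head of a three-node doubly-linked list. */
void build_list(struct node n[3]) {
    for (int k = 0; k < 3; k++) { n[k].value = k; n[k].left = n[k].right = NULL; }
    n[0].right = &n[1];  n[1].left = &n[0];
    n[1].right = &n[2];  n[2].left = &n[1];
}
```

On the tree, `list_length` only sees the right spine and reports two nodes; calling `tree_size` on the list would loop forever, which is why a shape description or a shape analysis is needed before any instancewise reasoning.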
Now, there is one last question about data structures: how are they constructed, modified and destroyed? When dealing with arrays, a compile-time shape declaration is available in most cases; but some programs require dynamic arrays whose size is updated dynamically every time an out-of-bound access is detected: this is the case of some expanded programs studied in Chapter 5. The problem is more critical with pointer-based data structures: they are most of the time allocated at run-time with explicit malloc or new operations. This problem has already been addressed by Feautrier in [Fea98] and we consider the same abstraction: all data structures are supposed to be built to their maximal extent (possibly infinite) in a preliminary part of the code. To guarantee that this abstraction is correct regarding data-flow information, we must add an additional restriction to the program model: any run-time insertion and deletion is forbidden. In fact there are two exceptions to this very strong rule, but they will be described in the next section after presenting the mathematical abstraction for data structures. Nevertheless, a lot of interesting programs with recursive pointer-based structures perform random insertions and deletions, and these programs cannot be handled at present in our framework. This issue is left for future work.

2.3 Abstract Model

We start with a presentation of a naming scheme for statement instances, and show that execution traces are not suitable to our purpose. Then, we propose a powerful abstraction for memory locations.

2.3.1 Naming Statement Instances

In the following, every program statement is supposed to be labeled. The alphabet of statement labels is denoted by $\Sigma_{ctrl}$. Now, loops and conditionals require special attention.

- Because a loop involves an initialization step, a bound check step, and an iteration step, loops are given three labels: the first one represents the loop entry, the second one is the check for termination, and the third one is the loop iteration. Remember that, in C, a bound check is performed immediately after the loop entry and immediately after each increment. The loop check is considered as a block and a conditional statement, and the two others are non-block labels.
- An if then else statement is given two labels: one for the condition and the then branch, and one for the else branch. Both labels are considered as block labels.

Consider the program example in Figure 2.2.a. This simple recursive procedure computes all possible solutions to the n-queens problem, using an array A (details of the code are omitted here); it is our running example in this section.

There are two assignment statements: s writes into array A and r performs some read access in A. Statements I and J are conditionals, and statement Q is a recursive call to procedure Queens. Loop statements are divided into three sub-statements which are given distinct labels: the first one denotes the loop entry (e.g. A or B), the second one denotes the bound check (e.g. A or B), and the third one denotes the loop iteration (e.g. a or b). Finally, P is the label of the procedure and F denotes the initial call in main.

A primary goal for instancewise analysis and transformation is to name each statement instance. To achieve this, many works in the program analysis field rely on execution traces. Their interpretation for program analysis is generally defined as a path from the entry of the control flow graph to a given statement.¹ They record every execution of a statement, including return from functions.

¹ Without notice of conditional expressions and loop bounds.

For our purpose, these execution traces have three main drawbacks:

1. because of return labels, traces belong to a non-rational language in $\Sigma_{ctrl}^*$, as soon as there are recursive function calls;
2. full-length traces are huge and extremely redundant: if an instance executes before another in the same program execution, its trace prefixes the other;
3. a single statement instance may have several execution traces because statement execution is unknown at compile time.

To overcome the first problem, a classical technique relies on a function called Net on $\Sigma_{ctrl}^*$ [Har89]: intuitively this function collapses all call-return pairs in a given execution trace, yielding compact rational sets of execution traces. The third point is much more unpleasant because it forbids giving a unique name to each statement instance. Notice however that different execution traces for the same instance must be associated with distinct executions of the program.
            int A[n];
    P       void Queens (int n, int k) {
    I         if (k < n) {
    A/A/a       for (int i=0; i<n; i++) {
    B/B/b         for (int j=0; j<k; j++)
    r               ... = ... A[j] ...;
    J             if (...) {
    s               A[k] = ...;
    Q               Queens (n, k+1);
                  }
                }
              }
            }
            int main () {
    F         Queens (n, 0);
            }

Figure 2.2.a. Procedure Queens

[Figure 2.2.b. Control tree: partial tree of control words rooted at F; the nodes FPIAAaAaAJs and FPIAAaAaAJQPIAABBr are highlighted.]

Figure 2.2. Procedure Queens and control tree

Our solution starts from another representation of the program flow: the intuition behind our naming scheme for instances is to consider some kind of "extended stack states" where loops are seen as special cases of recursive procedures. The dedicated vocabulary for this representation has been defined in parts and with several variations in [CC98, Coh99a, Coh97, Fea98].

Let us start with an example: the first instance of statement s in procedure Queens. Depending on the number of iterations of the innermost loop (bounded by k) an execution trace for this first instance can be one of FPIAABBJs, FPIAABBbBJs, FPIAABBbBbBJs, ..., FPIAABB(bB)^k Js. Since we would like to give a unique name to the first instance of s, all B, B and b labels should intuitively be left out. Now, for a given program execution, any statement instance is associated with a unique (ordered) list of block enterings, loop iterations and procedure calls leading to it. To each list corresponds a word: the concatenation of statement labels. This is precisely what we get when forgetting about the innermost loop in execution traces of the first instance of statement s: the single word FPIAAJs. These concepts are illustrated by the tree in Figure 2.2.b, to be defined later.

We now formally describe these words and their relation with statement instances.

Definition 2.5 (control automaton and control words) The control automaton of the program is a finite-state automaton whose states are statements in the program and where a transition from a state $q$ to a state $q'$ expresses that statement $q'$ occurs in
block $q$. Such a transition is labeled by $q'$. The initial state is the statement executed at the beginning of program execution, and all states are final.

Words accepted by the control automaton are called control words. By construction, they build a rational language $L_{ctrl}$ included in $\Sigma_{ctrl}^*$.

Lemma 2.1 $I_e$ being the set of statement instances for a given execution $e$ of a program, there is a natural injection from $I_e$ to the language $L_{ctrl}$ of control words.

Proof: Any statement instance in a program execution is associated with a unique list of block enterings, loop iterations and procedure calls leading to it. We can thus define a function $f$ from $I_e$ to $\Sigma_{ctrl}^*$ (lists of statement labels) mapping statement instances to their respective list of block enterings, loop iterations and procedure calls. Consider an instance $\imath_1$ of a statement $s_1$ and an instance $\imath_2$ of a statement $s_2$, and suppose $f(\imath_1) = f(\imath_2) = l$. By definition of $f$, both statements $s_1$ and $s_2$ must be part of the same program block $B$, and precisely, the last element of $l$ is $B$. Considering a pair of a statement $s$ and an instance $\imath$ of $s$, this proves that no other instance $\imath'$ of a statement $s'$ may be such that $(f(\imath), s) = (f(\imath'), s')$.

Consider the function from $I_e$ to $L_{ctrl}$ (control words) which maps an instance $\imath$ of a statement $s$ to the concatenation of all labels in $f(\imath)$ and $s$ itself. Thanks to the preceding property on pairs $(f(\imath), s)$, this function is injective.

Theorem 2.1 Let $I$ be the union of all sets of statement instances $I_e$ for every possible execution $e$ of a program. There is a natural injection from $I$ to the language $L_{ctrl}$ of control words.

Proof: Consider two executions $e_1$ and $e_2$ of a program, and the two functions defined as in the proof of Lemma 2.1, one for execution $e_1$ and one for execution $e_2$. If an instance $\imath$ is part of both $I_{e_1}$ and $I_{e_2}$, the control words that these two functions associate with $\imath$ are the same, because the list of block enterings, loop iterations and function calls leading to $\imath$ is unchanged. Lemma 2.1 terminates the proof.

We are thus allowed to talk about "the control word of a statement instance". In general, the set $E$ of possible program executions and the set $I_e$ for $e \in E$ are unknown at compile-time, and we may consider all instances that may execute during any program execution. Finally, the natural injection becomes a one-to-one mapping when extending the set $I_e$ with all possible instances associated to "legal" control words. As a consequence, if $w$ is a control word, we will say "instance $w$" instead of "the instance whose control word is $w$".

We are also interested in encoding accesses themselves with control words. A simple solution consists in considering pairs $(w, ref)$, where $w$ is a control word for some instance of a statement $s$ and $ref$ is a reference in statement $s$. But we prefer to encode the full access "inside" the control word: we thus extend the alphabet of statement labels $\Sigma_{ctrl}$ with letters of the form $s^{ref}$, for all statements $s \in \Sigma_{ctrl}$ and references $ref$ in $s$. Of course, extended labels may only take place as the last letter in a control word: when the last letter in a control word $w$ is of the form $s^{ref}$, it means that $w$ represents an access instead of an instance. However, when clear from the context, i.e. when there is only one "interesting" reference in a given statement or all references are identical, the reference will be taken out of the control word of accesses. This will be the case in most practical examples.
Finally, notice that some states in the control automaton have exactly one incoming transition and one outgoing transition (looping transitions count as both incoming and outgoing). Now, these states do not carry any information about where a statement can be reached from or lead to: in every control word, the label of the outgoing transition follows the label of the incoming one. In practice, we often consider a compressed control automaton where all states with exactly one incoming transition and one outgoing transition are removed. This transformation has no impact on control words.

Observe that loops in the program are represented by looping transitions in the compressed control automaton, and that cycles involving more than one state are associated with recursive calls.

[Figure 2.3.a. Control automaton]

[Figure 2.3.b. Compressed control automaton]

Figure 2.3. Control automata for program Queens

Figure 2.3.a describes the plain control automaton for procedure Queens.² Since states F, I, A, B, Q, a and b are useless, they are removed along with their outgoing edges. The compressed automaton is described in Figure 2.3.b.

² Every state is final, but this is not made explicit on the figure.

As a practical remark, notice that it is often desirable to restrict the language of control words to instances of a particular statement. This is easily achieved by choosing the state associated to this statement as the only final one.

To conclude this presentation of a naming scheme for statement instances, it is possible to compare the execution traces of an instance $\imath$ and the control word of $\imath$. Indeed, the following property is quite natural: it results from the observation that traces of an instance may only differ in labels of statements that are not part of the list of block enterings, loop iterations and function calls leading to this instance.

Proposition 2.1 The control word of a statement instance is a sub-word of every execution trace of this instance.

2.3.2 Sequential Execution Order

The sequential execution order of the program defines a total order over instances, call it $<_{seq}$. In English, words are ordered by the lexicographic order generated by the alphabet order a < b < c < .... Similarly, in any program one can define a partial textual order $<_{txt}$ over statements: statements in the same block are sorted in order of appearance, and statements appearing in different blocks are mutually incomparable.

Remember the special case of loops: the iteration label executes after all the statements inside the loop body, but entry and check labels are not comparable with these statements. For procedure Queens in Figure 2.2.a, we have $B <_{txt} J <_{txt} a$, $r <_{txt} b$ and $s <_{txt} Q$.

This textual order generates a lexicographic one on control words, denoted by $<_{lex}$:

$$w' <_{lex} w \iff \bigl(\exists x, x' \in \Sigma_{ctrl},\ \exists u, v, v' \in \Sigma_{ctrl}^* :\ w = u x v \ \wedge\ w' = u x' v' \ \wedge\ x' <_{txt} x\bigr) \ \vee\ \bigl(\exists v \in \Sigma_{ctrl}^+ :\ w = w' v\bigr) \quad \text{(a.k.a. prefix order)}$$

This order is only partial on $\Sigma_{ctrl}^*$. However, by construction of the textual order:

Proposition 2.2 An instance $\imath'$ executes before an instance $\imath$ iff their respective control words $w'$ and $w$ satisfy $w' <_{lex} w$.

Notice that the lexicographic order $<_{lex}$ is not total on $L_{ctrl}$ because both cases of a conditional are not comparable! This does not yield a contradiction, because the then and else cases of the same if instance are never simultaneously executed in a single execution. In general, the lexicographic order is total on the subset of control words corresponding to instances that do execute (in one-to-one mapping with $I_e$ for some execution $e \in E$).

Finally, the language of control words is best understood as an infinite tree, whose root is named $\varepsilon$ and every edge is labeled by a statement. Each node then corresponds to the control word equal to the concatenation of edge labels starting from the root. Consider a control word $ux$, $u \in \Sigma_{ctrl}^*$ and $x \in \Sigma_{ctrl}$; every downward edge from a node whose control word is $ux$ corresponds to an outgoing transition from state $x$ in the control automaton. To represent the lexicographic order, downward edges are ordered from left to right according to the textual order. Such a tree is usually called a call tree in the functional languages community, but control tree is more adequate in the presence of loops and other non-functional control structures. One may talk about plain and compressed control trees, depending on the control automaton which defines them.

A partial control tree for procedure Queens is shown in Figure 2.2.b (a compressed one will be studied later in Figure 4.1). Control word FPIAAaAaAJQPIAABBr is a possible run-time instance of statement r (depicted by a star in Figure 2.2.b), and control word FPIAAaAaAJs (depicted by a black square) is a possible run-time instance of statement s.
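Proposition 2.2 can be tried out on a small recursive procedure. The sketch below does not use procedure Queens: it relies on a hypothetical program with labels F (initial call), P (procedure), I (conditional block), and s, Q, r (three statements of block I, in that textual order). Control words are built as "extended stack states" by pushing a label when a block or statement is entered, and `lex_compare` is one possible reading of $<_{lex}$ for this program, with the textual order encoded by an assumed `rank_in_block` helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 32
#define MAX_LEN 32

static char word[MAX_LEN];              /* control word of the current point */
static int len;
static char trace[MAX_WORDS][MAX_LEN];  /* control words in execution order */
static int ntrace;

static void push(char label) { word[len++] = label; word[len] = '\0'; }
static void pop(void) { word[--len] = '\0'; }
static void record(void) { strcpy(trace[ntrace++], word); }

/* Textual order inside block I of the hypothetical program: s, then Q, then r.
   Labels with equal ranks are treated as incomparable. */
static int rank_in_block(char label) {
    switch (label) {
    case 's': return 0;
    case 'Q': return 1;
    case 'r': return 2;
    default:  return 0;
    }
}

/* Returns -1 if w1 <lex w2, 1 if w2 <lex w1, 0 if equal or incomparable. */
int lex_compare(const char *w1, const char *w2) {
    size_t k = 0;
    while (w1[k] != '\0' && w1[k] == w2[k]) k++;
    if (w1[k] == '\0' && w2[k] == '\0') return 0;   /* same word */
    if (w1[k] == '\0') return -1;                   /* w1 is a strict prefix of w2 */
    if (w2[k] == '\0') return 1;                    /* w2 is a strict prefix of w1 */
    int r1 = rank_in_block(w1[k]), r2 = rank_in_block(w2[k]);
    return (r1 < r2) ? -1 : (r1 > r2) ? 1 : 0;
}

/* Hypothetical procedure: P(k) { if (k < 2) { s; P(k+1); r; } } */
static void P(int k) {
    push('P');
    if (k < 2) {
        push('I');
        push('s'); record(); pop();               /* statement s */
        push('Q'); record(); P(k + 1); pop();     /* recursive call Q */
        push('r'); record(); pop();               /* statement r */
        pop();
    }
    pop();
}

/* Runs the program from the initial call F and returns the number of recorded
   instances, or 0 if two consecutive instances are not increasing for <lex. */
int run_and_check(void) {
    len = 0; ntrace = 0; word[0] = '\0';
    push('F'); P(0); pop();
    for (int t = 0; t + 1 < ntrace; t++)
        if (lex_compare(trace[t], trace[t + 1]) != -1) return 0;
    return ntrace;
}
```

The recorded words are FPIs, FPIQ, FPIQPIs, FPIQPIQ, FPIQPIr and FPIr: the instance FPIQ precedes everything executed inside the call (prefix case), and FPIr comes last because r follows Q in the textual order of block I.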
2.3.3 Addressing Memory Locations

A large number of data structure abstractions have been designed for the purpose of program analysis. This presentation can be seen as an extension of several frameworks we already proposed [CC98, Coh99a, Coh97, Fea98], some of which in collaboration with Griebl [CCG96], but it is also highly relevant to previous work by Alabau and Vauquelin [Ala94], by Giavitto, Michel and Sansonnet [Mic95], by Deutsch [Deu92] and by Larus and Hilfinger [LH88].

With no surprise, array elements are addressed by integers, or vectors of integers for multi-dimensional ones. Tree addresses are concatenations of edge names (see Section 2.2.2) starting from the root. The address of the root is simply $\varepsilon$, the zero-length word. For example, the name of node root->l->r in a binary tree is lr. The set of edge names is denoted by $\Sigma_{data}$. The layout of trees in memory is thus described by a rational language $L_{data} \subseteq \Sigma_{data}^{*}$ over edge names.

For the purpose of dependence analysis, we are looking for a mathematical abstraction which captures relations between integer vectors, between words, and between the two. Dealing with trees only, Feautrier proposed to use rational transductions between free monoids in [Fea98]. We will formally define such transductions in Section 3.3, and then show how the same idea can be extended to more general classes of transductions and monoids, to handle arrays and nested trees and arrays as well.

Extending the Data Structure Model

Some interesting structures are basically tree structures enhanced with traversal edges. In many cases, these traversal edges have a very regular structure. Most usual cases are references to the parent and links between nodes at the same height in a tree. Such traversal edges are often used to facilitate special-purpose traversal algorithms. There is some support for such structures when traversal edges are known functions of the generators of the tree structure [KS93, FM97, Mic95], i.e. the "back-bone" spanning tree of the graph. In such a case, traversal edges are merely an "algorithmical sugar" for better performance. Even so, our support is limited since recursion and iteration over traversal edges is not supported. We will not study this extension any further because a full chapter would be necessary and our support for traversal edges does not include recursion and iteration.

Abstract Memory Model

The key idea to handle both arrays and trees is that they share a common mathematical abstraction: the monoid. For a quick recall of monoid definitions and properties, see Section 3.2. Indeed, rational languages (tree addresses) are subsets of free monoids with word concatenation, and sets of integer vectors (array subscripts) are free commutative monoids with vector addition. The monoid abstraction for a data structure will be denoted by $M_{data}$, and the subset of this monoid corresponding to valid elements of the structure will be denoted by $L_{data}$.

The case of nested arrays and trees is a bit more complex but reveals the expressiveness of monoid abstractions. Our first example is the hash-table structure described in Figure 2.4. It defines an array whose elements are pointers to lists of integers. A monoid abstraction $M_{data}$ for this structure is generated by $\mathbb{Z} \cup \{n\}$, and its binary operation
$\cdot$ is defined as follows:

$$\forall i \in \mathbb{Z}: \; i \cdot n = in \tag{2.1}$$
$$n \cdot n = nn \tag{2.2}$$
$$\forall i \in \mathbb{Z}: \; n \cdot i = ni \quad \text{(never used for the hash-table)} \tag{2.3}$$
$$\forall i, j \in \mathbb{Z}: \; i \cdot j = i + j \tag{2.4}$$

    struct key {
      int value;    // value of key
      key *n;       // next key
    };
    key *hash[7];

Figure 2.4. Hash-table declaration

The set $L_{data} \subset M_{data}$ of valid memory locations in this structure is thus

$$L_{data} = \mathbb{Z}\, n^{*}.$$

Check that the third case in the definition of operation $\cdot$ is never used in $L_{data}$.

Our second example is the structure described in Figure 2.5. It defines an array whose elements are references to other arrays or integers. Each array is either terminal with integer elements or intermediate with array reference elements. This definition is very similar to file-system storage structures, such as UNIX's inodes. The monoid abstraction $M_{data}$ for this structure is the same as the hash-table one. However, the set $L_{data} \subset M_{data}$ of valid memory locations in this structure is now

$$L_{data} = (\mathbb{Z}\, n)^{*}\, \mathbb{Z}.$$

The definition of operation $\cdot$ is the same as for the hash-table structure, see (2.1).

    struct inode {
      boolean terminal;   // true means terminal array of integers
                          // false means intermediate array of pointers
      int length;         // array size
      union {
        inode *n[];       // array of inode pointers
        int a[];          // array of block numbers
      } quad;
    };

Figure 2.5. An inode declaration

In the general case of nested arrays and trees, the monoid abstraction is generated by the union of node names in trees and integer vectors. Its binary operation $\cdot$ is defined as word concatenation with additional commutations between vectors of the same dimension. The result is called a free partially commutative monoid [RS97b]:

Definition 2.6 (free partially commutative monoid) A free partially commutative monoid $M$ with binary operation $\cdot$ is defined as follows:

- generators of $M$ are letters in an alphabet $A$ and all vectors from a finite union of free commutative monoids of the form $\mathbb{Z}^n$;
- operation $\cdot$ coincides with word concatenation on $A^{*}$, $\forall x, y \in A^{*}: x \cdot y = xy$;
- for a given integer $n$, operation $\cdot$ coincides with vector addition on $\mathbb{Z}^n$, $\forall x, y \in \mathbb{Z}^n: x \cdot y = x + y$.

This framework clearly supports recursively nested trees and arrays.

In the following, we abstract any data structure as a subset $L_{data}$ of the monoid $M_{data}$ with binary operation $\cdot$. ($\cdot$ denotes word concatenation for trees and usual sum for arrays.)

Eventually, we have required in the previous section that no run-time insertion or deletion appeared in the program. This rule is indeed too conservative, and two exceptions can be handled by our framework.

1. Because it makes no difference for the flow of data whether the insertion is done before the program or during execution (only the assignment of the value matters), insertions at a list's tail or tree's leaf are supported.
2. The abstraction is still correct when deletions at a list's tail or tree's leaf are supported, but may lead to overly conservative results. Indeed, suppose an insertion
follows a deletion at the tail of a list. Considering words in the free monoid abstraction of the list, the memory location of the tail node before deletion will be aliased with the new location of the inserted one.

2.3.4 Loop Nests and Arrays

The case of nested loops with scalar and array operations is very important. It applies to a wide range of numerical, signal-processing, scientific, and multi-media codes. A large amount of work has been devoted to such programs (or program fragments), and very powerful analysis and transformation techniques have been crafted. While the framework above easily captures such programs, it seems both easier and more natural to use another framework for memory addressing and instance naming. Indeed, we prefer the natural addressing scheme in arrays, using integers and integer vectors, because $\mathbb{Z}$-modules have a much richer structure than plain commutative monoids.

To ensure consistency of the control word and integer vector frameworks, we show how control words can be embedded into vectors. This embedding is based on the following definition, introduced by Parikh [Par66] to study properties of algebraic subsets of free commutative monoids:

Definition 2.7 A Parikh mapping over alphabet $\Sigma_{ctrl}$ is a function from words over $\Sigma_{ctrl}$ to integer vectors in $\mathbb{N}^{\mathrm{Card}(\Sigma_{ctrl})}$, such that each word $w$ is mapped to the vector of occurrence counts of every label in $w$.

There is no specific order in which labels are mapped to dimensions, but we are interested in a particular mapping where dimensions are ordered from the label of the outer loop to the label of the inner one.

The loop nest structure is non-recursive, hence the only cycles in the control automaton are transitions looping on the same state. As a result, the language of control words is in one-to-one mapping with its set of Parikh vectors. The following mapping is computed for the loop nest in Figure 2.6:

$$A\,A\,(a\,A)^{*}\,\bigl(B\,B\,(b\,B)^{*}\,s + C\,C\,(c\,C)^{*}\,r\bigr) \longrightarrow \mathbb{N}^{11}$$
$$w \longmapsto \bigl(|w|_A, |w|_A, |w|_a, |w|_B, |w|_B, |w|_b, |w|_C, |w|_C, |w|_c, |w|_s, |w|_r\bigr)$$

Respective Parikh vectors of instances AAaAaAaAaABBbBbBs and AAaAaACCcCcCcCr are $(1, 5, 4, 1, 3, 2, 0, 0, 0, 1, 0)$ and $(1, 3, 2, 0, 0, 0, 1, 4, 3, 0, 1)$.

    A/A/a   for (i=0; i<100; i++) {
    B/B/b     for (j=0; j<100; j++)
    s           A[i,j] = ...
    C/C/c     for (k=0; k<100; k++)
    r           ... = A[i,k]
            }

Figure 2.6. Computation of Parikh vectors

From Parikh vectors, we build iteration vectors by removing all labels of non-iteration statements and collapsing all loops at the same nesting level in the same dimension. Doing this, there is a one-to-one mapping between Parikh vectors and pairs built of iteration vectors and statement labels. Indeed, the statement label captures both the last non-zero component of the Parikh vector (i.e. the identity of the statement) and the identity of the surrounding loops (i.e. which dimension corresponds to which loop).

Continuing the example in Figure 2.6, the only remaining labels are a, b and c (i.e. labels of iteration statements) and labels b and c are collapsed together into the second dimension.

- Iteration vector of instance AAaAaAaAaABBbBbBs of statement s is $(4, 2)$.
- Iteration vector of instance AAaAaACCcCcCcCr of statement r is $(2, 3)$.

In this process, the lexicographic order $<_{lex}$ on control words is replaced by the lexicographic order on iteration vectors (the first dimensions having a higher priority than the last).

As a conclusion, Parikh mappings show that iteration vectors (the classical framework for naming instances in loop nests) are a special case of our general control word framework.

Because a statement instance cannot be reduced to an iteration vector, we introduce the following notations (these notations generalize the intuitive ones at the end of Section 2.1):

- $\langle S, x \rangle$ stands for the instance of statement $S$ whose iteration vector is $x$;
- $\langle S, x, ref \rangle$ stands for the access built from instance $\langle S, x \rangle$ and reference $ref$.

This does not imply that control words are a case of overkill when studying loop nests. In particular, they may still be useful when gotos and non-recursive function calls are considered. However, most interesting loop nest transformation techniques are rooted too deeply in the linear algebraic model to be rewritten in terms of control words. Further comparison is largely open, but some ideas and results are pointed out in Section 4.7.
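The construction of Parikh vectors and iteration vectors for Figure 2.6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each loop carries two upper-case labels that the source appears to distinguish only typographically, and the names `A0`/`A1`, `B0`/`B1`, `C0`/`C1`, as well as the helper functions, are hypothetical choices of this sketch.

```python
from collections import Counter

# Hypothetical label names: the two upper-case labels of each loop are written
# "A0"/"A1", "B0"/"B1", "C0"/"C1" in this sketch (an assumption, not the
# notation of the text).
LABELS = ["A0", "A1", "a", "B0", "B1", "b", "C0", "C1", "c", "s", "r"]

# Iteration labels and the dimension they collapse into (b and c share one).
ITERATION_DIMENSION = {"a": 0, "b": 1, "c": 1}


def control_word(outer_iterations, inner_loop, inner_iterations, statement):
    """Build a word of A0 A1 (a A1)* (B0 B1 (b B1)* s + C0 C1 (c C1)* r)."""
    upper, lower = inner_loop.upper(), inner_loop.lower()
    word = ["A0", "A1"] + ["a", "A1"] * outer_iterations
    word += [upper + "0", upper + "1"] + [lower, upper + "1"] * inner_iterations
    return word + [statement]


def parikh_vector(word):
    """Count the occurrences of every label, in the fixed order LABELS."""
    counts = Counter(word)
    return tuple(counts[label] for label in LABELS)


def iteration_vector(word):
    """Keep iteration labels only and collapse loops of equal nesting depth."""
    vector = [0, 0]
    for label in word:
        if label in ITERATION_DIMENSION:
            vector[ITERATION_DIMENSION[label]] += 1
    return tuple(vector)


word_s = control_word(4, "b", 2, "s")  # instance of statement s
word_r = control_word(2, "c", 3, "r")  # instance of statement r
print(iteration_vector(word_s), iteration_vector(word_r))  # (4, 2) (2, 3)
```

Comparing the resulting tuples with Python's built-in tuple ordering gives the lexicographic order on iteration vectors mentioned above.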
2.4 Instancewise Analysis

Because our execution model is based on control words instead of execution traces, the previous Definition 2.2 of a program execution is not very practical. For our purpose, a sequential execution $e \in E$ of a program is seen as a pair $(<_{seq}, f_e)$, where $<_{seq}$ is the sequential order over all possible statement instances (associated to the language of control words) and $f_e$ maps every access to the memory location it either reads or writes. Notice that $<_{seq}$ is not dependent on the execution: it is defined as the order between all possible statement instances for all executions, which is legal because sequential execution is deterministic. Order $<_{seq}$ is thus partial, but its restriction to a set of instances $I_e$ for a given execution $e \in E$ is a total order. However, $f_e$ clearly depends on the execution $e$, and its domain is exactly the set $A_e$ of accesses.

Function $f_e$ is the storage mapping for execution $e$ of the program [CFH95, Coh99b, CL99] (it is also called access function [CC98, Fea98]). Storage mapping gathers the effect of every statement instance, for a given execution of the program. It is a function from the exact set $A_e$ of accesses (see Definition 2.3) that actually execute into the set of memory locations.
In practice, the sequential execution order is explicitly defined by the program syntax, but it is not the case of the storage mapping. Some analysis has to be performed, either to compute $f_e(a)$ for all executions $e$ and accesses $a$, or to compute approximations of $f_e$.

Eventually, $(<_{seq}, f_e)$ has been defined as a view of a specific program execution $e$, but it can also be seen as a function mapping $e \in E$ to pairs $(<_{seq}, f_e)$. For the sake of simplicity, such a function (which defines all possible executions of a program) will be referred to as "program $(<_{seq}, f_e)$" in the following.

2.4.1 Conflicting Accesses and Dependences

Many analysis and transformation techniques require some information on "conflicts" between memory accesses.

Definition 2.8 (conflict) Two accesses $a$ and $a'$ are in conflict if they access (either read or write) the same memory location: $f_e(a) = f_e(a')$.

This vocabulary is inherited from the cache analysis framework and its conflict misses [TD95]. Analysis of conflicting accesses is also very similar to alias analysis [Deu94, CBC93]. The conflict relation is the relation between conflicting accesses, and is denoted by $\kappa_e$ for a given execution $e \in E$. An exact knowledge of $f_e$ and $\kappa_e$ is impossible in general, since $f_e$ may depend on the initial state of memory and/or input data. Thus, analysis of conflicting accesses consists in building a conservative approximation $\kappa$ of the conflict relation, compatible with any execution of the program: $v \,\kappa\, w$ must hold when there is an execution $e$ such that $v, w \in A_e$ and $f_e(v) = f_e(w)$, i.e.

$$\forall e \in E, \forall v, w \in A_e: \; f_e(v) = f_e(w) \Longrightarrow v \,\kappa\, w. \tag{2.5}$$

This condition is the only requirement on relation $\kappa$, but a precise approximation is generally hoped for. For most program analysis purposes, this relation only needs to be computed on writes, or between reads and writes, but other problems such as cache analysis [TD95] require a full computation.

Consider the example in Figure 2.7 where FirstIndex and SecondIndex are external functions on which no information is available. Because the sign of v is unknown at compile-time, the set of statement instances $I_e$ can be either statement $S$ or statement $T$ (statements coincide with statement instances since they are not surrounded by any loop or procedure call), depending on the execution. Since the results of FirstIndex and SecondIndex are unpredictable too, no exact storage mapping can be computed at compile-time. The only available compile-time information is that $S$ and $T$ may execute, and then they may also yield conflicting accesses, i.e.

$$\langle S, \texttt{A[FirstIndex()]} \rangle \;\kappa\; \langle T, \texttt{A[SecondIndex()]} \rangle.$$

However, another piece of information is that executions of $S$ and $T$ are mutually exclusive (due to the if then else construct syntax), and then $S$ and $T$ cannot be conflicting accesses:

$$\nexists e \in E: \; S \in A_e \wedge T \in A_e.$$

This example shows the need for computing approximative results about data-flow properties such as conflicting accesses, and it also shows how complex it is to achieve precise results.
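The difference between the two views of the Figure 2.7 example can be made concrete with a small enumeration. The following Python sketch is an illustration only: it assumes that $S$ is the then-branch and $T$ the else-branch, models an execution by a triple (sign of v, result of FirstIndex, result of SecondIndex), and uses hypothetical helper names.

```python
from itertools import product

CELLS = range(10)  # A has 10 elements in Figure 2.7


def executed_accesses(v, first_index, second_index):
    """Storage mapping of one execution: statement label -> accessed cell of A."""
    if v > 0:
        return {"S": first_index}   # assumed then-branch: writes A[FirstIndex()]
    return {"T": second_index}      # assumed else-branch: writes A[SecondIndex()]


def conflicts(storage_mappings):
    """Union, over all executions, of pairs of distinct accesses to one cell."""
    pairs = set()
    for mapping in storage_mappings:
        for a, b in product(mapping, repeat=2):
            if a != b and mapping[a] == mapping[b]:
                pairs.add((a, b))
    return pairs


# Sampled executions: (v, result of FirstIndex, result of SecondIndex).
executions = list(product([-1, 1], CELLS, CELLS))

# Control-aware view: S and T never belong to the same execution.
exact = conflicts(executed_accesses(*e) for e in executions)

# Syntax-blind view: pretend both statements execute in every run.
blind = conflicts({"S": f, "T": s} for _, f, s in executions)

print(exact, sorted(blind))  # set() [('S', 'T'), ('T', 'S')]
```

Both results are conservative in the sense of (2.5); the control-aware one is simply more precise because it exploits mutual exclusion.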
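        int v, A[10];
        scanf("%d", &v);
        if (v > 0)
    S     A[FirstIndex()] = ...
        else
    T     A[SecondIndex()] = ...

Figure 2.7. Execution-dependent storage mappings

For the purpose of parallelization, we need sufficient conditions to allow two accesses to execute in any order. Such conditions can be expressed in terms of dependences:

Definition 2.9 (dependence) An access $a$ depends on another access $a'$ if at least one is a write (i.e. $a \in W_e$ or $a' \in W_e$), if they are in conflict (i.e. $f_e(a) = f_e(a')$) and if $a'$ executes before $a$ (i.e. $a' <_{seq} a$).

The dependence relation for an execution $e$ is denoted by $\delta_e$: "$a$ depends on $a'$" is written $a' \,\delta_e\, a$.

$$\forall e \in E, \forall a, a' \in A_e: \; a' \,\delta_e\, a \;\overset{\text{def}}{\Longleftrightarrow}\; (a \in W_e \vee a' \in W_e) \wedge a' <_{seq} a \wedge f_e(a) = f_e(a'). \tag{2.6}$$

Once again, an exact knowledge of $\delta_e$ is impossible in general. Thus, dependence analysis consists in building a conservative approximation $\delta$, i.e.

$$\forall e \in E, \forall a, a' \in A_e: \; a' \,\delta_e\, a \Longrightarrow a' \,\delta\, a. \tag{2.7}$$

Eventually, Bernstein's conditions tell that two accesses can be executed in any order (e.g. in parallel) if they are not dependent.

2.4.2 Reaching Definition Analysis

Some techniques require more precision than is available through dependence analysis: given a read access in memory, they need to identify the statement instance that produced the value. Then the read access is called the use and the instance that produced the value is called the "definition" that "reaches" the use, or reaching definition. The reaching definition is indeed the last instance (according to the execution order) on which the use depends.

We thus define function $\sigma_e$, mapping every read access to its reaching definition:

$$\forall e \in E, \forall u \in R_e: \; \sigma_e(u) = \max_{<_{seq}} \{ v \in W_e : v \,\delta_e\, u \}; \tag{2.8}$$

or, replacing max with its definition:

$$\forall e \in E, \forall u \in R_e, v \in W_e: \; v = \sigma_e(u) \;\overset{\text{def}}{\Longleftrightarrow}\; v \,\delta_e\, u \wedge \bigl(\forall w \in W_e: \; u <_{seq} w \vee w <_{seq} v \vee \neg (w \,\delta_e\, u)\bigr);$$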
or, replacing $\delta_e$ with its definition (2.6):

$$\forall e \in E, \forall u \in R_e, v \in W_e: \; v = \sigma_e(u) \;\overset{\text{def}}{\Longleftrightarrow}\; v <_{seq} u \wedge f_e(u) = f_e(v) \wedge \bigl(\forall w \in W_e: \; u <_{seq} w \vee w <_{seq} v \vee f_e(v) \neq f_e(w)\bigr).$$

So definition $v$ reaches use $u$ if it executes before the use, if both refer to the same memory location, and if no intervening write $w$ kills the definition.

When a read instance $u$ has no reaching definition, either $u$ reads an uninitialized value (hinting at a programming error) or the analyzed program is only a part of a larger program. To cope with this problem, we add a virtual statement instance $\perp$ which executes before all instances in the program and assigns every memory location. Then, each read instance $u$ has a unique reaching definition, which may be $\perp$.

Because no exact knowledge of $\sigma_e$ can be hoped for in general, reaching definition analysis computes a conservative approximation $\sigma$. It is preferably seen as a relation, i.e.

$$\forall e \in E, \forall u \in R_e, v \in W_e: \; v = \sigma_e(u) \Longrightarrow v \,\sigma\, u. \tag{2.9}$$

One may also use $\sigma$ as a function from reads to sets of writes, and we talk about sets of possible reaching definitions. One must be very careful in the distinction between a set of effective instances $S \subseteq I_e$ and the set $S \cup \{\perp\}$: if $\perp \notin \sigma(u)$ then it says that $u$ reads a value produced by some instance in $S$, but if $\perp \in \sigma(u)$ then $u$ may read a value produced before executing the program. The fact that $\perp$ appears in a set of possible reaching definitions is the key to program checking techniques, since it may correspond to uninitialized values.

2.4.3 An Example of Instancewise Reaching Definition Analysis

This section is an overview of fuzzy array data-flow analysis (FADA), which was first presented in [CBF95]. The program model is restricted to loop nests with unrestricted conditionals, loop bounds and array subscripts. The aim of this short presentation is to allow comparison with our own analysis for recursive programs, and because the results of an instancewise reaching definition analysis for loop nests are extensively used in Chapter 5.

Intuitive Flavor

According to (2.8), the exact reaching definition of some read access $u$, $\sigma_e(u)$, is defined as the maximum of the set of writes in $\delta_e(u)$ (for a given program execution $e \in E$).

As soon as the program model includes conditionals, while loops, and do loops with non-linear bounds, we have to cope with a conservative approximation of the dependence relation. In the case of nested loops, one usually looks for an affine relation, and non-affine constraints in (2.6) are approximated using additional analyses on variables and array subscripts.

But then, and with the exception of very special cases, computing the maximum of an approximate set of dependences has no meaning: the very execution of instances in $\delta(u)$ is not guaranteed. One solution is to take the entire set $\delta(u)$ as an approximation of the reaching definition. Can we do better than that? Let us consider an example. Notice first that, for expository reasons, only scalars are considered. The method, however, applies to arrays with any subscript.

        for (i=0; i<N; i++) {
          if (...)
    S1      x = ...;
          else
    S2      x = ...;
        }
    R   ... = x;

Assuming that $N \geq 1$, what is the reaching definition of reference x in statement $R$? Since all instances of $S_1$ and $S_2$ are in dependence with $\langle R \rangle$, it seems like we cannot do better than approximating $\sigma(\langle R \rangle)$ with $\{\langle S_1, 1 \rangle, \ldots, \langle S_1, N \rangle, \langle S_2, 1 \rangle, \ldots, \langle S_2, N \rangle\}$.

Let us introduce a new boolean function $b_e(i)$ which represents the outcome of the test at iteration $i$, for a program execution $e \in E$. This allows us to compute the exact dependence relation $\delta_e$ at compile-time:

$$\forall e \in E, \forall v \in W_e: \; v \,\delta_e\, \langle R \rangle \Longleftrightarrow \exists i \in \{1, \ldots, N\}: \; (v = \langle S_1, i \rangle \wedge b_e(i)) \vee (v = \langle S_2, i \rangle \wedge \neg b_e(i)),$$

which can also be written

$$\forall e \in E: \; \delta_e(\langle R \rangle) = \{\langle S_1, i \rangle : 1 \leq i \leq N \wedge b_e(i)\} \cup \{\langle S_2, i \rangle : 1 \leq i \leq N \wedge \neg b_e(i)\}.$$

Since the above result is not approximate, the exact reaching definition $\sigma_e(\langle R \rangle)$ of $\langle R \rangle$ is the maximum of $\delta_e(\langle R \rangle)$.

Suppose $\sigma_e(\langle R \rangle)$ is an instance $\langle S_1, \beta_1^e \rangle$ for some execution $e \in E$. Because $b_e(i) \vee \neg b_e(i)$ is equal to true for all $i \in \{1, \ldots, N\}$, any value produced by an instance $\langle S_1, i \rangle$ or $\langle S_2, i \rangle$ with $i < N$ is overwritten either by $\langle S_1, N \rangle$ or by $\langle S_2, N \rangle$. This proves that $\beta_1^e$ must be equal to $N$. Conversely, supposing $\sigma_e(\langle R \rangle)$ is an instance $\langle S_2, \beta_2^e \rangle$, the same reasoning proves that $\beta_2^e$ must be equal to $N$. Then, we have the following result for function $\sigma_e$:

$$\forall e \in E: \; \sigma_e(\langle R \rangle) = \{\langle S_1, N \rangle : b_e(N)\} \cup \{\langle S_2, N \rangle : \neg b_e(N)\}. \tag{2.10}$$

We may now replace $b_e$ and $\neg b_e$ by their conservative approximations:

$$\sigma(\langle R \rangle) = \{\langle S_1, N \rangle, \langle S_2, N \rangle\}. \tag{2.11}$$

Notice here the high precision achieved.

To summarize these observations, our method will be to give new names to the result of maxima calculations in the presence of non-linear terms. These names are called parameters and are not arbitrary: as shown in the example, some properties on these parameters can be derived. More generally, one can find relations on non-linear constraints (like $b_e$) by a simple examination of the syntactic structure of the program or by more sophisticated techniques. These relations imply relations on the parameters, which are then used to increase the accuracy of the reaching definition. In some cases, these relations may be so precise as to reduce the "fuzzy" reaching definition to a singleton, thus giving an exact result. See [BCF97, Bar98] for a formal definition and handling of these parameters.

The general result computed by FADA is the following: the instancewise reaching definition relation is a quast, i.e. a nested conditional in which predicates are tests for the positiveness of quasi-affine forms (which include integer division), and leaves are either sets of instances whose iteration vector components are again quasi-affine, or $\perp$. See Section 3.1 for details about quasts.
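The reasoning on the scalar example can be checked by brute force for a small loop bound. The sketch below is not FADA itself: it simply enumerates every possible outcome of the conditional (the function $b_e$), computes the exact reaching definition of $\langle R \rangle$ as the last write, and collects the union over all executions. Function names and the choice $N = 4$ are assumptions of this illustration; iterations are numbered $1, \ldots, N$ as in the text.

```python
from itertools import product


def dependences(outcomes):
    """Writes on which <R> depends: S1 or S2 executes at each iteration 1..N."""
    return [("S1", i) if taken else ("S2", i)
            for i, taken in enumerate(outcomes, start=1)]


def reaching_definition(outcomes):
    """Exact reaching definition of <R>: the last write in sequential order."""
    writes = dependences(outcomes)
    # None plays the role of the virtual initial instance when no write executes.
    return writes[-1] if writes else None


def possible_reaching_definitions(n):
    """Union of the exact results over every outcome of the n tests."""
    return {reaching_definition(b) for b in product([False, True], repeat=n)}


N = 4
all_dependences = {w for b in product([False, True], repeat=N)
                   for w in dependences(b)}
print(len(all_dependences), sorted(possible_reaching_definitions(N)))
# 8 [('S1', 4), ('S2', 4)]
```

The dependence approximation contains all $2N$ writes, whereas the set of possible reaching definitions shrinks to the two instances of the last iteration, which is the content of (2.11).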
Improving Accuracy

To improve the accuracy of our analysis, properties on non-affine constraints involved in the description of the dependences can be integrated in the data-flow analysis. As shown in the previous example, these properties imply properties on the parameters introduced in our computation.

Several techniques have been proposed to find properties on the variables of the program or on non-affine functions (see [CH78, Mas93, MP94, TP95] for instance). They use very different formalisms and algorithms, from pattern-matching to abstract interpretation. However, the relations they find can be written as first order formulas of additive arithmetic (a.k.a. Presburger arithmetics, see Section 3.1) on the variables and non-affine functions of the program. This general type of property makes the data-flow analysis algorithm independent of the practical technique involved to find properties.

How the properties are taken into account in the analysis is detailed in [BCF97, Bar98]. The quality of the approximation is defined w.r.t. the ability of the analysis to integrate (fully or partially) these properties. In general, the analysis cannot find the smallest set of possible reaching definitions [Bar98]. This is due to decidability reasons; but for some kinds of properties, such as properties implied by the program structure, the best approximation can be found.

2.4.4 More About Approximations

Until then, every set of instances or accesses considered was exact and dependent on the execution. However, as hinted before, we will mostly consider approximative sets and relations in the following. For this reason, we need the following conservative approximations:

- $I$, the set of all possible statement instances for every possible execution of a given program, $\forall e \in E: \; \imath \in I_e \Longrightarrow \imath \in I$;
- $A$, the set of all possible accesses, $\forall e \in E: \; a \in A_e \Longrightarrow a \in A$;
- $R$, the set of all possible reads, $\forall e \in E: \; a \in R_e \Longrightarrow a \in R$;
- $W$, the set of all possible writes, $\forall e \in E: \; a \in W_e \Longrightarrow a \in W$.

They can be very conservative or be the result of a very precise analysis. In practice, the precision of these sets is not critical because they are rarely directly used in algorithms (but they are widely used in theoretical frameworks associated with these algorithms). Most of the time, they are implicitly present as domains or images of every relation over instances and accesses, which have their own dedicated analysis and approximation.

Sets $I$, $A$, $R$, $W$ and the relations defined above over instances and accesses (such as $\kappa$, $\delta$ and $\sigma$) are the key to program analysis and transformation techniques. In our framework, no other instancewise information is available at compile-time. In particular, when we present an optimality result for some algorithm, it means optimality according to this information: nobody can do a better job if their only information is the sets and relations above.
2.5 Parallelization

With the model defined in Section 2.4, parallelization of some program $(<_{seq}, f_e)$ means construction of a program $(<_{par}, f_e^{exp})$, where $<_{par}$ is a parallel execution order: a partial order and a sub-order of $<_{seq}$. Building a new storage mapping $f_e^{exp}$ from $f_e$ is called memory expansion.³ Obviously, $<_{par}$ and $f_e^{exp}$ must satisfy several properties in order to preserve the sequential program semantics.

Some additional properties that are not mandatory for the expansion correctness are guaranteed by most practical expansion techniques, for example the property that they effectively "expand" data structures. Intuitively, a storage mapping $f_e^{exp}$ is finer than $f_e$ when it uses at least as much memory as $f_e$. More precisely:

Definition 2.10 (finer) For a given execution $e$ of a program, a storage mapping $f_e^{exp}$ is finer than $f_e$ if

$$\forall v, w \in W: \; f_e^{exp}(v) = f_e^{exp}(w) \Longrightarrow f_e(v) = f_e(w).$$

2.5.1 Memory Expansion and Parallelism Extraction

Some basic expansion techniques to build a storage mapping $f_e^{exp}$ have been listed in Section 1.2; they are used implicitly or explicitly in most memory expansion algorithms, such as the ones presented in Chapter 5.

Now, the benefit of memory expansion is to remove spurious dependences due to memory reuse: "the more expansion, the less memory reuse". Then, removing dependences extracts more parallelism: "the less memory reuse, the more parallelism". Indeed, consider the exact dependence relation $\delta_e^{exp}$ for the same execution of the expanded program with sequential execution order $(<_{seq}, f_e^{exp})$:

$$\forall e \in E, \forall a, a' \in A_e: \; a' \,\delta_e^{exp}\, a \;\overset{\text{def}}{\Longleftrightarrow}\; (a \in W_e \vee a' \in W_e) \wedge a' <_{seq} a \wedge f_e^{exp}(a) = f_e^{exp}(a'). \tag{2.12}$$

Any parallel order $<_{par}$ (over instances) must be consistent with dependence relation $\delta_e^{exp}$ (over accesses):

$$\forall e \in E, \forall (\imath_1, r_1), (\imath_2, r_2) \in A_e: \; (\imath_1, r_1) \,\delta_e^{exp}\, (\imath_2, r_2) \Longrightarrow \imath_1 <_{par} \imath_2$$

($\imath_1$, $\imath_2$ are instances and $r_1$, $r_2$ are references in a statement).

Of course, we want a compile-time description and consider a conservative approximation $\delta^{exp}$ of $\delta_e^{exp}$. This approximation does not require any specific analysis in general: its computation is induced by the expansion strategy, see Section 5.4.8 for example.

Theorem 2.2 (correctness criterion of parallel execution orders) Given the following condition, the parallel order is correct for the expanded program (it preserves the original program semantics).

$$\forall (\imath_1, r_1), (\imath_2, r_2) \in A: \; (\imath_1, r_1) \,\delta^{exp}\, (\imath_2, r_2) \Longrightarrow \imath_1 <_{par} \imath_2. \tag{2.13}$$

An important remark is that $\delta_e^{exp}$ is actually equal to $\sigma_e$ when the program is converted to single-assignment form (but not SSA): every dependence due to memory reuse is removed. We may thus consider $\delta^{exp} = \sigma$ to parallelize such codes.

³ Because most of the time, $f_e^{exp}$ requires more memory than $f_e$.
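To illustrate Definition 2.10 and the criterion (2.13), consider a hypothetical loop `for i: S: x = ...; R: ... = x` in which a scalar `x` is reused at every iteration. The Python sketch below is an assumption-laden toy, not an algorithm from this work: instances stand for their single access, expansion replaces `x` by one cell per iteration, and the candidate parallel order only keeps $S$ before $R$ inside each iteration.

```python
N = 4
# Instances <S,i> (write x) and <R,i> (read x) of: for i: S: x = ...; R: ... = x
INSTANCES = [(s, i) for i in range(N) for s in ("S", "R")]
WRITES = [u for u in INSTANCES if u[0] == "S"]


def seq_before(u, v):
    """Sequential order: by iteration, S before R inside an iteration."""
    return (u[1], u[0] == "R") < (v[1], v[0] == "R")


def original_mapping(u):
    return "x"            # every access hits the same scalar


def expanded_mapping(u):
    return ("x", u[1])    # one cell per iteration (scalar expansion)


def dependence_pairs(mapping):
    """Pairs (u, v) such that v depends on u, following definition (2.6)."""
    return {(u, v) for u in INSTANCES for v in INSTANCES
            if (u in WRITES or v in WRITES)
            and seq_before(u, v) and mapping(u) == mapping(v)}


def is_finer(fine, coarse):
    """Definition 2.10 restricted to the writes of this example."""
    return all(coarse(v) == coarse(w)
               for v in WRITES for w in WRITES if fine(v) == fine(w))


def par_before(u, v):
    """Candidate parallel order: only S before R within one iteration."""
    return u[1] == v[1] and u[0] == "S" and v[0] == "R"


def is_correct(deps):
    """Criterion (2.13): every dependence is enforced by the parallel order."""
    return all(par_before(u, v) for u, v in deps)


print(is_finer(expanded_mapping, original_mapping),
      is_correct(dependence_pairs(original_mapping)),
      is_correct(dependence_pairs(expanded_mapping)))  # True False True
```

With the original mapping, output and anti dependences across iterations violate the candidate order; after expansion only the flow dependences inside each iteration remain, so all iterations may run in parallel.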
2.5.2 Computation of a Parallel Execution Order

In this section, we recall some classical results about loop nest parallelization; recursive programs will be addressed in Section 5.5. We have already presented (in Section 1.2) two main paradigms to generate parallel code. To compute the parallel execution order $<_{par}$, data parallelism (the second paradigm) will be assumed.

Extending parallelization techniques to irregular loop nests has already been studied by several authors: [Col95a, Col94b, GC95] to cite only the results nearest to our work. Instead of presenting a novel algorithm for parallelization, we show how most of the existing ones can be integrated in our framework.

Scheduling

Dependence or reaching definition analyses derive a graph where nodes are operations and edges are constraints on the execution order. The problem is now to traverse the graph in a partial order; this order is the execution order for the parallel program. The more partial the order, the higher the parallelism. In general, this partial order cannot be expressed as the list of relation pairs: one needs an expression of the partial order that does not grow with problem size, i.e. a closed form. Additional constraints on the expression of partial orders are: have a high expressive power; be easily found and manipulated; allow optimized code generation.
A suitable solution is to use a schedule [Fea92], i.e. a function $\theta$ from the set $I$ of all instances to the set $\mathbb{N}$ of positive integers. In a more general presentation of schedules, vectors of integers can be used: one may then talk about multidimensional "time" and schedules. This issue is studied by Feautrier in [Fea92]. The following definitions consider one-dimensional schedules only, but it makes no fundamental difference with multidimensional ones. From Theorem 2.2, we already know how the correct parallel execution orders are defined from the dependence relation in the expanded program. Rewriting this result for a schedule function, the correctness becomes

$$\forall (\imath_1, r_1), (\imath_2, r_2) \in A: \; (\imath_1, r_1) \,\delta^{exp}\, (\imath_2, r_2) \Longrightarrow \theta(\imath_1) < \theta(\imath_2), \tag{2.14}$$

where $\delta^{exp}$ is the dependence relation in the expanded program. (For multidimensional schedules, $<_{lex}$ is used to compare vectors.) If no expansion has been performed, $\delta^{exp}$ is the original dependence relation. If the program has been converted to single assignment form, it is the reaching definition relation. On the other hand, since $\theta$ is integer valued, the constraint above is equivalent to:

$$\forall (\imath_1, r_1), (\imath_2, r_2) \in A: \; (\imath_1, r_1) \,\delta^{exp}\, (\imath_2, r_2) \Longrightarrow \theta(\imath_1) + 1 \leq \theta(\imath_2). \tag{2.15}$$

This system of functional inequalities, called causality constraints, must be solved for the unknown function $\theta$. As is often true for systems of inequalities, it may have many different solutions. One can minimize various objective functions, e.g. the number of synchronization points or the latency.

Feautrier's Scheduling Algorithm

In the following, notation $\mathrm{Iter}(\imath)$ denotes the iteration vector of instance $\imath$. Considering (2.15), let us introduce $y$, the vector of all variables in the problem: $y$ is obtained by concatenating $\mathrm{Iter}(\imath_1)$, $\mathrm{Iter}(\imath_2)$, and the vector of symbolic constants in the problem
(recall $\mathrm{Iter}(\langle S, x \rangle) = x$). It so happens that, in the context of affine dependence relations, $((\imath_1, r_1) \,\delta^{exp}\, (\imath_2, r_2))$ is the disjunction of conjunctions of affine inequalities. In other words, the set $\{(u, v) : u \,\delta^{exp}\, v\}$ is a union of convex polyhedra. This result, built for general affine relations, is also true when the dependence relation is approximated in various ways such as dependence cones, direction vectors and dependence levels, see [PD96, Ban92, DV97].

Since the constraints in the antecedent of (2.15) are affine, let us denote them by $C_i(y) \geq 0$, $1 \leq i \leq M$. Similarly, let $\Delta(y) \geq 0$ be the consequent $\theta(v) - \theta(u) - 1 \geq 0$ in (2.15). Then, we can apply the following lemma:

Lemma 2.2 (Affine Form of Farkas' Lemma) An affine function $\Delta(y)$ from integer vectors to integers is non-negative on a polyhedron $\{y : C_i(y) \geq 0, 1 \leq i \leq M\}$ if there exist non-negative integers $\lambda_0, \ldots, \lambda_M$ (the Farkas multipliers) such that:

$$\Delta(y) = \lambda_0 + \sum_{i=1}^{M} \lambda_i C_i(y). \tag{2.16}$$

This relation is valid for all values of $y$. Hence, one can equate the constant term and the coefficient of each variable in each side of the identity, to get a set of linear equations where the unknowns are the coefficients of the schedules and the Farkas multipliers, $\lambda_i$. Since the latter are constrained to be positive, the system must be solved by linear programming [Fea88b, Pug92] (see also Section 3.1).

Unfortunately, some loop nests do not have "simple" affine schedules. The reason is that when a loop nest has an affine schedule, it has a large degree of parallelism. However, it is clear that some loop nests have little or even no parallelism, hence no affine schedule. The solution in this case is to use a multidimensional affine schedule, whose domain is $\mathbb{N}^d$, $d > 1$, ordered according to the lexicographic order. Such a schedule can have as low a degree of parallelism as necessary, and can even represent sequential programs. The selection of a multidimensional schedule can be automated by using algorithms from [Fea92]. It can be proved that any loop nest in an imperative program has a multidimensional schedule. Notice that multidimensional schedules are particularly useful in the case of dynamic control programs, since we have in that case to overestimate the dependences and hence to underestimate the degree of parallelism.

Code generation of parallel scheduled programs is simple in theory, but very complex in practice: issues such as polyhedron-scanning [AI91], communication handling, task placement, and low-level optimizations are critical for efficient code generation [PD96] (pages 79-103). Dealing with complex loop bounds and conditionals raises new code generation problems (not talking about allocation of expanded data structures), see [GC95, Col94a, Col95b].

Other Scheduling Techniques

Before the general solution to the scheduling problem proposed by Feautrier, most algorithms were based on classical loop transformation techniques that include loop fission, loop fusion, loop interchange, loop reversal, loop skewing, loop scaling, loop reindexing and statement reordering. Moreover, dependence abstractions were much less expressive than affine relations.

The first algorithm was designed by Allen and Kennedy [AK87], which inspired many other solutions [Ban92]. Several complexity and optimality results have also been discovered by Darte and Vivien [DV97]. Extending previous results, they designed a very
powerful algorithm, but its abstraction does not support the full expressive power of affine relations.

Moreover, many optimizations of Feautrier's algorithm have been designed, mainly because of the wide range of objective functions to optimize. For example, Lim and Lam propose in [LL97] a technique to reduce the number of synchronizations induced by a schedule, and they compare their technique with other recent improvements.

Speculative execution is a classical technique to improve scheduling of finite dependence graphs, but it is not for general affine relations. It has been explored by Collard and Feautrier as a way to extract more parallelism from programs with complex loop bounds and conditionals [Col95a, Col94b].

Eventually, all schedule functions computed by these techniques can be captured by affine functions of iteration vectors. The associated parallel execution order is thus an affine relation $<_{par}$, well suited to our formal framework:

$$\forall u, v \in W: \; u <_{par} v \Longleftrightarrow \theta(u) < \theta(v)$$

for one-dimensional schedules, and

$$\forall u, v \in W: \; u <_{par} v \Longleftrightarrow \theta(u) <_{lex} \theta(v)$$

for multidimensional ones.

Tiling

Despite the good theoretical results and recent achievements, scheduling techniques can lead to very bad performance, mainly because of communication overhead and cache problems. Indeed, fine grain parallelization is not suitable to most parallel architectures.⁴ Partitioning run-time instances is thus an important issue: the solution is to group elementary computations in order to take advantage of memory hierarchies and to overlap communications and computations.

The tiling technique groups elementary computations into a tile, each tile being executed on a processor in an atomic way. It is well suited to nested loops with regular computation patterns [IT88, CFH95, BDRR94]. An important goal of this research is to find the best tiling strategy with respect to measure criteria like the number of communications happening between the tiles. This strategy must be known at compile time to generate efficient code for a particular machine.

Most tiling techniques are limited to perfect loop nests, and dependences are often supposed uniform when evaluating the amount of communications. The most usual tile model has been defined by Irigoin and Triolet in [IT88]; it enforces the following constraints:
- tiles are bounded for local memory requirements;
- tiles are identical by translation to allow efficient code generation and automatic processing;
- tiles are atomic units of computation with synchronization steps at their beginning and at their end.

⁴ But it is suitable for instruction-level parallelism.
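Going back to the causality constraints (2.15) and to the tile model just described, the following Python sketch checks candidate one-dimensional schedules against a set of uniform dependences and builds a rectangular tiling function. Everything here is an illustrative assumption: the kernel `a[i][j] = a[i-1][j] + a[i][j-1]`, the domain size, the two candidate schedules and the 2 x 2 tiles are not taken from the text.

```python
from itertools import product

N = 6
DOMAIN = list(product(range(N), repeat=2))  # iteration vectors (i, j)

# Uniform dependences of a[i][j] = a[i-1][j] + a[i][j-1] (illustrative kernel):
# each pair is (source instance, target instance).
DEPENDENCES = [((i - di, j - dj), (i, j))
               for (i, j) in DOMAIN for (di, dj) in [(1, 0), (0, 1)]
               if i - di >= 0 and j - dj >= 0]


def satisfies_causality(schedule):
    """Constraint (2.15): theta(source) + 1 <= theta(target) for each dependence."""
    return all(schedule(u) + 1 <= schedule(v) for u, v in DEPENDENCES)


def wavefront(point):
    return point[0] + point[1]   # candidate affine schedule theta(i, j) = i + j


def outer_only(point):
    return point[0]              # candidate theta(i, j) = i


def tile_of(point, size=2):
    """Rectangular tiling function: tiles are translates of a size x size box."""
    return (point[0] // size, point[1] // size)


# Count the instances of each tile to check that tiles are bounded.
tile_sizes = {}
for p in DOMAIN:
    tile_sizes[tile_of(p)] = tile_sizes.get(tile_of(p), 0) + 1

print(satisfies_causality(wavefront), satisfies_causality(outer_only),
      max(tile_sizes.values()))  # True False 4
```

The wavefront schedule satisfies every causality constraint, while scheduling on the outer index alone leaves the dependence along `j` unordered; the tiling function yields bounded tiles that are identical by translation.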
Many different algorithms have been designed to find an efficient tile shape and then to partition the nest of loops. Scheduling of individual tiles is done using classical scheduling algorithms. However, inner-tile sequential execution is open for a larger scope of techniques, depending on the context. The simplest inner-tile execution order is the original sequential execution of elementary computations, but other execution orders (compatible with the program dependences) could be more suitable for the local memory hierarchy, or would enable more aggressive storage mapping optimization techniques (see Section 5.3 for details, but further study of this idea is left for future work). A more extensive presentation of tiling can be found in [BDRR94].

We make one hypothesis to handle parallel execution orders produced by tiling techniques in our framework: the inner-tile execution order must be affine. It is denoted by $<_{inn}$. Nevertheless, we are not aware of techniques that would not build affine inner-tile execution orders. The tile shape can be any bounded parallelepiped (or part of a parallelepiped on iteration space boundaries), but is often a rectangle in practice. Then, the result of a tiling technique is a pair $(T, \theta)$, where the tiling function $T$ maps statement instances to individual tiles and the schedule $\theta$ maps tiles to integers or vectors of integers.

Eventually, the result of a tiling technique can be captured by our parallel execution order framework, with an affine relation $<_{par}$:

$$\forall u, v \in W: \; u <_{par} v \Longleftrightarrow \theta(T(u)) < \theta(T(v)) \vee (T(u) = T(v) \wedge u <_{inn} v) \tag{2.17}$$

for a one-dimensional schedule of tiles, and

$$\forall u, v \in W: \; u <_{par} v \Longleftrightarrow \theta(T(u)) <_{lex} \theta(T(v)) \vee (T(u) = T(v) \wedge u <_{inn} v) \tag{2.18}$$

for a multidimensional schedule.

2.5.3 General Efficiency Remarks

When dealing with nests of loops, it is well known that complex loop transformations require complex polytope traversals, which slightly increases execution time. Moreover, even when no run-time restoration of the data flow is required, the right-hand sides of statements often grow huge because of nested conditional expressions. Then, the code generated by a straightforward application of parallelization algorithms is very inefficient. Moving conditionals and splitting loops is very useful, as well as polytope scanning techniques [AI91, FB98].

These remarks naturally extend to recursive programs and recursive data structures. The only difference is that most optimization techniques (such as constant propagation, forward substitution, invariant code motion, dead-code elimination [ASU86, Muc97]) are either limited to non-recursive programs or much less effective with complex recursive structures. In this work, indeed, most experimentations with recursive programs have required manual optimizations. This should encourage us to develop more aggressive techniques suitable for recursive programs.

Of course, shape and alias analyses discussed in Section 2.2.2 are very useful when pointer-based data structures are considered. A single pair of aliased pointers is likely to forbid any further precise analysis or aggressive program transformation, especially when using generic types (such as void*).

Induction variable detection [Wol92] and other related symbolic analysis techniques [HP96] are critical for program analysis and transformation. It is especially true for instancewise analyses: computing the value of an integer (or pointer) variable at each instance of a statement is the key information for dependence analysis. We will indeed present a new induction variable detection technique suitable for our recursive program model.

In the following, when no specific contribution has been proposed in this work, we will not address these necessary previous stages and optimizations:

- we will always consider that the required information about data structure shape, aliases or induction variables is available, when this information can be derived by classical techniques;
- we will generate unoptimized transformed programs, supposing that classical optimization techniques can do the job.

We make the hypothesis that our techniques, if implemented in a parallelizing compiler, are preceded and followed by the appropriate analyses and optimizations.
Chapter 3

Formal Tools

Most technical results on mathematical abstractions are gathered in this chapter. Section 3.1 is a general presentation of Presburger arithmetics and algorithms for systems of affine inequalities. Section 3.2 recalls classical results on formal languages and Section 3.3 addresses rational relations over monoids. Contributions to an interesting class of rational relations are found in Section 3.4. Section 3.5 addresses algebraic relations, and also presents some new results. The last two sections are mostly devoted to the applicability of formal language theory to our analysis and transformation framework: Section 3.6 discusses intersection of rational and algebraic relations, and approximation of relations is the purpose of Section 3.7.

The reader whose primary interest is in the analysis and transformation techniques may skip all proofs and technical lemmas, to concentrate on the main theorems. Because this chapter is more a "reference manual" for mathematical objects, it can also be read "on demand" when technical information is required in the following chapters.

3.1 Presburger Arithmetics

When dealing with iteration vectors, we need a mathematical abstraction to capture sets, relations and functions. This abstraction must also support classical algebraic operations. Presburger arithmetics is well suited to this purpose, since most interesting questions are decidable within this theory. It is defined by logical formulas built from $\lnot$, $\lor$ and $\land$, equality and inequality of integer affine constraints, and first order quantifiers $\exists$ and $\forall$. Testing the satisfiability of a Presburger formula is at the core of most symbolic computations involving affine constraints. For conjunctions of affine constraints, it is known as integer linear programming and is decidable, but NP-complete, see [Sch86] for details. Indeed, all known algorithms are super-exponential in the worst case, such as the Fourier-Motzkin algorithm implemented by Pugh in Omega [Pug92] and the simplex algorithm with Gomory cuts implemented by Feautrier in PIP [Fea88b, Fea91]. In practice, Fourier-Motzkin is very efficient on small problems, and the simplex algorithm is more efficient on medium problems, because its complexity is polynomial on average. Computing exact solutions to large integer linear programs is an open problem at present, and this is a problem for practical application of Presburger arithmetics to automatic parallelization.
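To make the Fourier-Motzkin idea concrete, here is a minimal sketch of variable elimination over the rationals. It is not the Omega implementation: the function names `eliminate` and `feasible_over_rationals` and the constraint encoding (pairs `(a, b)` meaning `sum(a[i] * x[i]) <= b`) are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import product


def eliminate(constraints, k):
    """Eliminate variable k from constraints of the form sum(a[i]*x[i]) <= b.

    Each constraint is a pair (a, b). The result is a list of constraints
    in which the coefficient of variable k is always zero.
    """
    lower, upper, rest = [], [], []
    for a, b in constraints:
        a = [Fraction(c) for c in a]
        b = Fraction(b)
        if a[k] > 0:
            upper.append((a, b))   # yields an upper bound on x[k]
        elif a[k] < 0:
            lower.append((a, b))   # yields a lower bound on x[k]
        else:
            rest.append((a, b))    # does not mention x[k]
    for (au, bu), (al, bl) in product(upper, lower):
        # Positive combination of both constraints that cancels x[k].
        cu, cl = -al[k], au[k]
        combined = [cu * x + cl * y for x, y in zip(au, al)]
        rest.append((combined, cu * bu + cl * bl))
    return rest


def feasible_over_rationals(constraints, nvars):
    """Decide whether the system has a rational solution."""
    for k in range(nvars):
        constraints = eliminate(constraints, k)
    # Every remaining constraint reads 0 <= b.
    return all(b >= 0 for _, b in constraints)


# 0 <= i <= j <= 5 has solutions; adding j <= -1 makes it empty.
triangle = [([-1, 0], 0), ([1, -1], 0), ([0, 1], 5)]
print(feasible_over_rationals(triangle, 2))
print(feasible_over_rationals(triangle + [([0, 1], -1)], 2))
```

The sketch only decides the rational relaxation. A system such as $2x \le 1 \land -2x \le -1$ has a rational solution but no integer one, which is why tools for integer constraints need additional machinery on top of plain elimination.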
3.1.1 Sets, Relations and Functions

We consider vectors of integers, and sets, functions, and relations thereof. Functions are seen as a special case of relations and relations are also interpreted as functions: a relation on sets $A$ and $B$ can equivalently be described by a function from $A$ to the set $\mathcal{P}(B)$ of subsets of $B$. Notice that the range and domain of a function or relation may not have the same dimension. Sets of integer vectors are ordered by the lexicographic order $<_{lex}$, and the "bottom element" $\bot$ denotes by definition an element which precedes all integer vectors. Strictly speaking, we consider sets, functions and relations described by Presburger formulas on integer vectors extended with $\bot$.

To describe mathematical objects in Presburger arithmetics, we use three types of variables: bound variables, unknowns and parameters. Bound variables are quantified by $\exists$ and $\forall$ in logical formulas, whereas unknown variables and parameters are free variables. Unknown variables appear in input, output or set tuples, whereas parameters are fully unbound and interpreted as symbolic constants. Handling parameters is trivial with Fourier-Motzkin, but required a specific extension of the simplex algorithm, called Parametric Integer Programming (PIP) by Feautrier [Fea88b].

Omega [Pug92] is widely used in our prototype implementations and semi-automatic experiments, and its syntax is very close to the usual mathematical one. Non-intuitive details will be explained when needed in the experimental sections. PIP uses another representation for affine relations called quasi-affine selection tree or quast, where quasi-affine forms are an extension of affine forms including integer division and modulo operations with integer constants.

Definition 3.1 (quast) A quasi-affine selection tree (quast) representing an affine relation¹ is a many-level conditional, in which

- predicates are tests for the positiveness of quasi-affine forms in the input variables and parameters,
- and leaves are sets of vectors described in Presburger arithmetics extended with $\bot$, which precedes any other vector for the lexicographic order.

¹ In fact, this is an extension of Feautrier's definition to capture unrestricted affine relations and not only affine functions, see [GC95].

It should be noticed that bound variables in affine relations appear as parameters in quasts, called wildcard variables. These wildcard variables are not free: they are constrained inside the quast itself. Moreover, quasi-affine forms (with modulo and division operations) in conditionals and leaves can be converted into "pure" affine forms thanks to additional wildcard variables, see [Fea91] for details.

Empty sets are allowed in leaves (they differ from the singleton $\{\bot\}$) to describe vectors that are not in the domain of a relation. Let us give a few examples.

The function corresponding to integer addition is written

\[ \{(i_1, i_2) \to (j) : i_1 + i_2 = j\} \]

and can be represented by the quast $\{i_1 + i_2\}$.
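One way to picture a quast is as a nested conditional tree that is walked for given values of the input variables and parameters. The sketch below is only an illustration under stated assumptions: the tuple encoding `("if", form, then, else)`, the `BOTTOM` marker and the function `evaluate` are hypothetical and unrelated to the actual data structures of PIP; the guarded variant is a made-up example in which a predicate $i < N$ is encoded as the positiveness test $N - i - 1 \ge 0$.

```python
BOTTOM = object()  # hypothetical marker standing for the bottom element


def evaluate(quast, env):
    """Walk a quast for the variable and parameter values given in env.

    Inner nodes are tuples ("if", form, then_branch, else_branch), where
    form(env) is an integer-valued (quasi-)affine form tested for
    positiveness. Leaves are either BOTTOM or a function mapping env to
    a set of integer vectors.
    """
    while isinstance(quast, tuple) and quast[0] == "if":
        _, form, then_branch, else_branch = quast
        quast = then_branch if form(env) >= 0 else else_branch
    if quast is BOTTOM:
        return BOTTOM
    return quast(env)


# Leaf {i1 + i2}: the quast of integer addition.
addition = lambda env: {(env["i1"] + env["i2"],)}

# Hypothetical guarded variant: addition defined only when i1 < N and i2 < N.
guarded = (
    "if", lambda env: env["N"] - env["i1"] - 1,
    ("if", lambda env: env["N"] - env["i2"] - 1, addition, BOTTOM),
    BOTTOM,
)

print(evaluate(addition, {"i1": 2, "i2": 3}))
print(evaluate(guarded, {"i1": 4, "i2": 3, "N": 4}) is BOTTOM)
```

Keeping predicates as forms tested against zero mirrors the definition above, where every test is a positiveness test on a quasi-affine form.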
The same function restricted to integers less than a symbolic constant $N$ is written

\[ \{(i_1, i_2) \to (j) : i_1 < N \land i_2 < N \land i_1 + i_2 = j\} \]

and as a quast

    if i1 < N
    then if i2 < N
         then {i1 + i2}
         else ⊥
    else ⊥

The relation between even numbers is written

\[ \{(i) \to (j) : (\exists \alpha, \beta : i = 2\alpha \land j = 2\beta)\} \]

(we keep the functional notation $\to$ for better understanding, and to be compliant with Omega's syntax) and has the quast representation ($\alpha$ is a wildcard variable)

    if i = 2α
    then {2β : β ∈ Z}
    else ⊥

Many other examples of quasts occur in Chapter 5.

A new interface to PIP has been written in Objective Caml, allowing easy and efficient handling of these quasts. Implementation was done by Boulet and Barthou, see [Bar98] for details. The quast representation is neither better nor worse than the classical logical one, but it is very useful to code generation algorithms and very close to the parametric integer programming algorithm.

To conclude this presentation of mathematical abstractions for affine relations, we suppose that Make-Quast is an algorithm to compute a quast representation for any affine relation. (The reverse problem is much easier and not useful to our framework.) Its extensive description is rather technical but we may sketch the principles of the algorithm. The Presburger formula defining the affine relation is first converted to a form with only existential quantifiers, by way of negation operators (a technique also used in the Skolem transformation of first order formulas); then every bound variable is replaced by a new wildcard variable; unknown variables are isolated from equalities and inequalities to build sets of integer vectors; and eventually the $\land$ and $\lor$ operators are rewritten in terms of conditional expressions. Subsequent simplifications, size reductions and canonical form computations are not discussed here, see [Fea88b, PD96, Bar98] for details.

For more details on Presburger arithmetics, integer programming, mathematical representations of affine relations, specific algorithms and applications to compiler technology, see [Sch86, PD96, Pug92, Fea88b].

3.1.2 Transitive Closure

Computing the transitive closure of a relation is a classical technique in computer science, but most algorithms target relations whose graph is finite. This hypothesis is obviously not acceptable in the case of affine relations. The problem is that the transitive closure of an affine relation may not be an affine relation; and knowing when it is an affine relation is not even decidable. Indeed, we can encode multiplication using transitive closure, and multiplication is not definable inside Presburger arithmetics:

\[ \{(x, y) \to (x + 1, y + z)\}^* = \{(x, y) \to (x', y + z(x' - x)) : x \le x'\}. \]

It should be noted that testing if a relation $R$ is closed by transitivity is very simple: it is equivalent to $R \circ R \setminus R$ being empty.

We are thus left with approximation techniques. Indeed, finding a lower bound is rather easy in theory: the transitive closure $R^*$ of a relation $R$ can be defined as

\[ R^* = \bigcup_{k \in \mathbb{N}} R^k, \]

and computing $\bigcup_{k=0}^{n} R^k$ for increasing values of $n$ yields increasingly accurate lower bounds. In some cases, $\bigcup_{k=0}^{n} R^k$ is constant for $n$ greater than some value $n_0$, and this constant gives the exact result for $R^*$. But in general, the size of the result grows very quickly without reaching the exact transitive closure. This method can still be used with "reasonable" values of $n$ to compute a lower bound.

Now, the previous iterative technique is unable to find the exact transitive closure of the relation $R = \{(i) \to (i + 1)\}$, and it is even unable to give any interesting approximation. The transitive closure of $R$ is nevertheless a very simple affine relation: $R^* = \{(i) \to (i') : i \le i'\}$. More clever techniques should thus be used to approximate transitive closures of affine relations. Kelly et al. designed such a method and implemented it in Omega [KPRS96]. It is based on approximating general affine relations in a sub-class where transitive closure can be computed exactly. They coined the term d-form (d for difference) to define this class. Their technique allows computation of both upper bounds (i.e. conservative approximations) and lower bounds, see [KPRS96] for details.

3.2 Monoids and Formal Languages

This section starts with a short review of basic concepts, then we recall the formal language properties of interest for our purpose. See the well known book by Hopcroft and Ullman [HU79], the first two chapters of the book by Berstel [Ber79], and the Handbook of Formal Languages (volume 1) [RS97a] for details.

3.2.1 Monoids and Morphisms

A semi-group consists of a set $M$ and an associative binary operation on $M$, usually denoted by multiplication. A semi-group which has a neutral element is a monoid. The neutral element of a monoid is unique, and is usually denoted by $1_M$ or $1$ for short. The monoid structure is widely used in this work, with several different binary operations.

Given two subsets $A$ and $B$ of a monoid $M$, the product of $A$ and $B$ is defined by

\[ AB = \{c \in M : (\exists a \in A, \exists b \in B : c = ab)\}. \]

This definition converts $\mathcal{P}(M)$ into a monoid with unit $\{1_M\}$. A subset $A$ of $M$ is a sub-semi-group (resp. sub-monoid) of $M$ if $A^2 \subseteq A$ (resp. $A^2 \subseteq A$ and $1_M \in A$). Given any subset $A$ of $M$, the set

\[ A^+ = \bigcup_{n \ge 1} A^n \]

is a sub-semi-group of $M$, and

\[ A^* = \bigcup_{n \ge 0} A^n \]

with $A^0 = \{1_M\}$ is a sub-monoid of $M$. In fact, $A^+$ (resp. $A^*$) is the least sub-semi-group (resp. sub-monoid) for the order of set inclusion containing $A$. It is called the sub-semi-group (resp. sub-monoid) generated by $A$. If $M = A^*$ for some $A \subseteq M$, then $A$ is a system of generators of $M$. A monoid is finitely generated if it has a finite set of generators.

For any set $A$, the free monoid $A^*$ generated by $A$ is defined by tuples $(a_1, \ldots, a_n)$ of elements of $A$, with $n \ge 0$, and with tuple concatenation as binary operation. When $A$ is finite and non-empty, it is called an alphabet, tuples are called words, elements of $A$ are called letters and the neutral element is called the empty word and denoted by $\varepsilon$. A formal language is a subset of a free monoid $A^*$, and the length $|u|$ of a word $u \in A^*$ is the number of letters composing $u$. By definition, the length of the empty word is 0. For a letter $a$ in an alphabet $A$, the number of occurrences of $a$ in $u$ is denoted by $|u|_a$. We will also use the classical notions of prefixes, suffixes, word reversal, sub-words and word factors. The product of two languages is also called concatenation.

We also recall the definition of a monoid morphism. If $M$ and $M'$ are monoids, a (monoid) morphism $\mu : M \to M'$ is a function satisfying

\[ \mu(1_M) = 1_{M'} \quad \text{and} \quad \forall m_1, m_2 \in M : \mu(m_1 m_2) = \mu(m_1)\mu(m_2). \]

If $A$ and $B$ are subsets of $M$ and $\mu : M \to M'$ is a morphism, then

\[ \mu(AB) = \mu(A)\mu(B), \quad \mu(A^+) = \mu(A)^+, \quad \text{and} \quad \mu(A^*) = \mu(A)^*. \]

3.2.2 Rational Languages

This section recalls basic definitions and results, to set notations and allow reference in later chapters.

Given an alphabet $A$, a (finite-state) automaton $\mathcal{A} = (A, Q, I, F, E)$ consists of a finite set $Q$ of states, a set $I \subseteq Q$ of initial states, a set $F \subseteq Q$ of final states, and a finite set of transitions (a.k.a. edges) $E \subseteq Q \times A^* \times Q$.

The free monoid $A^*$ is often omitted for convenience, when clear from the context: we write $\mathcal{A} = (Q, I, F, E)$. A transition $(q, x, q') \in E$ is usually written $q \xrightarrow{x} q'$; $q$ is the departing state, $q'$ is the arrival state, and $x$ is the label of the transition. A transition whose label is $\varepsilon$ is called an $\varepsilon$-transition.

A path is a word $(p_1, x_1, q_1) \cdots (p_n, x_n, q_n)$ in $E^*$ such that $q_i = p_{i+1}$ for all $i \in \{1, \ldots, n - 1\}$, and $x_1 \cdots x_n$ is called the label of the path. An accepting path goes from an initial state to a final one. An automaton is trim when all its states are accessible and may be part of an accepting path.

An automaton is deterministic when it has a single initial state, every transition label is a single letter or $\varepsilon$, at most one transition may share the same departing state and label, and a state with a departing $\varepsilon$-transition may not have departing labeled transitions.

The language $|\mathcal{A}|$ realized by a finite-state automaton $\mathcal{A}$ is defined by $u \in |\mathcal{A}|$ iff $u$ labels an accepting path of $\mathcal{A}$. A regular language is a language realized by some finite-state automaton.
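The definition of the language realized by an automaton can be sketched as a search for an accepting path. The code below is a minimal illustration, assuming transition labels are single letters or the empty string (standing for $\varepsilon$); the function names and the example automaton (words over {a, b} ending with "ab") are hypothetical and not taken from the thesis.

```python
def epsilon_closure(states, edges):
    """States reachable from `states` through epsilon-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, label, r in edges:
            if p == q and label == "" and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(word, initial, final, edges):
    """True iff `word` labels a path from an initial to a final state."""
    current = epsilon_closure(initial, edges)
    for letter in word:
        step = {r for p, label, r in edges if p in current and label == letter}
        current = epsilon_closure(step, edges)
    return bool(current & set(final))


# Hypothetical non-deterministic automaton: words over {a, b} ending with "ab".
EDGES = [(1, "a", 1), (1, "b", 1), (1, "a", 2), (2, "b", 3)]
print(accepts("aab", {1}, {3}, EDGES))
print(accepts("aba", {1}, {3}, EDGES))
```

Tracking the whole set of reachable states, rather than one path at a time, is the usual way to handle non-determinism without backtracking.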
Any regular language can be realized by a finite-state automaton without $\varepsilon$-transitions and where all transition labels are single letters. Any regular language can be realized by a deterministic finite-state automaton.

The family of rational languages over an alphabet $A$ is equal to the least family of languages over $A$ containing the empty set and singletons, and closed under union, concatenation and the star operation.

The following well known theorem is at the core of formal language theory.

Theorem 3.1 (Kleene) Let $A$ be an alphabet. The families of rational and regular languages over $A$ coincide.

Beyond the closure properties included in the definition, rational languages are closed under the plus operation, intersection, complementation, reversal, morphism and inverse morphism.

Proposition 3.1 The following problems are decidable for rational languages: membership in linear time, emptiness, finiteness, emptiness of the complement, finiteness of the complement, inclusion, equality.

3.2.3 Algebraic Languages

We recall a few basic facts about algebraic languages and push-down automata. See [HU79, Ber79] for an extensive introduction.

An algebraic grammar (a.k.a. context-free grammar) $G = (A, V, P)$ consists of an alphabet $A$ of terminal letters, an alphabet $V$ of variables (also known as non-terminals) distinct from $A$, and a finite set $P \subseteq V \times (V \cup A)^*$ of productions.

When clear from the context, the alphabet is omitted from the grammar definition, and we write $G = (V, P)$. A production $(\xi, \alpha) \in P$ is usually written in the form $\xi \to \alpha$, and if $\xi \to \alpha_1, \xi \to \alpha_2, \ldots, \xi \to \alpha_n$ are productions of $G$ having the same left-hand side $\xi$, they are grouped together using the notation $\xi \to \alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_n$.

Let $A$ be an alphabet and let $G = (V, P)$ be an algebraic grammar. We define the derivation relation as an extension of the production notation $\to$:

\[ f \to g \iff \exists \xi \in V, \exists u, \alpha, v \in (V \cup A)^* : \xi \to \alpha \in P \land f = u \xi v \land g = u \alpha v. \]

Then, for any $p \in \mathbb{N}$, $\xrightarrow{p}$ is the $p$th iteration of $\to$, and $\xrightarrow{+}$ and $\xrightarrow{*}$ are defined as usual.

In general, grammars are presented with a distinguished non-terminal $S$ called the axiom. This allows us to define the language $L_G$ generated by a grammar $G = (V, P)$ by

\[ L_G = \{u \in A^* : S \xrightarrow{*} u\}. \]

A language $L_G$ generated by some algebraic grammar $G$ is an algebraic language (a.k.a. context-free language).

Most expected closure properties hold for algebraic languages, but not intersection. Indeed, algebraic languages are closed under union, concatenation, star and plus operations, reversal, morphism, inverse morphism, and intersection with rational languages.

Although the most natural definition of algebraic languages comes from the grammar model, we prefer in this work another representation.

Given an alphabet $A$, a push-down automaton $\mathcal{A} = (A, \Gamma, \gamma_0, Q, I, F, E)$ consists of a stack alphabet $\Gamma$, a non-empty word $\gamma_0$ in $\Gamma^+$ called the initial stack word, a finite set $Q$ of states, a set $I \subseteq Q$ of initial states, a set $F \subseteq Q$ of final states, and a finite set of transitions (a.k.a. edges) $E \subseteq Q \times A^* \times \Gamma \times \Gamma^* \times Q$.

The free monoid $A^*$ is often omitted for convenience, when clear from the context. A transition $(q, x, g, \gamma, q') \in E$ is usually written $q \xrightarrow{x : g \to \gamma} q'$, the finite-state automata vocabulary is inherited, and $g$ is called the top stack symbol. An empty stack word is denoted by $\varepsilon$.

A configuration of a push-down automaton is a triple $(u, q, \gamma)$, where $u$ is the word to be read, $q$ is the current state and $\gamma \in \Gamma^*$ is the word composed of the symbols in the stack. The transition between two configurations $c_1 = (u_1, q_1, \gamma_1)$ and $c_2 = (u_2, q_2, \gamma_2)$ is denoted by the relation $\mapsto$ and defined by $c_1 \mapsto c_2$ iff there exist $(a, g, \gamma, \gamma') \in A^* \times \Gamma \times \Gamma^* \times \Gamma^*$ such that

\[ u_1 = a u_2 \land \gamma_1 = \gamma' g \land \gamma_2 = \gamma' \gamma \land (q_1, a, g, \gamma, q_2) \in E. \]

Then $\stackrel{p}{\mapsto}$ with $p \in \mathbb{N}$, $\stackrel{+}{\mapsto}$ and $\stackrel{*}{\mapsto}$ are defined as usual.

A push-down automaton $\mathcal{A} = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the language $L$ by final state, when $u \in L$ iff there exist $(q_i, q_f, \gamma) \in I \times F \times \Gamma^*$ such that

\[ (u, q_i, \gamma_0) \stackrel{*}{\mapsto} (\varepsilon, q_f, \gamma). \]

A push-down automaton $\mathcal{A} = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the language $L$ by empty stack, when $u \in L$ iff there exist $(q_i, q_f) \in I \times F$ such that

\[ (u, q_i, \gamma_0) \stackrel{*}{\mapsto} (\varepsilon, q_f, \varepsilon). \]

Notice that realization by empty stack implies realization by final state: $q_f$ is still required to be in the set of final states.

Theorem 3.2 The family of languages realized by final state or by empty stack by push-down automata is the family of algebraic languages.

Unlike finite-state automata, the deterministic property for push-down automata imposes some restrictions on the expressive power and brings an interesting closure property. A push-down automaton is deterministic when it has a single initial state, every transition label is a single letter or $\varepsilon$, at most one transition may share the same departing state, label and top stack symbol, and a state with a departing $\varepsilon$-transition may not have departing labeled transitions.

It is straightforward that any algebraic language can be realized by a push-down automaton whose transition labels are either $\varepsilon$ or a single letter. The family of languages realized by final state by deterministic push-down automata is called the family of deterministic algebraic languages. It should be noticed that this family is also known as LR(1) (which is equal to LR($k$) for $k \ge 1$) in the syntactical analysis framework [ASU86].

Proposition 3.2 The family of languages realized by empty stack by deterministic push-down automata is the family of deterministic algebraic languages with prefix property.

Recall that a language $L$ has the prefix property when a word $uv$ belonging to $L$ forbids $u$ to belong to $L$, for all words $u$ and non-empty words $v$. The interesting closure property is the following:

Proposition 3.3 The family of deterministic algebraic languages is closed under complementation.
However, closure of deterministic algebraic languages under union and intersection is not available. Decidability of deterministic algebraic languages among algebraic ones is unknown, despite the number of attempts and related works [RS97a].

Proposition 3.4 The following problems are decidable for algebraic languages: membership, emptiness, finiteness.

These additional problems are decidable for deterministic algebraic languages: membership in linear time, emptiness of the complement, finiteness of the complement.

The following problems are undecidable for algebraic languages: being a rational language, emptiness of the complement, finiteness of the complement, inclusion (open problem for deterministic algebraic languages), equality (idem).

We conclude this section with a simple algebraic language example whose properties are frequently observed in our analysis framework [Coh99a]. The Lukasiewicz language Ł over an alphabet $\{a, b\}$ is the language generated by the axiom $\Delta$ and the grammar with productions

\[ \Delta \to a \Delta \Delta \mid b. \]

The Lukasiewicz language is related to Dyck languages [Ber79] and is the simplest of a family of languages constructed in order to write arithmetic expressions without parentheses (prefix or "Polish" notation): the letter $a$ represents a binary operation and $b$ represents the operand. Indeed, the first words of Ł are

\[ b,\ abb,\ aabbb,\ ababb,\ aaabbbb,\ aababbb,\ \ldots \]

Proposition 3.5 Let $w \in \{a, b\}^*$. Then $w \in$ Ł iff $|w|_a - |w|_b = -1$ and $|u|_a - |u|_b \ge 0$ for any proper left factor $u$ of $w$ (i.e. $\exists v \in \{a, b\}^+ : w = uv$). Moreover, if $w, w' \in$ Ł, then

\[ |ww'|_a - |ww'|_b = |w|_a - |w|_b + |w'|_a - |w'|_b. \]

This implies that Ł has the prefix property, see [Ber79] for details. A graphical representation may help to understand intuitively the previous proposition and the properties of Ł: drawing the graph of the function $u \mapsto |u|_a - |u|_b$ as $u$ ranges over the left factors of $w = aabaabbabbabaaabbbb$ yields Figure 3.1.a.

Eventually, Figure 3.1.b shows a push-down automaton which realizes the Lukasiewicz language by empty stack. It has a single state, which is both initial and final, and a single stack symbol $I$. The initial stack word is also $I$; it is denoted as $\to I$ on the initial state. The push-down automaton in Figure 3.1.c realizes Ł by final state. Two states are necessary, as well as two stack symbols $Z$ and $I$, the initial stack word being $Z$.

Important remark. In the following, every push-down automaton will implicitly accept words by final state.

3.2.4 One-Counter Languages

An interesting sub-class of algebraic languages is called the class of one-counter languages. It is defined through push-down automata. A classical definition is the following: a push-down automaton is a one-counter automaton if its stack alphabet contains only one letter.
[Figure 3.1. Studying the Lukasiewicz language. Figure 3.1.a: evolution of occurrence count differences. Figure 3.1.b: push-down automaton accepting by empty stack. Figure 3.1.c: push-down automaton accepting by final state.]

An algebraic language is a one-counter language if it is realized by a one-counter automaton (by final state).

However, we prefer a definition which is more suitable to our practical usage of one-counter languages. This definition is a bit more technical.

Definition 3.2 (one-counter automaton and language) A push-down automaton is a one-counter automaton if its stack alphabet contains three letters, $Z$ (for "zero"), $I$ (for "increment") and $D$ (for "decrement"), and if the stack word belongs to the (rational) set $ZI^* + ZD^*$. An algebraic language is a one-counter language if it is realized by a one-counter automaton (by final state).

It is easy to show that Definition 3.2 describes the same family of languages as the preceding classical definition: the idea is to replace all stack symbols by $I$ and to "remember" the original symbol in the state name. Intuitively, if $n$ is a positive integer, stack word $ZI^n$ stands for counter value $n$, stack word $ZD^n$ stands for counter value $-n$, and stack word $Z$ stands for counter value 0.

The family of one-counter languages is strictly included in the family of algebraic languages, and appears as a natural abstraction in our program analysis framework. The Lukasiewicz language is a simple example of a one-counter language; Figure 3.2 shows a one-counter automaton realizing it. This example introduces specific notations to simplify the presentation of one-counter automata:

- $\to n$ stands for initialization of the stack word to $ZI^n$ if $n$ is positive, $ZD^{-n}$ if $n$ is negative, and $Z$ if $n$ is equal to zero;
- $+n$ for $n \ge 0$ stands for pushing $I^n$ onto the stack if the stack word is in $ZI^*$, and if the stack word is $ZD^k$ it stands for removing $\min(n, k)$ symbols then, if $n > k$, pushing back $I^{n-k}$ onto the stack;
- $+n$ for $n < 0$ stands for $-(-n)$;
- $-n$ for $n \ge 0$ stands for pushing $D^n$ onto the stack if the stack word is in $ZD^*$, and if the stack word is $ZI^k$ it stands for removing $\min(n, k)$ symbols then, if $n > k$, pushing back $D^{n-k}$ onto the stack;
- $-n$ for $n < 0$ stands for $+(-n)$;
- $= 0$ stands for testing if the top stack symbol is $Z$;
- $\neq 0$ stands for testing if the top stack symbol is not $Z$;
- $> 0$ stands for testing if the top stack symbol is $I$;
- $< 0$ stands for testing if the top stack symbol is $D$;
- $\ge 0$ stands for testing if the top stack symbol is $Z$ or $I$;
- $\le 0$ stands for testing if the top stack symbol is $Z$ or $D$.

These operations are the only available means to check and update the counter. Moreover, tests for 0 can be applied before additions or subtractions: "$< 0; -1$" stands for allowing the transition and decrementing the counter when the counter is negative, and "$\varepsilon; +1$" stands for incrementing the counter in all cases. See also the transition labeled by $b$ in Figure 3.2.

The general form for a one-counter automaton is thus $(A, c_0, Q, I, F, E)$, where $A$ is an alphabet (omitted when clear from the context), $c_0$ is the initial value of the counter, and $E \subseteq Q \times A^* \times \{\varepsilon, = 0, \neq 0, > 0, < 0, \ge 0, \le 0\} \times \mathbb{Z} \times Q$.

[Figure 3.2. One-counter automaton for the Lukasiewicz language.]

After this short presentation of one-counter languages, one would expect a generalization to multi-counter languages, realized by multi-counter automata, also called Minsky machines [Min67]. The general form of $n$-counter automata is $(A, c_0^1, \ldots, c_0^n, Q, I, F, E)$, where $c_0^k$ is the initial value of the $k$th counter and $E$ is defined on the product of all stacks. However, it has been shown that two-counter automata have the same expressive power as Turing machines, which is a stronger result than the well known equivalence of Turing machines and two-stack automata. Most interesting questions thus become undecidable for multi-counter languages. However, a few additional restrictions on this family of languages have recently been proven to enable several decidability results, such as for the emptiness problem. Studying the applicability of these new results to our program analysis framework is left for future work, but the most interesting applications would probably arise from work by Comon and Jurski [CJ98].

3.3 Rational Relations

We start with the definition and basic properties of recognizable and rational relations, then introduce the machines realizing rational transductions. After studying some examples, we review decision problems and closure properties. This section recalls classical results, see [Eil74, Ber79, AB88] for details.

3.3.1 Recognizable and Rational Relations

We recall the definition and a useful characterization of recognizable sets in finitely generated monoids.

Definition 3.3 (recognizable set) Let $M$ be a monoid. A subset $R$ of $M$ is a recognizable set if there exist a finite monoid $N$, a morphism $\mu$ from $M$ to $N$ and a subset $P$ of $N$ such that $R = \mu^{-1}(P)$.

Recognizable sets can be seen as a generalization of rational (a.k.a. regular) languages to non-free monoids which preserves the structure of boolean algebra:

Proposition 3.6 Let $M$ be a monoid; both $\emptyset$ and $M$ are recognizable sets in $M$. Recognizable sets are closed under union, intersection and complementation.

Although recognizable sets are closed under concatenation, they are not closed under the star operation. But it is the case of rational sets, which extend recognizable ones. Their definition is borrowed from rational languages:

Definition 3.4 (rational set) Let $M$ be a monoid. The family of rational sets in $M$ is the least family of subsets of $M$ holding $\emptyset$ and singletons $\{m\} \subseteq M$, closed under union, concatenation and the star operation.

However, rational sets are not closed under complementation and intersection, in general.

When there are two monoids $M_1$ and $M_2$ such that $M = M_1 \times M_2$, a recognizable subset of $M$ is called a recognizable relation. The following result describes the "structure" of recognizable relations.

Theorem 3.3 (Mezei) A recognizable relation $R$ in $M_1 \times M_2$ is a finite union of sets of the form $K \times L$ where $K$ (resp. $L$) is a recognizable set of $M_1$ (resp. $M_2$).

When there are two monoids $M_1$ and $M_2$ such that $M = M_1 \times M_2$, a rational subset of $M$ is called a rational relation. In the following, we will only consider recognizable or rational sets which are relations between finitely generated monoids.
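For free monoids, the "finite union of products" shape of recognizable relations gives a direct membership test. The sketch below is a hedged illustration: it assumes that each $K$ and $L$ is given as a Python regular expression without backreferences (so that it denotes a regular language), and the function name `in_recognizable_relation` and the sample relation are hypothetical.

```python
import re


def in_recognizable_relation(u, v, products):
    """Membership of (u, v) in a finite union of products K x L.

    `products` is a list of pairs (k, l) of regular expressions, each pair
    standing for the set of all (u, v) with u in K and v in L.
    """
    return any(
        re.fullmatch(k, u) is not None and re.fullmatch(l, v) is not None
        for k, l in products
    )


# Hypothetical relation over {a, b}: (a* x b+) union ((ab)* x {empty word}).
RELATION = [(r"a*", r"b+"), (r"(ab)*", r"")]
print(in_recognizable_relation("aa", "b", RELATION))
print(in_recognizable_relation("ab", "b", RELATION))
```

In each product the two components are tested independently, so no link between $u$ and $v$ can be expressed beyond the choice of the product. This is why the identity relation $\{(u, u) : u \in A^*\}$, which is rational, is not recognizable.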
The following characterization of rational relations is fundamental: it allows us to express rational relations by means of rational languages and monoid morphisms. (The formulation is slightly different from the original theorem by Nivat, see [Ber79] for details.)

Theorem 3.4 (Nivat) Let $M$ and $M'$ be two monoids. Then $R$ is a rational relation over $M$ and $M'$ iff there exist an alphabet $A$, two morphisms $\mu : A^* \to M$, $\mu' : A^* \to M'$, and a rational language $K \subseteq A^*$ such that

\[ R = \{(\mu(h), \mu'(h)) : h \in K\}. \]

3.3.2 Rational Transductions and Transducers

We recall here a "more functional" view of recognizable and rational relations. From a relation $R$ over $M_1$ and $M_2$, we define a transduction $\tau$ from $M_1$ into $M_2$ as a function from $M_1$ into the set $\mathcal{P}(M_2)$ of subsets of $M_2$, such that $v \in \tau(u)$ iff $u R v$. For convenience, $\tau$ may also be extended to a mapping from $\mathcal{P}(M_1)$ to $\mathcal{P}(M_2)$, and we write $\tau : M_1 \to M_2$.

A transduction $\tau : M_1 \to M_2$ is recognizable (resp. rational) iff its graph is a recognizable (resp. rational) relation over $M_1$ and $M_2$. Both recognizable and rational transductions are closed under inversion (i.e. relational symmetry).

In the next sections, we use either relations or transductions, depending on the context. The family we will study lies somewhere between recognizable and rational relations; it retains the boolean algebra structure and the closure under composition.

The following result (due to Elgot and Mezei [EM65, Ber79]) is restricted to free monoids.

Theorem 3.5 (Elgot and Mezei) If $A$, $B$ and $C$ are alphabets, and $\tau_1 : A^* \to B^*$ and $\tau_2 : B^* \to C^*$ are rational transductions, then $\tau_2 \circ \tau_1 : A^* \to C^*$ is a rational transduction.

Nivat's theorem can be rewritten for rational transductions:

Theorem 3.6 (Nivat) Let $M$ and $M'$ be two monoids. Then $\tau : M \to M'$ is a rational transduction iff there exist an alphabet $A$, two morphisms $\mu : A^* \to M$, $\mu' : A^* \to M'$, and a rational language $K \subseteq A^*$ such that

\[ \forall m \in M : \tau(m) = \mu'(\mu^{-1}(m) \cap K). \]

These two theorems are key results for dependence analysis and dependence testing, see Chapter 4.

The "mechanical" representations of rational relations and transductions are called rational transducers; they extend finite-state automata in a very natural way:

Definition 3.5 (rational transducer) A rational transducer $\mathcal{T} = (M_1, M_2, Q, I, F, E)$ consists of an input monoid $M_1$, an output monoid $M_2$, a finite set of states $Q$, a set of initial states $I \subseteq Q$, a set of final states $F \subseteq Q$, and a finite set of transitions (a.k.a. edges) $E \subseteq Q \times M_1 \times M_2 \times Q$.

Monoids $M_1$ and $M_2$ are often omitted for convenience, when clear from the context: we write $\mathcal{T} = (Q, I, F, E)$. Since we only consider finitely generated monoids, the transitions of a transducer can equivalently be chosen in $Q' \times (G_1 \cup \{1_{M_1}\}) \times (G_2 \cup \{1_{M_2}\}) \times Q'$, where $G_1$ (resp. $G_2$) is a set of generators for $M_1$ (resp. $M_2$) and $Q'$ is some set of states larger than $Q$.
Most of the time, we will be dealing with free monoids (i.e. languages); the empty word is then the neutral element and is denoted by $\varepsilon$.

A path is a word $(p_1, x_1, y_1, q_1) \cdots (p_n, x_n, y_n, q_n)$ in $E^*$ such that $q_i = p_{i+1}$ for all $i \in \{1, \ldots, n - 1\}$, and $(x_1 \cdots x_n, y_1 \cdots y_n)$ is called the label of the path. A transducer is trim when all its states are accessible and may be part of an accepting path.

The transduction $|\mathcal{T}|$ realized by a rational transducer $\mathcal{T}$ is defined by $g \in |\mathcal{T}|(f)$ iff $(f, g)$ labels an accepting path of $\mathcal{T}$. It is a consequence of Kleene's theorem that a subset of $M_1 \times M_2$ is a rational relation iff it is recognized by a rational transducer:

Proposition 3.7 A transduction is rational iff it is realized by a rational transducer.

Let us now present decidability and undecidability results for rational relations.

Theorem 3.7 The following problems are decidable for rational relations: whether two words are in relation (in linear time), emptiness, finiteness.

However, most other usual questions are undecidable for rational relations.

Theorem 3.8 Let $R$, $R'$ be rational relations over alphabets $A$ and $B$ with at least two letters. It is undecidable whether $R \cap R' = \emptyset$, $R \subseteq R'$, $R = R'$, $R = A^* \times B^*$, $(A^* \times B^*) - R$ is finite, $R$ is recognizable.

A few questions may become decidable when replacing $A^*$ and $B^*$ by some particular finitely generated monoids, but it is not the case in general.

The following definition will be useful in some technical discussions and proofs in the following. It formalizes the fact that a rational transducer can be interpreted as a finite-state automaton on a more complex alphabet. But beware: both interpretations have different properties in general.

Definition 3.6 Let $\mathcal{T}$ be a rational transducer over alphabets $A$ and $B$. The finite-state automaton interpretation of $\mathcal{T}$ is a finite-state automaton $\mathcal{A}$ over the alphabet $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ defined by the same states, initial states, final states and transitions.

3.3.3 Rational Functions and Sequential Transducers

We need a few results about rational transductions that are partial functions.

Definition 3.7 (rational function) Let $M_1$ and $M_2$ be two monoids. A rational function $\tau : M_1 \to M_2$ is a rational transduction which is a partial function, i.e. such that $\mathrm{Card}(\tau(u)) \le 1$ for all $u \in M_1$.

Most classical results about rational functions suppose that $M_1$ and $M_2$ are free monoids, but we will see a result about composition of rational functions over non-free monoids in Section 3.5. In the following, however, $M_1$ and $M_2$ will be free monoids.

Given two alphabets $A$ and $B$, it is decidable whether a rational transduction from $A^*$ into $B^*$ is a partial function. However, the first algorithm by Schützenberger was exponential [Ber79]. The following result by Blattner and Head [BH77] shows that it is decidable in polynomial time.

Theorem 3.9 It is decidable in $O(\mathrm{Card}(Q)^4)$ whether a rational transducer whose set of states is $Q$ implements a rational function.
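The transduction realized by a transducer over free monoids can be sketched as an exhaustive search over paths. The code below is an illustration only: the function name `image` and both example transducers are hypothetical, labels are Python strings (the empty string stands for $\varepsilon$), and termination assumes that no cycle with empty input produces output. It computes the image of a single word, so it can show that a given input has several images, but it does not implement the decision procedure of Theorem 3.9.

```python
def image(word, initial, final, edges):
    """Set of outputs g such that (word, g) labels an accepting path.

    `edges` holds transitions (p, x, y, q) with input label x and output
    label y. Assumption: no cycle with empty input produces output,
    otherwise the search would not terminate.
    """
    results, seen = set(), set()
    stack = [(q, 0, "") for q in initial]
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, out = config
        if pos == len(word) and state in final:
            results.add(out)
        for p, x, y, q in edges:
            # startswith also accepts the empty input label at any position.
            if p == state and word.startswith(x, pos):
                stack.append((q, pos + len(x), out + y))
    return results


# Hypothetical transducers over {a, b}.
SWAP = [(1, "a", "b", 1), (1, "b", "a", 1)]   # exchanges a and b
ERASE = [(1, "a", "a", 1), (1, "a", "", 1)]   # may erase any a

print(image("ab", {1}, {1}, SWAP))            # a single image
print(len(image("aa", {1}, {1}, ERASE)) > 1)  # several images: not a function
```

On the second transducer the word "aa" has three images, so the transduction it realizes is rational but is not a rational function in the sense of Definition 3.7.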
100Rationalfunctionshavetwoadditionaldecidableproperties: CHAPTER3.FORMALTOOLS Theorem3.10Giventworationalfunctionsfandf0fromAtoB,itisdecidable whetherff0andwhetherf=f0. \onlinecomputation"isthefollowing:itrequiresthatwhenapatheleadingtoastate qislabeledbypairofwords(u;v),andwhenaletterxisread,thereisonlyonestate ducerswhoseoutputcanbe\computedonline"withitsinput.ourinterpretationfor Amongtransducersrealizingrationalfunctions,weareespeciallyinterestedintrans- Denition3.8(inputandoutputautomata)Theinputautomaton(resp.outputautomaton)ofatransducerisobtainedbyomittingtheoutputlabel(resp.inputlabel) understoodusingthefollowingdenitions. q0andoneoutputletterysuchthat(ux;vy)labelsapathprexedbye.thisisbest Denition3.9(sequentialtransducer)LetAandBbetwoalphabets.Asequential ofeachtransition. thatithasasingleinitialstate). transducerislabeledinabanditsinputautomatonisdeterministic(whichenforces Figure3.3.a,whoseinitialstateis1issequential.Itreplacesbyathebswhichappear afteranoddnumberofbs. quentialifitcanberealizedbyasequentialtransducer.thetransducerexamplein Asequentialtransducerobviouslyrealizesarationalfunction;andafunctionisse-... aja 1 bja bjb 2 bjb aja 1abjb 2b Figure3.3.a.Sequentialtransducer aja bjb...figure3.3.sequentialandsub-sequentialtransducers... 
Figure3.3.b.Sub-sequentialtransducer T=(A;B;Q;I;F;E),onemayassociatea\nextstate"function:QA!Qanda closed,i.e.ifuvbelongstoitsdomainthenitisthesameforu.2toasequentialtransducer whenallthestatesofasequentialtransducerarenal,thefunctionitrealizesisprex Notethataif isasequentialfunctionand (")isdened,then(")=".moreover, T.However,thesequentialtransducerdenitionisabittoorestrictiveregardingour thesetfofnalstates,functionsandareindeedanequivalentcharacterizationof \nextoutput"function:qa!bwhosepurposeisself-explanatory.togetherwith Denition3.10(sub-sequentialtransducer)IfAandBaretwoalphabets,asubsequentialtransducer(T;)overABisapaircomposedofasequentialtransducer \onlinecomputation"property,andwepreferthefollowingextension. 2In[Ber79,Eil74],allstatesofasequentialtransducerarenal.
$T$ over $A^* \times B^*$ with $F$ as set of final states, and of a function $\rho: F \to B^*$. The function $\sigma$ realized by $(T, \rho)$ is defined as follows: let $u$ be a word in $A^*$; the value $\sigma(u)$ is defined iff there is an accepting path in $T$ labeled by $(u|v)$ and leading to a final state $q$; in this case $\sigma(u) = v\rho(q)$.

In other words, the function $\rho$ is used to append a word to the output at the end of the computation. A sub-sequential transducer obviously realizes a rational function; and a function is sub-sequential if it can be realized by a sub-sequential transducer. A sequential function is sub-sequential: consider $\rho(q) = \varepsilon$ for all final states $q$.

This definition matches our "online computation" property. The function realized by the sub-sequential transducer in Figure 3.3.b appends to each word its last letter. This function is not sequential because all its states are final and it is not prefix closed.

The following result has been proven by Choffrut in [Cho77].

Theorem 3.11 It is decidable if a function realized by a transducer is sub-sequential, and it is decidable if a sub-sequential function is sequential.

Béal and Carton [BC99b] give two polynomial-time algorithms to decide if a rational function is sub-sequential, and if a sub-sequential function is sequential. Two algorithms to build a sub-sequential realization and a sequential realization are also provided, but the first may generate an exponential number of states; as a result, this does not provide a polynomial-time algorithm to decide if a rational function is sequential.

Before we conclude this section, notice that the "online computation" property satisfied by sub-sequential transducers is still satisfied for a larger class of rational functions:

Definition 3.11 (online rational transducer) A rational transducer is online if it realizes a rational function and if its input automaton is deterministic. A rational transduction is online if it is realized by an online rational transducer.

The only difference with respect to sub-sequential transducers is that $\varepsilon$ is allowed in the input automaton, as long as the deterministic property is kept. We are not aware of any result for this class of rational functions, strictly larger than the class of sub-sequential transductions. But if it were decidable among rational functions, it would probably replace every use of sub-sequential functions in the following applications.

In our analysis and transformation framework, we will only use rational and sub-sequential functions, which are decidable in polynomial time among rational transductions.

3.4 Left-Synchronous Relations

We have seen that rational relations are not closed under intersection, but intersection is critical for dependence analysis. Addressing the undecidable problem of testing whether the intersection of two rational relations is empty or not, Feautrier designed a "semi-algorithm" for dependence testing which sometimes does not terminate [Fea98]. Because we would like to effectively compute the intersection, and not only test its emptiness, our approach is different: we are looking for a sub-class of rational relations with a boolean algebra structure (i.e. with union, intersection and complementation).

Indeed, the class of recognizable relations is a boolean algebra, but we have found a more expressive one: the class of left-synchronous relations. We will show that left-synchronous relations are not decidable among rational ones, but we could define a precise
algorithm to conservatively approximate relations into left-synchronous ones. In fact, this point is even more interesting for us than decidability. Many results presented here have already been published by Frougny and Sakarovitch in [FS93]. However, our work has been done independently and is based on a different (more intuitive and versatile) representation of transductions. Proofs are all new, and several unpublished results have also been discovered.

Notice that a larger class with a boolean algebra structure is the class of deterministic relations [PS98] defined by Pelletier and Sakarovitch. But some interesting decidability properties are lost and we could not define any precise approximation algorithm for this class, see Section 3.4.7.

This work has been done in collaboration with Olivier Carton (University of Marne-la-Vallée).

3.4.1 Definitions

We recall the definition of synchronous transducers:³

Definition 3.12 (synchronism) A rational transducer on alphabets $A$ and $B$ is synchronous if it is labeled on $A \times B$. A rational relation or transduction is synchronous if it can be realized by a synchronous transducer. A rational transducer is synchronizable if it realizes a synchronous relation.

³ It appears to be a special case of $(k,l)$-synchronous transducers, where $k = l = 1$, see Section 3.4.7.

Obviously, such a transducer is length preserving; Eilenberg and Schützenberger [Eil74] showed that the converse is true: a length preserving rational transduction is realized by a synchronous transducer.

A first extension of the synchronous property is the $\delta$-synchronous one:

Definition 3.13 ($\delta$-synchronism) A rational transducer on alphabets $A$ and $B$ is $\delta$-synchronous if every transition appearing in a cycle of the transducer's graph is labeled on $A \times B$. A rational relation or transduction is $\delta$-synchronous if it can be realized by a $\delta$-synchronous transducer. A rational transducer is $\delta$-synchronizable if it realizes a $\delta$-synchronous relation.

Such a transducer has a bounded length difference; Frougny and Sakarovitch [FS93] showed that the converse is true: a bounded length difference rational transduction is realized by a $\delta$-synchronous transducer. Obviously, the bound is 0 when the transducer is synchronous. Two examples are shown in Figure 3.4. They respectively realize $\{(u,v) \in \{a,b\}^* \times \{a,b\}^* : u = v\}$ and $\{(u,v) \in \{a,b\}^* \times \{c\}^* : |u|_a = |v|_c \wedge |u|_b = 2\}$.

Then, we define two new extensions:

Definition 3.14 (left-synchronism) A rational transducer over alphabets $A$ and $B$ is left-synchronous if it is labeled on $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ and only transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) may follow transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$). A rational relation or transduction is left-synchronous if it is realized by a left-synchronous transducer. A rational transducer is left-synchronizable if it realizes a left-synchronous relation.
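The syntactic condition of Definition 3.14 can be checked mechanically on a transition list. The following is a minimal sketch, not the thesis' implementation: the tuple encoding `(source, x, y, target)`, the use of the empty string for $\varepsilon$, and the names `label_kind` and `is_left_synchronous` are assumptions made for illustration. The check looks at every pair of consecutive transitions, including those on useless states, so it may be stricter than necessary on a transducer that is not trim.

```python
EPS = ""  # assumed encoding of the empty word


def label_kind(x, y):
    """Classify a label x|y: 'sync' (A x B), 'in' (A x {eps}), 'out' ({eps} x B)."""
    if len(x) == 1 and len(y) == 1:
        return "sync"
    if len(x) == 1 and y == EPS:
        return "in"
    if x == EPS and len(y) == 1:
        return "out"
    return None  # eps|eps and multi-letter labels are not allowed


def is_left_synchronous(transitions):
    """Check Definition 3.14 on (source, x, y, target) tuples."""
    transitions = list(transitions)
    incoming = {}  # state -> kinds of the transitions entering it
    for _, x, y, dst in transitions:
        kind = label_kind(x, y)
        if kind is None:
            return False
        incoming.setdefault(dst, set()).add(kind)
    for src, x, y, _ in transitions:
        kind = label_kind(x, y)
        for previous in incoming.get(src, ()):
            # After an 'in' (resp. 'out') transition, only 'in' (resp. 'out') may follow.
            if previous != "sync" and kind != previous:
                return False
    return True


# Hypothetical example: a transducer for "u is a strict prefix of v" over {a, b}.
prefix_order = [(1, "a", "a", 1), (1, "b", "b", 1),
                (1, EPS, "a", 2), (1, EPS, "b", 2),
                (2, EPS, "a", 2), (2, EPS, "b", 2)]
# Hypothetical example: a synchronous loop after a b|eps transition.
not_left_sync = [(1, "a", "c", 1), (1, "b", EPS, 2), (2, "a", "c", 2)]
```

Here `is_left_synchronous(prefix_order)` holds, while `not_left_sync` is rejected because a transition labeled on $A \times B$ follows one labeled on $A \times \{\varepsilon\}$.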
[Figure 3.4. Synchronous and $\delta$-synchronous transducers. Figure 3.4.a: a synchronous transducer (a single state 1 with transitions labeled $a|a$ and $b|b$). Figure 3.4.b: a $\delta$-synchronous transducer (states 1, 2 and 3, with transitions labeled $a|c$ and $b|\varepsilon$).]

Definition 3.15 (right-synchronism) A rational transducer over alphabets $A$ and $B$ is right-synchronous if it is labeled on $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ and only transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) may precede transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$). A rational relation or transduction is right-synchronous if it can be realized by a right-synchronous transducer. A rational transducer is right-synchronizable if it realizes a right-synchronous relation.

Figure 3.5 shows left-synchronous transducers over an alphabet $A$ realizing two orders (a.k.a. orderings), where $<_{txt}$ is some order on $A$: the prefix order, $f <_{pre} g \iff \exists h \in A^* : g = fh$, and the lexicographic order, $f <_{lex} g \iff f <_{pre} g \vee (\exists u, v, w \in A^*, a, b \in A : f = uav \wedge g = ubw \wedge a <_{txt} b)$.

In the following transducers, labels $x$ and $y$ stand for $\forall x \in A$ and $\forall y \in A$ respectively.

[Figure 3.5. Left-synchronous realization of several order relations. Figure 3.5.a: prefix order (states 1 and 2, transitions labeled $x|x$ and $\varepsilon|y$). Figure 3.5.b: lexicographic order (states 1 to 5, transitions labeled $x|x$, $x|y$ with $x <_{txt} y$, $x|y$, $x|\varepsilon$ and $\varepsilon|y$).]

The word-reversal operation converts a left-synchronous transducer into a right-synchronous one and conversely.⁴ The two definitions are not contradictory: some relations are left and right synchronous, such as synchronous ones.

⁴ Recognizable, synchronous and $\delta$-synchronous relations are closed under word-reversal.
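To make the lexicographic order example concrete, here is a small sketch of how a left-synchronous transducer can be run on a pair of words. It relies on the observation that, in a left-synchronous transducer, a pair $(u,v)$ can only be read as synchronous letter pairs followed by $x|\varepsilon$ labels only or $\varepsilon|y$ labels only. The five-state transducer built below is an assumption in the spirit of Figure 3.5.b, not a transcription of it: it realizes the strict lexicographic order, with Python's character order standing for $<_{txt}$, and all function names are hypothetical.

```python
from itertools import product, zip_longest

EPS = ""  # assumed encoding of the empty word


def padded_labels(u, v):
    """The only label sequence a left-synchronous transducer can read on (u, v)."""
    return list(zip_longest(u, v, fillvalue=EPS))


def accepts(transitions, initial, final, u, v):
    """Run the finite-state automaton interpretation on the padded pair."""
    current = set(initial)
    for label in padded_labels(u, v):
        current = {dst for (src, x, y, dst) in transitions
                   if src in current and (x, y) == label}
    return bool(current & set(final))


def lexicographic_transducer(alphabet):
    """Strict lexicographic order; state 1 is initial, states 2 to 5 are final."""
    t = []
    for x in alphabet:
        t.append((1, x, x, 1))        # common prefix
        t.append((1, EPS, x, 5))      # u is a strict prefix of v
        t.append((5, EPS, x, 5))
        t.append((2, x, EPS, 3))      # after the first difference: u is longer
        t.append((3, x, EPS, 3))
        t.append((2, EPS, x, 4))      # after the first difference: v is longer
        t.append((4, EPS, x, 4))
        for y in alphabet:
            if x < y:
                t.append((1, x, y, 2))  # first difference, with x below y
            t.append((2, x, y, 2))      # anything may follow the first difference
    return t, {1}, {2, 3, 4, 5}


def words(alphabet, max_len):
    """All words over the alphabet up to a given length."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)
```

On the alphabet `"ab"`, the transducer agrees with Python's built-in string comparison `u < v` on all pairs of short words. Making state 1 final as well would give the reflexive variant of the order.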
Figure 3.6 shows a transducer realizing the relation $R = \{(u,v) \in A^* \times B^* : |u| \equiv |v| \bmod 2\}$. It is neither left-synchronous nor right-synchronous, but the left-synchronous and right-synchronous realizations in the same figure show that $R$ is left and right synchronous.

In the three following transducers, labels $x$ and $y$ stand for $\forall x \in A$ and $\forall y \in B$.

[Figure 3.6. A left and right synchronizable example: three transducers, captioned "left-synchronous", "left and right synchronizable" and "right-synchronous".]

In the following we mostly consider left-synchronous transducers, because all results extend to right-synchronous ones through the word-reversal operation and most interesting transducers are left-synchronous.

3.4.2 Algebraic Properties

It is well known that synchronous and $\delta$-synchronous relations are closed under union, complementation and intersection. We show that it is the same for left-synchronous relations.

Lemma 3.1 (Union) The class of left-synchronous relations is closed under union.

Proof: Let $T = (Q, I, F, E)$ and $T' = (Q', I', F', E')$ be left-synchronous transducers. $Q$ and $Q'$ can be supposed disjoint without loss of generality; and then $(Q \cup Q', I \cup I', F \cup F', E \cup E')$ realizes $|T| \cup |T'|$.

The proof is constructive: given two left-synchronous realizations, one may compute a left-synchronous realization of the union.

Here is a direct application:

Theorem 3.12 Recognizable relations are left-synchronous.

Proof: Let $R$ be a recognizable relation in $A^* \times B^*$. From Theorem 3.3, there exist an integer $n$ and rational languages $A_1, \ldots, A_n \subseteq A^*$ and $B_1, \ldots, B_n \subseteq B^*$ such that $R = A_1 \times B_1 \cup \cdots \cup A_n \times B_n$. Let $i \in \{1, \ldots, n\}$, $\mathcal{A}_A = (Q_A, I_A, F_A, E_A)$ accepting $A_i$, and $\mathcal{A}_B = (Q_B, I_B, F_B, E_B)$ accepting $B_i$. We suppose $Q_A$ and $Q_B$ are disjoint sets, without loss of generality, and define a transducer $T = (Q, I, F, E)$, where $Q = (Q_A \times Q_B) \cup Q_A \cup Q_B$, $I = I_A \times I_B$, $F = F_A \times F_B \cup F_A \cup F_B$, and $E$ is defined as follows:

1. All transitions in $E_A$ and $E_B$ are also in $E$ (labeled $x|\varepsilon$ and $\varepsilon|y$ respectively);
2. If $q_A \xrightarrow{x} q'_A \in E_A$ and $q_B \xrightarrow{y} q'_B \in E_B$, then $(q_A, q_B) \xrightarrow{x|y} (q'_A, q'_B) \in E$;

3. If $q_A$ (resp. $q_B$) is a final state and $q_B \xrightarrow{y} q'_B \in E_B$ (resp. $q_A \xrightarrow{x} q'_A \in E_A$), then $(q_A, q_B) \xrightarrow{\varepsilon|y} q'_B \in E$ (resp. $(q_A, q_B) \xrightarrow{x|\varepsilon} q'_A \in E$).

By construction, $T$ is left-synchronous, its input is $A_i$ and its output is $B_i$. Moreover, it accepts any combination of input words in $A_i$ and output words in $B_i$. Lemma 3.1 terminates the proof.

The proof is constructive: given a decomposition of a recognizable relation into products of rational languages, one may build a left-synchronous transducer.

Another application is this useful decomposition result for left-synchronous relations:

Proposition 3.8 Any left-synchronous relation can be decomposed into a union of relations of the form $SR$, where $S$ is synchronous and $R$ has either no input or no output ($R$ is thus recognizable).

Proof: Consider a relation $U$ over $A^* \times B^*$ realized by a left-synchronous transducer $T$, and consider an accepting path $e$ in $T$. The restriction of $T$ to the states and transitions in $e$ yields a transducer $T_e$, such that $|T_e| \subseteq |T|$. Moreover, $T_e$ can be divided into transducers $T_s$ and $T_r$, such that the (unique) final state of the first is the (unique) initial state of the second, $T_s$ is synchronous and $T_r$ has either no input or no output. Therefore, $T_e$ realizes a left-synchronous relation of the form $SR$, where $S$ is synchronous and $R$ has either no input or no output. Since the number of "restricted" transducers $T_e$ is finite, closure under union terminates the proof.

The proof is constructive if the left-synchronous relation to be decomposed is given by a left-synchronous realization.

To study complementation and intersection, we need two more definitions: unambiguity and completion.

Definition 3.16 (unambiguity) A rational transducer $T$ over $A$ and $B$ is unambiguous if any couple of words over $A$ and $B$ labels at most one path in $T$. A rational relation is unambiguous if it is realized by an unambiguous transducer.

This definition coincides with the one in [Ber79] Section IV.4 for rational functions, but differs for general rational transductions.

Definition 3.17 (completion) A rational transducer $T$ is complete if every pair of words labels at least one path in $T$ (accepting or not).

It is obviously not always possible to complete a transducer into a trim one. From these two definitions, let us recall a very general result.
Theorem 3.13 The class of complete unambiguous rational relations is closed under complementation.

Proof: Let $R$ be a complete unambiguous relation realized by transducer $T = (Q, I, F, E)$. We define a transducer $T' = (Q, I, Q - F, E)$ such that an accepting path in $T$ cannot be one of $T'$. The completion of $T$ and the uniqueness of accepting paths in $T$ show that the complement of $R$ is realized by $T'$.
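The proof is constructive.

Now, we specialize this result for left-synchronous relations.

Lemma 3.2 A left-synchronous relation is realized by an unambiguous left-synchronous transducer.

Proof: Let $T$ be a left-synchronous transducer over $A$ and $B$ realizing a relation $R$. Let $\mathcal{A}$ be the finite-state automaton interpretation of $T$, over the alphabet $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$, and let $\mathcal{A}'$ be a deterministic finite-state automaton accepting the same language as $\mathcal{A}$. Let $f, g$ be two words such that $|T|(f) = g$, and let $e$ and $e'$ be two accepting paths in $T$.

Suppose $e$ differs from $e'$. By the determinism property, the words $w$ and $w'$ they accept in $\mathcal{A}'$ also differ; let $(x,y)$ and $(x',y')$ be the first difference. If $x = \varepsilon$ and $x' \neq \varepsilon$, the definition of left-synchronous transducers imposes that $w$ be labeled in $\{\varepsilon\} \times B$ after $(x,y)$; then $e$ and $e'$ accept different inputs in $T$. The same reasoning applies to the three other cases ($y = \varepsilon$ and $y' \neq \varepsilon$, $x' = \varepsilon$ and $x \neq \varepsilon$, $y' = \varepsilon$ and $y \neq \varepsilon$) and yields different inputs or outputs for paths $e$ and $e'$. This contradicts the definition of $e$ and $e'$.

Thus $f$ and $g$ are accepted by a unique path in the rational transducer interpretation $T'$ of $\mathcal{A}'$. Since $\mathcal{A}'$ is the determinization of $\mathcal{A}$, a transition labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) may only be followed by another transition labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$). Finally, $T'$ is unambiguous and left-synchronous, and it realizes $R$.

The proof is constructive.

Proposition 3.9 A left-synchronous relation is realized by a complete unambiguous left-synchronous transducer.

Proof: Let $R$ be a left-synchronous relation. We use Lemma 3.2 to compute an unambiguous left-synchronous transducer $T = (Q, I, F, E)$ which realizes $R$. We define a transducer $T' = (Q', I, F, E')$, where $q_i$, $q_o$ and $q_{io}$ are three new states, $Q' = Q \cup \{q_i, q_o, q_{io}\}$, and $E'$ is defined as follows:

1. All transitions in $E$ are also in $E'$.

2. For all $(x,y) \in A \times B$, $q_{io} \xrightarrow{x|y} q_{io} \in E'$.

3. For all $x \in A$, $q_{io} \xrightarrow{x|\varepsilon} q_i \in E'$ and $q_i \xrightarrow{x|\varepsilon} q_i \in E'$.

4. For all $y \in B$, $q_{io} \xrightarrow{\varepsilon|y} q_o \in E'$ and $q_o \xrightarrow{\varepsilon|y} q_o \in E'$.

5. If $q \in Q$ is such that $\forall (x', q') \in A \times Q : q' \xrightarrow{x'|\varepsilon} q \notin E$, then $\forall (y'', q'') \in B \times Q : q \xrightarrow{\varepsilon|y''} q'' \notin E \Rightarrow q \xrightarrow{\varepsilon|y''} q_o \in E'$.

6. If $q \in Q$ is such that $\forall (y', q') \in B \times Q : q' \xrightarrow{\varepsilon|y'} q \notin E$, then $\forall (x'', q'') \in A \times Q : q \xrightarrow{x''|\varepsilon} q'' \notin E \Rightarrow q \xrightarrow{x''|\varepsilon} q_i \in E'$.

7. If $q \in Q$ is such that $\forall (x', q') \in A \times Q : q' \xrightarrow{x'|\varepsilon} q \notin E$ and $\forall (y', q') \in B \times Q : q' \xrightarrow{\varepsilon|y'} q \notin E$, then $\forall (x'', y'', q'') \in A \times B \times Q : q \xrightarrow{x''|y''} q'' \notin E \Rightarrow q \xrightarrow{x''|y''} q_{io} \in E'$.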
The resulting transducer is left-synchronous, complete, and realizes relation $R$. Moreover, the three last cases have been carefully designed to preserve the unambiguous property: no transition departing from a state $q$ is added if its label is already the one of an existing transition departing from $q$.

The proof is constructive.

Theorem 3.14 (Complementation and Intersection) The class of left-synchronous relations is closed under complementation and intersection.

Proof: As a corollary of Theorem 3.13 and Proposition 3.9, we have the closure under complementation. Together with closure under union, this proves closure under intersection.

Finally, we have proven that the class of left-synchronous relations is a boolean algebra, which will be of great help for dependence and reaching definition analysis, see Section 4.3.

Synchronous and $\delta$-synchronous relations are obviously closed under concatenation, but this is not true for left-synchronous ones. However, we have the following result:

Proposition 3.10 Let $S$, $T$ and $R$ be rational relations.

(i) If $S$ is synchronous and $T$ is left-synchronous, then $ST$ is left-synchronous.

(ii) If $T$ is left-synchronous and $R$ is recognizable, then $TR$ is left-synchronous.

Proof: Proof of (i) is a straightforward application of the definition of left-synchronous transducers (see Proposition 3.12 for a generalization).

We use Proposition 3.8 to partition $T$ into $S_1R_1, \ldots, S_nR_n$ where $S_i$ is synchronous and $R_i$ is recognizable for all $1 \le i \le n$. Now, $R_iR$ is recognizable, hence left-synchronizable from Theorem 3.12. Application of (i) shows that $S_iR_iR$ is left-synchronizable. Closure under union terminates the proof of (ii).

The proof is constructive when a left-synchronous realization of $T$ is provided, thanks to Proposition 3.8. A generalization of (i) is given in Section 3.4.5.

To close this section about algebraic properties, one should notice that the finite-state automaton interpretation (see Definition 3.6) of a left-synchronous transducer $T$ has exactly the same properties as $T$ itself, regarding computation of the complementation and intersection. Indeed, by definition of left-synchronous relations, applying classical algorithms from automata theory to the finite-state automaton interpretation yields correct results on the transducer. This remark shows that algebraic operations for left-synchronous transducers have the same complexity as for finite-state automata in general.

3.4.3 Functional Properties

Synchronous and $\delta$-synchronous transductions are closed under inversion (i.e. relational symmetry) and composition. Clearly, the class of left-synchronous transductions is also closed under inversion.

Combined with the boolean algebra structure, the following result is useful for reaching definition analysis (to solve (4.17) in Section 4.3.3).

Theorem 3.15 The class of left-synchronous transductions is closed under composition.
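Proof: Consider three alphabets $A$, $B$ and $C$, two transductions $\tau_1: A^* \to B^*$ and $\tau_2: B^* \to C^*$, and two left-synchronous transducers $T_1 = (Q_1, I_1, F_1, E_1)$ realizing $\tau_1$ and $T_2 = (Q_2, I_2, F_2, E_2)$ realizing $\tau_2$. We suppose $Q_1$ and $Q_2$ are disjoint sets, without loss of generality, and define $T = (Q_1 \times Q_2 \cup Q_1 \cup Q_2, I_1 \times I_2, F_1 \times F_2 \cup F_1 \cup F_2, E)$ as follows:

1. All transitions in $E_1$ and $E_2$ are also in $E$;

2. If $q_1 \xrightarrow{x|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|z} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{x|z} (q'_1, q'_2) \in E$;

3. If $q_1 \xrightarrow{x|\varepsilon} q'_1 \in E_1$ and $q_2 \xrightarrow{\varepsilon|z} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{x|z} (q'_1, q'_2) \in E$;

4. If $q_1 \xrightarrow{\varepsilon|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|\varepsilon} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{\varepsilon|\varepsilon} (q'_1, q'_2) \in E$;

5. If $q_1 \xrightarrow{x|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|\varepsilon} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{x|\varepsilon} (q'_1, q'_2) \in E$;

6. If $q_1 \xrightarrow{\varepsilon|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|z} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{\varepsilon|z} (q'_1, q'_2) \in E$;

7. If $q_1 \xrightarrow{x|\varepsilon} q'_1 \in E_1$, then $\forall q_2 \in F_2 : (q_1, q_2) \xrightarrow{x|\varepsilon} q'_1 \in E$;

8. If $q_2 \xrightarrow{\varepsilon|z} q'_2 \in E_2$, then $\forall q_1 \in F_1 : (q_1, q_2) \xrightarrow{\varepsilon|z} q'_2 \in E$.

First, consider an accepting path $e$ in $T$ for a couple of words $(f, h)$. We may write $e = e_{12}e'$, where $e_{12}$ is the $Q_1 \times Q_2$ part of $e$. By construction of $T$, either the end state of $e_{12}$ is a final state of $T_1$ and $e'$ is a path of $T_2$, or the opposite holds. Considering the projection of states in $e_{12}$ on $Q_1$, $e_{12}$ accepts a couple of words $(f, g)$ in $T_1$ such that $h \in \tau_2(g)$. Hence $h \in \tau_2 \circ \tau_1(f)$.

Second, consider three words $f, g, h$ such that $g \in \tau_1(f)$ and $h \in \tau_2(g)$. Let $e_1$ be an accepting path for $(f, g)$ in $T_1$ and $e_2$ be one for $(g, h)$ in $T_2$. Suppose $|e_1| > |e_2|$. Build a path $e_{12}$ in $T$ from the product of states and labels of the first $|e_2|$ transitions in $e_1$ and $e_2$; its end state is $(q_1, q_2)$ with $q_1 \in Q_1$ and $q_2 \in F_2$. Now, the last $|e_1| - |e_2|$ transitions in $e_1$ can be written $(q_1, x, \varepsilon, q'_1).e'_1$, hence $e_{12}.((q_1, q_2), x, \varepsilon, q'_1).e'_1$ is an accepting path for $(f, h)$ in $T$.

Finally, we have shown that $T$ realizes $\tau_2 \circ \tau_1$. Now, using the classical $\varepsilon|\varepsilon$-transition removal algorithm for finite-state automata, we define transducer $T'$. It is left-synchronous because $T_1$ and $T_2$ are, and transitions involving states of $Q_1$ or $Q_2$ (labeled on $A \times \{\varepsilon\}$ or $\{\varepsilon\} \times C$) are never followed by transitions involving states of $Q_1 \times Q_2$.

The proof is constructive.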
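Before showing an important application of this result, we need an additional definition:

Definition 3.18 ($\preceq$-selection) Let $\tau: A^* \to B^*$ be a rational transduction, and $\preceq$ be a rational order on $B^*$, i.e. a rational relation which is reflexive, anti-symmetric and transitive. The $\preceq$-selection of $\tau$ is a partial function $\tau_{\preceq}$ defined by

$\forall (u, v) \in A^* \times B^* : \; v = \tau_{\preceq}(u) \iff v = \min_{\preceq} \tau(u).$

Proposition 3.11 Let $\tau: A^* \to B^*$ be a left-synchronous transduction, and $\preceq$ be a left-synchronous order on $B^*$. The $\preceq$-selection of $\tau$ is a left-synchronous function.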
Proof: Let $\iota$ be the identity rational function on $B^*$. If $\tau_{\preceq}$ is the $\preceq$-selection of $\tau$, the proof comes from the fact that $\tau_{\preceq} = \tau - ((\preceq - \iota) \circ \tau)$.

The most interesting application of this to our framework appears when choosing the lexicographic order for $\preceq$, see Section 4.3.3. For more details on $\preceq$-selection, also known as uniformization, see [PS98].

3.4.4 An Undecidability Result

It is well known that the recognizability of a transduction is undecidable. This is proved by Berstel in [Ber79] Theorem 8.4, and we use a similar technique to show that it is the same for left-synchronous relations. We start with a preliminary result.

Lemma 3.3 Let $K$ be a positive integer, let $A = \{a, b\}$, let $B$ be any alphabet, and let $u_1, u_2, \ldots, u_p \in B^*$. Define

$U = \{(ab^K, u_1), (ab^{2K}, u_2), \ldots, (ab^{pK}, u_p)\}.$

Then, $U$ and $U^+$ are rational relations, and relation $(A^* \times B^*) - U^+$ is also rational.

Proof: Relation $U$ is finite, hence rational, and $U^+$ is rational by closure under concatenation and the star operation. In general, the class of rational relations is not closed under complementation, so we have to prove something here. This is done the same way as in [Ber79] Lemma 8.3, the only change being the substitution of $b$ by $b^K$.

Theorem 3.16 Let $A$ and $B$ be alphabets with at least two letters. Given a rational relation $R$ over $A$ and $B$, it is undecidable whether $R$ is left-synchronous.

Proof: We may assume that $A$ contains exactly two letters, and set $A = \{a, b\}$. Consider two sequences $u_1, u_2, \ldots, u_p$ and $v_1, v_2, \ldots, v_p$ of non-empty words over $B$, and let $K$ be their maximum length. Define

$U = \{(ab^K, u_1), \ldots, (ab^{pK}, u_p)\}$ and $V = \{(ab^K, v_1), \ldots, (ab^{pK}, v_p)\}.$

From Lemma 3.3, $U$, $V$, $U^+$, $V^+$, $\overline{U} = (A^* \times B^*) - U^+$ and $\overline{V} = (A^* \times B^*) - V^+$ are rational relations.

Let $R = \overline{U} \cup \overline{V}$. Since left-synchronous transductions are closed under complementation, $R$ is left-synchronous iff $(A^* \times B^*) - R = U^+ \cap V^+$ is.

Assume $U^+ \cap V^+$ is non-empty and realized by a left-synchronous transducer $T$. Consider $(m, u) \in U^+ \cap V^+$. We may write $m = fg$ with $|f| = |u|$ and $|g| > 0$. Left synchronism requires that $(g, \varepsilon)$ labels a path in $T$. Moreover, $((fg)^k, u^k) \in U^+ \cap V^+$ for all $k \ge 1$, hence the path labeled by $(g, \varepsilon)$ must be part of a cycle:

$\exists g' : \forall k : (fg(g'g)^k, u) \in U^+ \cap V^+.$

However, because $u_1, \ldots, u_p$ and $v_1, \ldots, v_p$ are non-empty, the ratio between the length of input and output words must be less than or equal to $K + 1$; this is contradictory.
Finally, $R$ is left-synchronous iff $U^+ \cap V^+$ is empty.⁵ Since deciding this emptiness is exactly solving Post's Correspondence Problem for $u_1, \ldots, u_p$ and $v_1, \ldots, v_p$, we have proven that left-synchronism is undecidable.

⁵ We have also proven here that $U^+$ and $V^+$ are not left-synchronous.

A similar proof shows the following result, which is not a corollary of Theorem 3.16.

Theorem 3.17 Let $A$ and $B$ be alphabets with at least two letters. Given a rational relation $R$ over $A$ and $B$, it is undecidable whether $R$ is left and right synchronous.

3.4.5 Studying Synchronizability of Transducers

Despite the general undecidability results, we are interested in particular cases where a rational relation can be proved left-synchronous.

Transmission Rate

We recall the following useful notion to give an alternative description of synchronism in transducers. The transmission rate of a path labeled by $(u, v)$ is defined as the ratio $|v|/|u| \in \mathbb{Q}^+ \cup \{+\infty\}$.

Eilenberg and Schützenberger [Eil74] showed that the synchronism property of a transducer is decidable. Frougny and Sakarovitch [FS93] showed a similar result for $\delta$-synchronism, and their algorithm operates directly on the transducer that realizes the transduction. The result is:

Lemma 3.4 A rational transducer is $\delta$-synchronizable iff the transmission rate of all its cycles is 1.

There is no characterization of recognizable transducers through the transmission rate of their cycles, but one may give a sufficient condition:

Lemma 3.5 If the transmission rate of all cycles in a rational transducer is 0 or $+\infty$, then it realizes a recognizable relation.

Proof: Let $T$ be a rational transducer whose cycles' transmission rates are only 0 and $+\infty$. Considering a strongly-connected component, all its cycles must be of the same rate. Hence a strongly-connected component has either no input or no output. This proves that strongly-connected components are recognizable. Closure of recognizable relations by concatenation and by union terminates the proof.

There is no characterization of left-synchronizable transducers either. However, as a straightforward application of previous definitions, one may give the following result:

Lemma 3.6 If $T$ is a left-synchronous transducer, then cycles of $T$ may only have three different transmission rates: 0, 1 and $+\infty$. All cycles in the same strongly-connected component must have the same transmission rate, only components of rate 0 may follow components of rate 0, and only components of rate $+\infty$ may follow components of rate $+\infty$.

Even if synchronizable transducers may not satisfy these properties, some kind of converse is available, see Theorem 3.19.
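As a small illustration of the transmission rate, the sketch below computes $|v|/|u|$ for a path given as a list of labels and tests whether a rate is one of the three values allowed by Lemma 3.6. The list-of-pairs encoding and the function names are assumptions made for this example; the sketch does not enumerate the cycles of a transducer.

```python
from fractions import Fraction
import math


def transmission_rate(labels):
    """Rate |v|/|u| of a path whose transitions are labeled by the pairs (x, y)."""
    input_length = sum(len(x) for x, _ in labels)
    output_length = sum(len(y) for _, y in labels)
    if input_length == 0:
        if output_length == 0:
            raise ValueError("rate undefined for a path labeled (eps, eps)")
        return math.inf  # output produced without reading any input
    return Fraction(output_length, input_length)


def allowed_in_left_synchronous(rate):
    """Rates that a cycle of a left-synchronous transducer may have (Lemma 3.6)."""
    return rate in (0, 1, math.inf)
```

For instance, a cycle labeled $a|c$ then $b|\varepsilon$ has rate $1/2$, which rules out left-synchronism of the transducer containing it (though not necessarily of the relation it realizes).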
Classes of Transductions

We have shown that left-synchronous transductions extend algebraic properties of recognizable transductions. The following theorem shows that they also extend real-time properties of $\delta$-synchronous transducers.

Theorem 3.18 $\delta$-synchronous transductions are left-synchronous.

Proof: Consider a $\delta$-synchronous transducer $T$ realizing a relation $R$ over alphabets $A$ and $B$, and call $\delta$ the upper bound on delays between input and output words accepted by $T$. Taking advantage of closure under intersection, one may partition $R$ into relations $R_i$ of constant delay $i$, for all $-\delta \le i \le \delta$. Let $T_i$ realize relation $R_i$: by construction, $v \in |T_i|(u)$ iff $|u| = |v| + i$.

Let $\#$ be a new label; if $i$ is non-negative (resp. negative), define $T'_i$ from $T_i$ by substituting its final state with a transducer accepting $(\varepsilon, \#^i)$ (resp. $(\#^{-i}, \varepsilon)$). Each $T'_i$ is length preserving, hence synchronizable. Transducer $T' = T'_{-\delta} \cup \cdots \cup T'_{\delta}$ is thus synchronizable, hence left-synchronizable.

Let $P$ realize relation $\{(u, u\#^a) : u \in A^*, a \ge 0\}$ and $Q$ realize relation $\{(v\#^b, v) : v \in B^*, b \ge 0\}$, which are both left-synchronizable. Transducer $Q \circ T' \circ P$ realizes the same transduction as $T$, and it is left-synchronizable from Theorem 3.15.

One may go a bit further and give a generalization of Theorems 3.12 and 3.18, based on Lemmas 3.5 and 3.4:

Theorem 3.19 If the transmission rate of each cycle in a rational transducer is 0, 1 or $+\infty$, and if no cycle whose rate is 1 follows a cycle whose rate is not 1, then the transducer is left-synchronizable.

Proof: Consider a rational transducer $T$ satisfying the above hypotheses, and consider an accepting path $e$ in $T$. The restriction of $T$ to the states and transitions in $e$ yields a transducer $T_e$, such that $|T_e| \subseteq |T|$. Moreover, $T_e$ can be divided into transducers $T_s$ and $T_r$, such that the (unique) final state of the first is the (unique) initial state of the second, and the transmission rate of all cycles is 1 in $T_s$ and either 0 or $+\infty$ in $T_r$. From Lemma 3.5, $T_r$ is recognizable. From Lemma 3.4, $T_s$ is $\delta$-synchronizable, hence left-synchronizable from Theorem 3.18. Finally, Proposition 3.10 shows that $T_e$ is left-synchronizable. Since the number of "restricted" transducers $T_e$ is finite, closure under union terminates the proof.

The proof is constructive.
As an application of this theorem, one may give a generalization of Proposition 3.10.(i):

Proposition 3.12 If $\sigma$ is $\delta$-synchronous and $\tau$ is left-synchronous, then $\sigma.\tau$ is left-synchronous.

Notice that the left and right synchronizable transducer example in Figure 3.6, which is even recognizable, does not satisfy the conditions of Theorem 3.19, since the transmission rate of some cycles is 2.
Resynchronization Algorithm

Although left-synchronism is not decidable, one may be interested in a synchronization algorithm that works on a subset of left-synchronizable transducers: the class of transducers satisfying the hypothesis of Theorem 3.19.

Extending an implementation by Béal and Carton [BC99a] of the algorithm in [FS93], it is possible to "resynchronize" our larger class along the lines of the proof of Theorem 3.19. This technique will be used extensively in Sections 3.6 and 3.7, to compute (possibly approximate) intersections of rational relations. Presentation of the full algorithm and further investigations about its complexity are left for future work.

3.4.6 Decidability Results

We first present an extension of the minimality concept for finite-state automata to left-synchronous transducers. Let $T = (Q, I, F, E)$ be a transducer over alphabets $A$ and $B$. We define the following predicate, for $q \in Q$ and $(u, v) \in A^* \times B^*$:

$\mathrm{Accept}(q, u, v)$ iff $(u, v)$ labels an accepting path starting at $q$.

Nerode's equivalence, noted $\equiv$, is defined by

$q \equiv q'$ iff for all $(u, v) \in A^* \times B^*$: $\mathrm{Accept}(q, u, v) \iff \mathrm{Accept}(q', u, v)$.

The equivalence class of $q \in Q$ is denoted by $\hat{q}$. Let

$T/{\equiv} = (Q/{\equiv}, I/{\equiv}, F/{\equiv}, \hat{E}),$

where $\hat{E}$ is naturally defined by

$(\hat{q}_1, x, y, \hat{q}_2) \in \hat{E} \iff \exists (q'_1, q'_2) \in \hat{q}_1 \times \hat{q}_2 : (q'_1, x, y, q'_2) \in E.$

Using Nerode's equivalence, we extend the concept of minimal automaton to left-synchronous transducers.

Theorem 3.20 Any left-synchronous transduction is realized by a unique minimal left-synchronous transducer (up to a renaming of states).

Proof: Let $\tau$ be a transduction over alphabets $A$ and $B$, realized by a left-synchronous transducer $T = (Q, I, F, E)$. We suppose without loss of generality that $T$ is trim. By definition of $\equiv$, it is clear that $T/{\equiv}$ realizes $\tau$.

Every transition of $T/{\equiv}$ is labeled on $A \times B \cup A \times \{\varepsilon\} \cup \{\varepsilon\} \times B$. Consider two states $q, q' \in Q$ such that $q \equiv q'$ and $q$ has an incoming transition labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$); and consider $(u, v) \in A^* \times B^*$ such that $\mathrm{Accept}(q, u, v)$ and $\mathrm{Accept}(q', u, v)$. Any outgoing transition from $q$ must be labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$), hence $v$ (resp. $u$) must be empty. Since this is true for all accepted $(u, v)$, and since $T$ is trim, any outgoing transition from $q'$ must also be labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$); this proves that $T/{\equiv}$ is left-synchronous.

Finally, let $\mathcal{A}$ be the finite-state automaton interpretation of $T$ (see Definition 3.6). It is well known that $\mathcal{A}/{\equiv}$ is the unique minimal automaton realizing the same rational language as $\mathcal{A}$ (up to a renaming of states). Thus, if $T'$ is a realization of $\tau$ with as
many states as $T/{\equiv}$, its finite-state automaton interpretation must be $\mathcal{A}/{\equiv}$ (up to a renaming of states), which is the interpretation of $T/{\equiv}$. This proves the uniqueness of the minimal left-synchronous transducer.

As a corollary of closure under complementation and intersection, usual questions become decidable for left-synchronous transductions:

Lemma 3.7 Let $R$, $R'$ be left-synchronous relations over alphabets $A$ and $B$. It is decidable whether $R \cap R' = \emptyset$, $R \subseteq R'$, $R = R'$, $R = A^* \times B^*$, $(A^* \times B^*) - R$ is finite.

These properties are essential for formal reasoning about dependence and reaching definition abstractions in the following chapter.

Finally, we are still working on decidability of recognizable relations among left-synchronous ones. We have strong arguments to expect a positive result, but no proof at the moment.

3.4.7 Further Extensions

We now consider possible extensions of left-synchronizable relations.

Constant Transmission Rates

An elementary variation on synchronous transducers consists in enforcing a single transmission rate in all cycles, which is not necessarily 1: if $k$ and $l$ are positive integers, a $(k,l)$-synchronous relation over $A^* \times B^*$ is realized by a transducer whose transitions are labeled in $A^k \times B^l$. Similarly, one may define $\delta$-$(k,l)$-synchronous and left-$(k,l)$-synchronous transducers.

Noticing that a change of the alphabet converts a $(k,l)$-synchronous transducer into a classical synchronous one, it obviously appears that the same properties are satisfied for any $k$ and $l$, including $k = l = 1$. The only difference is that the transmission rates of cycles are now 0, $+\infty$ and $l/k$ (each transition reads $k$ input letters and writes $l$ output letters). Mixing relations in $(k,l)$-synchronous classes for different $(k,l)$ is not allowed, of course.

However, most rational transductions useful to our framework, including orders, are left-$(1,1)$-synchronous, that is left-synchronous. This strongly reduces the usefulness of general left-$(k,l)$-synchronous transductions.

Deterministic Transducers

Much more interesting is the class of deterministic relations introduced by Pelletier and Sakarovitch in [PS98]:

Definition 3.19 (deterministic transducer and relation) Let $A$ and $B$ be two alphabets. A transducer $T = (A^*, B^*, Q, I, F, E)$ is deterministic if the following conditions hold:

(i) there exists a partition of the set of states $Q = Q_A \cup Q_B$ such that the label of an edge departing from a state in $Q_A$ is in $A \times \{\varepsilon\}$ and the label of an edge departing from a state in $Q_B$ is in $\{\varepsilon\} \times B$;

(ii) for every $p \in Q$ and every $(x, y) \in (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$, there exists at most one $q \in Q$ such that $(p, x, y, q)$ is in $E$ (i.e. the finite-state automaton interpretation is deterministic);
(iii) there is a single initial state in $I$.

A deterministic relation is realized by a deterministic transducer.

This class is strictly larger than left-synchronous relations, and keeps most of its good properties: the greatest loss is closure under composition. Moreover, because relation $U^+$ is deterministic in the proof of Theorem 3.16, it is undecidable whether a deterministic relation is recognizable, left-synchronous or both left and right synchronous.

But the most important reason for us to use left-synchronous relations instead of deterministic ones is that there is no result such as Theorem 3.19 to find a deterministic realization of a relation, or to help approximate a rational relation by a deterministic one.

3.5 Beyond Rational Relations

For the purpose of our program analysis framework, we sometimes require more expressiveness than rational relations: "finite automata cannot count", and we need counting to handle arrays! We thus present an extension of the algebraic (also known as context-free) property to relations between finitely generated monoids. As one would expect, the class of algebraic relations includes rational relations, and retains several decidable properties. This section ends with a few contributions: Theorems 3.27 and 3.28, and Proposition 3.13.

3.5.1 Algebraic Relations

We define algebraic relations through push-down transducers, defined similarly to push-down automata (see Section 3.2.3).

Definition 3.20 (push-down transducer) Given alphabets $A$ and $B$, a push-down transducer $T = (A^*, B^*, \Gamma, \gamma_0, Q, I, F, E)$, a.k.a. algebraic transducer, consists of a stack alphabet $\Gamma$, a non-empty word $\gamma_0$ in $\Gamma^+$ called the initial stack word, a finite set $Q$ of states, a set $I \subseteq Q$ of initial states, a set $F \subseteq Q$ of final states, and a finite set of transitions (a.k.a. edges) $E \subseteq Q \times A^* \times B^* \times \Gamma \times \Gamma^* \times Q$.

Free monoids $A^*$ and $B^*$ are often omitted for convenience, when clear from the context. A transition $(q, x, y, g, \gamma, q') \in E$ is usually written $q \xrightarrow{x|y : g \to \gamma} q'$. The push-down automata and rational transducer vocabularies are inherited.

A configuration of a push-down transducer is a quadruple $(u, v, q, \gamma)$, where $(u, v)$ is the pair of words to be accepted or rejected, $q$ is the current state and $\gamma \in \Gamma^*$ is the word composed of symbols in the stack. The transition between two configurations $c_1 = (u_1, v_1, q_1, \gamma_1)$ and $c_2 = (u_2, v_2, q_2, \gamma_2)$ is denoted by relation $\mapsto$ and defined by $c_1 \mapsto c_2$ iff there exist $(x, y, g, \gamma, \gamma') \in A^* \times B^* \times \Gamma \times \Gamma^* \times \Gamma^*$ such that

$u_1 = xu_2 \wedge v_1 = yv_2 \wedge \gamma_1 = \gamma'g \wedge \gamma_2 = \gamma'\gamma \wedge (q_1, x, y, g, \gamma, q_2) \in E.$

Then $\mapsto^p$ with $p \in \mathbb{N}$, $\mapsto^+$ and $\mapsto^*$ are defined as usual.

A push-down transducer $T = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the relation $R$ by final state when $(u, v) \in R$ iff there exist $(q_i, q_f, \gamma) \in I \times F \times \Gamma^*$ such that

$(u, v, q_i, \gamma_0) \mapsto^* (\varepsilon, \varepsilon, q_f, \gamma).$
A push-down transducer $T = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the relation $R$ by empty stack when $(u, v) \in R$ iff there exist $(q_i, q_f) \in I \times F$ such that

$(u, v, q_i, \gamma_0) \mapsto^* (\varepsilon, \varepsilon, q_f, \varepsilon).$

Notice that realization by empty stack implies realization by final state: $q_f$ is still required to be in the set of final states.

Definition 3.21 (algebraic relation) The class of relations realized by final state or by empty stack by push-down transducers is called the class of algebraic relations.

As for rational relations, the following characterization of algebraic relations is fundamental: it allows one to express algebraic relations by means of algebraic languages and monoid morphisms. A proof in a much more general case can be found in [Kar92]. (Berstel uses this theorem as a definition for algebraic relations in [Ber79].)

Theorem 3.21 (Nivat) Let $A$ and $B$ be two alphabets. Then $R$ is an algebraic relation over $A^*$ and $B^*$ iff there exist an alphabet $C$, two morphisms $\varphi: C^* \to A^*$, $\psi: C^* \to B^*$, and an algebraic language $L \subseteq C^*$ such that

$R = \{(\varphi(h), \psi(h)) : h \in L\}.$

To generalize Section 3.3.2, algebraic transductions are the functional counterpart of algebraic relations. Nivat's theorem can be formulated as follows for algebraic transductions:

Theorem 3.22 (Nivat) Let $A$ and $B$ be two alphabets. Then $\tau: A^* \to B^*$ is an algebraic transduction iff there exist an alphabet $C$, two morphisms $\varphi: C^* \to A^*$, $\psi: C^* \to B^*$, and an algebraic language $L \subseteq C^*$ such that

$\forall w \in A^* : \tau(w) = \psi(\varphi^{-1}(w) \cap L).$

Let us recall some useful properties of algebraic relations and transductions.

Theorem 3.23 Algebraic relations are closed under union, concatenation, and the star operation. They are also closed under composition with rational transductions (similar to the Elgot and Mezei theorem). The image of a rational language by an algebraic transduction is an algebraic language (thanks to Nivat's theorem).

The image of an algebraic language by an algebraic transduction may not be algebraic, but there are some interesting exceptions:

Theorem 3.24 (Evey) Given a push-down transducer $T$, if $L$ is the algebraic language realized by the input automaton of $T$ (see Definition 3.8), the image $T(L)$ is an algebraic language.

The following definition will be useful in some technical discussions and proofs in the following. It formalizes the fact that a push-down transducer can be interpreted as a push-down automaton on a more complex alphabet. But beware: the two interpretations have different properties in general.

Definition 3.22 Let $T$ be a push-down transducer over alphabets $A$ and $B$. The push-down automaton interpretation of $T$ is a push-down automaton $\mathcal{A}$ over the alphabet $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ defined by the same stack alphabet, initial stack word, states, initial states, final states and transitions.
Among the usual decision problems, only the following are available for algebraic relations:

Theorem 3.25 The following problems are decidable for algebraic relations: whether two words are in relation (in linear time), emptiness, finiteness.

Important remarks. In the following, every push-down transducer will implicitly accept words by final state. Recognizable and rational relations were defined for any finitely generated monoids, but algebraic relations are defined for free monoids only.

Algebraic Functions

There are very few results about algebraic transductions that are partial functions. Here is the definition:

Definition 3.23 (algebraic function) Let $A$ and $B$ be two alphabets. An algebraic function $f: A^* \to B^*$ is an algebraic transduction which is a partial function, i.e. such that $\mathrm{Card}(f(u)) \le 1$ for all $u \in A^*$.

However, we are not aware of any decidability result for an algebraic transduction to be a partial function, and we believe that the most likely answer is negative.

Among transducers realizing algebraic functions, we are especially interested in transducers whose output can be "computed online" with its input. As for rational transducers, our interpretation for "online computation" is based on the determinism of the input automaton:

Definition 3.24 (online algebraic transducer) An algebraic transducer is online if it realizes a partial function and if its input automaton is deterministic. An algebraic transduction is online if it is realized by an online algebraic transducer.

Nevertheless, we are not aware of any results for this class of algebraic functions; even decidability of deterministic algebraic languages among algebraic ones is unknown.

3.5.2 One-Counter Relations

An interesting sub-class of algebraic relations is called the class of one-counter relations. It is defined through push-down transducers. A classical definition is the following:

Definition 3.25 A push-down transducer is a one-counter transducer if its stack alphabet contains only one letter. An algebraic relation is a one-counter relation if it is realized by a one-counter transducer (by final state).

As for one-counter languages, we prefer a definition which is more suitable to our practical usage of one-counter relations.
Definition 3.26 (one-counter transducer and relation) A push-down transducer is a one-counter transducer if its stack alphabet contains three letters, $Z$ (for "zero"), $I$ (for "increment") and $D$ (for "decrement") and if the stack word belongs to the (rational) set $ZI^* + ZD^*$. An algebraic relation is a one-counter relation if it is realized by a one-counter transducer (by final state).
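The stack discipline of Definition 3.26 can be illustrated by a small sketch. It is only an illustration under assumptions: the function names `update`, `value` and `is_zero` are hypothetical, and stack words are encoded as Python strings over the letters Z, I and D.

```python
import re

# Stack words allowed by Definition 3.26: Z followed by only I's or only D's.
VALID = re.compile(r"Z(I*|D*)")

def update(stack: str, delta: int) -> str:
    """Apply a counter update of +1 or -1 to a stack word in Z I* + Z D*."""
    assert VALID.fullmatch(stack) and delta in (+1, -1)
    top = stack[-1]
    if delta == +1:
        # An increment cancels a pending decrement, otherwise it pushes I.
        return stack[:-1] if top == "D" else stack + "I"
    # A decrement cancels a pending increment, otherwise it pushes D.
    return stack[:-1] if top == "I" else stack + "D"

def value(stack: str) -> int:
    """Integer encoded by the stack word."""
    return stack.count("I") - stack.count("D")

def is_zero(stack: str) -> bool:
    """The test for zero only inspects the top letter."""
    return stack[-1] == "Z"

s = "Z"
for d in (+1, +1, -1, -1, -1):
    s = update(s, d)
```

With this encoding a single stack word represents any integer, positive or negative, and the zero test is a test on the top of the stack, which is what makes the three-letter definition convenient in practice.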
It is easy to show that Definition 3.26 describes the same family of languages as the preceding classical definition. We use the same notations as for one-counter languages, see Section 3.2.4. The family of one-counter relations is strictly included in the family of algebraic relations.

Notice that using more than one counter gives the same expressive power as Turing machines, as for multi-counter automata, see the last paragraph in Section 3.2.4 for further discussions about this topic.

Now, why are we interested in such a class of relations? We will see in our program analysis framework that we need to compose rational transductions over non-free monoids. Indeed, the well known theorem by Elgot and Mezei (Theorem 3.5 in Section 3.3) can be "partly" extended to any finitely generated monoids:

Theorem 3.26 (Elgot and Mezei) If $M_1$ and $M_2$ are finitely generated monoids, $A$ is an alphabet, $\tau_1 : M_1 \to A^*$ and $\tau_2 : A^* \to M_2$ are rational transductions, then $\tau_2 \circ \tau_1 : M_1 \to M_2$ is a rational transduction.

But this extension is not interesting in our case, since the "middle" monoid in our transduction composition is not free. More precisely, we would like to compute the composition of two rational transductions $\tau_2 \circ \tau_1$, when $\tau_1 : A^* \to \mathbb{Z}^n$ and $\tau_2 : \mathbb{Z}^n \to B^*$, for some alphabets $A$ and $B$ and some positive integer $n$. Sadly, because of the commutative group nature of $\mathbb{Z}$, composition of $\tau_2$ and $\tau_1$ is not a rational transduction in general. An intuitive view of this comes from the fact that all "words" on $\mathbb{Z}$ of the form

$$\underbrace{1 + 1 + \cdots + 1}_{k}\ \underbrace{-\,1 - 1 - \cdots - 1}_{k}$$

are equal to 0, but do not build a rational language in $\{1, -1\}^*$ (they build a context-free one). We have proven that such a composition yields an $n$-counter transduction in general, and the proof gives a constructive way to build a transducer realizing the composition:

Theorem 3.27 Let $A$ and $B$ be two alphabets and let $n$ be a positive integer. If $\tau_1 : A^* \to \mathbb{Z}^n$ and $\tau_2 : \mathbb{Z}^n \to B^*$ are rational transductions, then $\tau_2 \circ \tau_1 : A^* \to B^*$ is an $n$-counter transduction.

Proof: We first suppose that $n$ is equal to 1. Let $T_1 = (A^*, \mathbb{Z}, Q_1, I_1, F_1, E_1)$ realize $\tau_1$ and $T_2 = (\mathbb{Z}, B^*, Q_2, I_2, F_2, E_2)$ realize $\tau_2$.
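We define a one-counter transducer $T'_1 = (A^*, B^*, 0, Q_1, I_1, F_1, E'_1)$ with no output on $B^*$ from $T_1$: if $(q, u, v, q') \in E_1$ then $(q, u, \varepsilon, \varepsilon, +v, q') \in E'_1$ (no counter check). Similarly, we define a one-counter transducer $T'_2 = (A^*, B^*, 0, Q_2, I_2, F_2, E'_2)$ with no input from $A^*$ from $T_2$: if $(q, u, v, q') \in E_2$ then $(q, \varepsilon, v, \varepsilon, -u, q') \in E'_2$ (no counter check). Intuitively, the output of $T_1$ and the input of $T_2$ are replaced by counter updates in $T'_1$ and opposite counter updates in $T'_2$.

Then we define a one-counter transducer $T = (A^*, B^*, 0, Q_1 \cup Q_2 \cup \{q_F\}, I_1, \{q_F\}, E)$ as a kind of concatenation of $T'_1$ and $T'_2$:

- if $e \in E'_1$ then $e \in E$;
- if $e \in E'_2$ then $e \in E$;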
- if $q_1 \in F_1$ and $q_2 \in I_2$ then $(q_1, \varepsilon, \varepsilon, \varepsilon, \varepsilon, q_2) \in E$ (neither counter check nor counter update);
- if $q_2 \in F_2$ then $(q_2, \varepsilon, \varepsilon, {=}0, \varepsilon, q_F) \in E$ (no counter update);
- no other transition is in $E$.

Intuitively, $T$ accepts pairs of words $(u, v)$ when $(u, \varepsilon)$ would be accepted by $T'_1$, $(\varepsilon, v)$ would be accepted by $T'_2$ and the counter is zero when reaching state $q_F$. Then, $T$ is a one-counter transducer and recognizes $\tau_2 \circ \tau_1$.

Finally, if $n$ is greater than 1, the same construction can be applied to each dimension of $\mathbb{Z}^n$, and the associated counter check and updates can be combined to build an $n$-counter transducer realizing $\tau_2 \circ \tau_1$.

Theorem 3.27 will be used in Section 4.3 to prove properties of the dependence analysis. In practice, we will restrict ourselves to $n = 1$, applying conservative approximations described in Section 3.7, either on $\tau_1$ and $\tau_2$ or on the multi-counter composition.

We now require an additional formalization of the rational transducer "skeleton" of a push-down transducer.

Definition 3.27 (underlying rational transducer) Let $T = (\Gamma, \gamma_0, Q, I, F, E)$ be a push-down transducer. We can build a rational transducer $T' = (Q, I, F, E')$ from $T$ in setting

$$(q, x, y, q') \in E' \iff \exists g \in \Gamma, \gamma \in \Gamma^* : (q, x, y, g, \gamma, q') \in E.$$

The underlying rational transducer of $T$ is the rational transducer obtained in trimming $T'$ and removing all transitions labeled $\varepsilon|\varepsilon$.

Looking at the proof of Theorem 3.27, there is a very interesting property about transducer $T$ realizing $\tau_2 \circ \tau_1$: the transmission rate of every cycle in $T$ is either $0$ or $+\infty$. Thanks to Lemma 3.5 in Section 3.4, we have proven the following result:

Proposition 3.13 Let $A$ and $B$ be two alphabets and let $n$ be a positive integer. Let $\tau_1 : A^* \to \mathbb{Z}^n$ and $\tau_2 : \mathbb{Z}^n \to B^*$ be rational transductions and let $T$ be an $n$-counter transducer realizing $\tau_2 \circ \tau_1 : A^* \to B^*$ (computed from Theorem 3.27). Then, the underlying rational transducer of $T$ is recognizable.

Applications of this result include closure under intersection with any rational transduction, thanks to the technique presented in Section 3.6.2.
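The construction in the proof of Theorem 3.27 (case $n = 1$) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the tuple encodings, the names `compose_over_Z` and `accepts`, the use of the integer 0 for "no counter update", and the assumption that every transition of $T_1$ reads exactly one letter and every transition of $T_2$ writes exactly one letter (so that the search below terminates) are all choices made for the example.

```python
EPS = ""  # empty word; also used for "no counter check"

def compose_over_Z(T1, T2):
    """T1 = (I1, F1, E1), edges (q, u, z, q') with an integer output z.
    T2 = (I2, F2, E2), edges (q, z, v, q') with an integer input z.
    Returns (initial, final, edges) of a one-counter transducer whose
    edges are (q, input, output, check, update, q')."""
    I1, F1, E1 = T1
    I2, F2, E2 = T2
    qF = "qF"
    edges = set()
    for (q, u, z, q2) in E1:       # output of T1 becomes a counter update
        edges.add((("1", q), u, EPS, EPS, +z, ("1", q2)))
    for (q, z, v, q2) in E2:       # input of T2 becomes the opposite update
        edges.add((("2", q), EPS, v, EPS, -z, ("2", q2)))
    for q1 in F1:                  # glue final states of T1 to initial states of T2
        for q2 in I2:
            edges.add((("1", q1), EPS, EPS, EPS, 0, ("2", q2)))
    for q2 in F2:                  # accept only when the counter is zero
        edges.add((("2", q2), EPS, EPS, "=0", 0, qF))
    return {("1", q) for q in I1}, {qF}, edges

def accepts(T, u, v):
    """Exhaustive search over configurations (state, |read|, |written|, counter)."""
    init, final, edges = T
    todo = [(q, 0, 0, 0) for q in init]
    seen = set(todo)
    while todo:
        q, i, j, c = todo.pop()
        if q in final and i == len(u) and j == len(v):
            return True
        for (p, a, b, check, upd, p2) in edges:
            if p != q or not u.startswith(a, i) or not v.startswith(b, j):
                continue
            if check == "=0" and c != 0:
                continue
            nxt = (p2, i + len(a), j + len(b), c + upd)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False

# tau1 maps a^k to k, tau2 maps k to b^k; the composition is {(a^k, b^k)}.
T1 = ({0}, {0}, {(0, "a", 1, 0)})
T2 = ({0}, {0}, {(0, 1, "b", 0)})
T = compose_over_Z(T1, T2)
```

On this example the composed relation $\{(a^k, b^k)\}$ is not rational, yet a single counter suffices to recognize it, in line with the statement of the theorem.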
Eventually, when studying abstract models for data structures, we have seen that nested trees and arrays are neither modeled by free monoids nor by free commutative monoids. Their general structure is called a free partially commutative monoid, see Section 2.3.3. Let $A$ and $B$ be two alphabets, and $M$ be such a monoid with binary operation $\cdot$. We still want to compute the composition of rational transductions $\tau_2 \circ \tau_1$, when $\tau_1 : A^* \to M$ and $\tau_2 : M \to B^*$. The following result is an extension of Theorem 3.27, and its proof is still constructive:

Theorem 3.28 Let $A$ and $B$ be two alphabets and let $M$ be a free partially commutative monoid. If $\tau_1 : A^* \to M$ and $\tau_2 : M \to B^*$ are rational transductions, then $\tau_2 \circ \tau_1 : A^* \to B^*$ is a multi-counter transduction. The number of counters is equal to the maximum dimension of vectors in $M$ (see Definition 2.6).
Proof: Because the full proof is rather technical while its intuition is very natural, we only sketch the main ideas. Considering two rational transducers $T_1$ and $T_2$ realizing $\tau_1$ and $\tau_2$ respectively, we start applying the classical composition algorithm for free monoids to build a transducer $T$ realizing $\tau_2 \circ \tau_1$. But this time, $T$ will be multi-counter, every counter is initialized to 0, and transitions generated by the classical composition algorithm simply ignore the counters.

Now, every time a transition of $T_1$ writes a vector $v$ (resp. $T_2$ reads a vector $v$), the "normal execution" of the classical composition algorithm is "suspended", only transitions reading (resp. writing) vectors of the same dimension as $v$ are considered in $T_2$ (resp. $T_1$), and $v$ is added to the counters using the technique in Theorem 3.27. When a letter is read or written during the "suspended mode", each counter is checked for zero before "resuming" the "normal execution" of the classical composition algorithm. The result is a transducer with rational and multi-counter parts, separated by checks for zero.

Theorem 3.28 will also be used in Section 4.3.

3.6 More about Intersection

Intersecting relations is a major issue in our analysis and transformation framework. We have seen that this operation preserves neither the rational property nor the algebraic property of a relation; but we have also found sub-classes of relations, closed under intersection. The purpose of this section is to extend these sub-classes in order to support special cases of intersections.

3.6.1 Intersection with Lexicographic Order

For the purpose of dependence analysis, we have already mentioned the need for intersections with the lexicographic order. Indeed, the class of left-synchronous relations includes the lexicographic order and is closed under intersection.

In this section, we restrict ourselves to the case of relations over $A^* \times A^*$ for some alphabet $A$. We will describe a class larger than synchronous relations over $A^* \times A^*$ which is closed under intersection with the lexicographic order only.[6]

Definition 3.28 (pseudo-left-synchronism) Let $A$ be an alphabet. A rational transducer $T = (A^*, A^*, Q, I, F, E)$ (same alphabet $A$) is pseudo-left-synchronous if there exists a partition of the set of states $Q = Q_I \cup Q_S \cup Q_T$ satisfying the following conditions:

(i) any transition between states of $Q_I$ is labeled $x|x$ for some $x$ in $A$;
(ii) any transition between a state of $Q_I$ and a state of $Q_T$ is labeled $x|y$ for some $x \neq y$ in $A$;
(iii) the restriction of $T$ to states in $Q_I \cup Q_S$ is left-synchronous.

A rational relation or transduction is pseudo-left-synchronous if it is realized by a pseudo-left-synchronous transducer. A rational transducer is pseudo-left-synchronizable if it realizes a pseudo-left-synchronous relation.

[6] This class is not comparable with the class of deterministic relations proposed in Definition 3.19 of Section 3.4.7.
An intuitive view of this definition would be that a pseudo-left-synchronous transducer satisfies the left-synchronism property everywhere but after transitions labeled $x|y$ with $x \neq y$. The motivation for such a definition comes from the following result:

Proposition 3.14 The class of pseudo-left-synchronous relations is closed under intersection with the lexicographic order.

Proof: Because the non-left-synchronous part is preceded by transitions labeled $x|y$ with $x \neq y$, which are themselves preceded by transitions labeled $x|x$, intersection with the lexicographic order becomes straightforward on this part: if $x < y$ the transition is kept in the intersection, otherwise it is removed. Intersecting the left-synchronous part is done thanks to Theorem 3.14.

Another intersection result is the following:

Proposition 3.15 Intersecting a pseudo-left-synchronous relation with the identity relation yields a left-synchronous relation.

Proof: Same idea as the preceding proof, but transitions $x|y$ with $x \neq y$ are now removed every time.

Of course, pseudo-left-synchronous relations are closed under union, but not intersection, complementation and composition.

Eventually, the constructive proof of Theorem 3.19 can be modified to look for pseudo-left-synchronous relations: when a transition labeled $x|y$ is found after a path of transitions labeled $x|x$, leave the following transitions unchanged.

3.6.2 The Case of Algebraic Relations

What about intersection of algebraic relations? The well known result about closure of algebraic languages under intersection with rational languages has no extension to algebraic relations. Still, it is easy to see that there is a property similar to left-synchronism which brings partial intersection results for algebraic relations.

Proposition 3.16 Let $R_1$ be an algebraic relation realized by a push-down transducer whose underlying rational transducer is left-synchronous, and let $R_2$ be a left-synchronous relation. Then $R_1 \cap R_2$ is an algebraic relation, and one may compute a push-down transducer realizing the intersection whose underlying rational transducer is left-synchronous.

Proof: Let $T_1$ be a push-down transducer realizing $R_1$ whose underlying rational transducer $T'_1$ is left-synchronous, and let $T_2$ be a left-synchronous realization of $R_2$. The proof comes from the fact that intersecting $T'_1$ and $T_2$ can be done without "forgetting" the original stack operation associated with each transition in $T_1$. This is due to the cross-product nature of the intersection algorithm for finite-state automata (which also applies to left-synchronous transducers).
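The filtering step used in the proofs of Propositions 3.14 and 3.15 can be sketched as below. This only covers the transitions leaving $Q_I$ for $Q_T$; the left-synchronous part, which the text handles through Theorem 3.14, is deliberately left out. The edge encoding `(q, x, y, q')` and the function names are assumptions made for the illustration.

```python
def filter_first_difference(edges, QI, QT, keep):
    """Keep a transition from QI to QT (labeled x|y, x != y) only if keep(x, y)."""
    result = []
    for (q, x, y, q2) in edges:
        if q in QI and q2 in QT and not keep(x, y):
            continue  # first differing letters violate the relation: drop the edge
        result.append((q, x, y, q2))
    return result

def with_lexicographic_order(edges, QI, QT):
    # Proposition 3.14: keep x|y when x < y, remove it otherwise.
    return filter_first_difference(edges, QI, QT, lambda x, y: x < y)

def with_identity(edges, QI, QT):
    # Proposition 3.15: transitions x|y with x != y are always removed.
    return filter_first_difference(edges, QI, QT, lambda x, y: False)

edges = [
    (0, "a", "a", 0),  # inside QI: labeled x|x
    (0, "a", "b", 1),  # QI -> QT with a < b
    (0, "b", "a", 1),  # QI -> QT with b > a
    (1, "b", "a", 1),  # inside QT: unconstrained
]
lex = with_lexicographic_order(edges, QI={0}, QT={1})
ident = with_identity(edges, QI={0}, QT={1})
```

Transitions inside $Q_T$ are never inspected: once the first differing pair of letters has been read, the comparison between the two words is already decided.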
Of course, the pseudo-left-synchronism property can be used instead of the left-synchronous one, yielding the following result:

Proposition 3.17 Let $A$ be an alphabet and let $R$ be an algebraic relation over $A^* \times A^*$ realized by a push-down transducer whose underlying rational transducer is pseudo-left-synchronous. Then intersecting $R$ with the lexicographic order (resp. identity relation) yields an algebraic relation, and one may compute a push-down transducer realizing the intersection whose underlying rational transducer is pseudo-left-synchronous (resp. left-synchronous).

3.7 Approximating Relations on Words

This section is a transition between the long study of mathematical tools presented in this chapter and applications of these tools to our analysis and transformation framework. Remember we have seen in Section 2.4 that exact results were not required for data-flow information, and that our program transformations were based on conservative approximations of sets and relations. Studying approximations is rather unusual when dealing with words and relations between words, but we will show its practical interest in the next chapters.

Of course, such conservative approximations must be as precise as possible, and exact results should be looked for every time it is possible. Indeed, approximations are needed only when a question or an operation on rational or algebraic relations is not decidable. Our general approximation scheme for rational and algebraic relations is thus to find a conservative approximation in a smaller class which supports the required operation or for which the required question is decidable.

3.7.1 Approximation of Rational Relations by Recognizable Relations

Sometimes a recognizable approximation of a rational relation may be needed. If $R$ is a rational relation realized by a rational transducer $T = (Q, I, F, E)$, the simplest way to build a recognizable relation $K$ which is larger than $R$ is to define $K$ as the product of input and output languages of $R$.

A smarter approximation is to consider each pair $(q_i, q_f)$ of initial and final states in $T$, and to define $K_{q_i,q_f}$ as the product of input and output languages of the relation realized by $(Q, \{q_i\}, \{q_f\}, E)$. Then $K$ is defined as the union of all $K_{q_i,q_f}$ for all $(q_i, q_f) \in I \times F$. This builds a recognizable relation thanks to Mezei's Theorem 3.3.

The next level of precision is achieved in considering each strongly-connected component in $T$ and approximating it with the preceding technique. The resulting relation $K$ is still recognizable, thanks to Mezei's theorem. This technique will be considered in the following when looking for a recognizable approximation of a rational relation.

3.7.2 Approximation of Rational Relations by Left-Synchronous Relations

Because recognizable approximations are not precise enough in general, and because the class of left-synchronous relations retains most interesting properties of recognizable relations, we will rather approximate rational relations by left-synchronous ones.
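The difference in precision between the first two recognizable approximations of Section 3.7.1 can be seen on a toy example. The sketch below works on finite relations given as explicit sets of pairs rather than on transducers, which is a simplifying assumption; the names `product_approximation` and `pairwise_approximation` and the sample data are hypothetical.

```python
def product_approximation(R):
    """K = (input language of R) x (output language of R), a superset of R."""
    inputs = {u for (u, _) in R}
    outputs = {v for (_, v) in R}
    return {(u, v) for u in inputs for v in outputs}

def pairwise_approximation(pieces):
    """Union of the products K_{qi,qf}, one per pair of initial and final states.

    `pieces` maps a pair (qi, qf) to the relation realized between them."""
    K = set()
    for R in pieces.values():
        K |= product_approximation(R)
    return K

# Hypothetical relation split according to two (initial, final) state pairs.
pieces = {
    ("i1", "f1"): {("a", "x"), ("aa", "xx")},
    ("i2", "f2"): {("b", "y")},
}
R = set().union(*pieces.values())
coarse = product_approximation(R)      # single product
finer = pairwise_approximation(pieces) # one product per state pair
```

Both results are conservative (they contain $R$), but the per-pair union no longer relates the inputs of one piece to the outputs of the other.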
The key algorithm in this context is based on the constructive proof of Theorem 3.19 presented in Section 3.4.5. In practical cases, it often returns a left-synchronous transducer and no approximation is necessary. When it fails, it means that some strongly-connected component could not be resynchronized. The idea is then to approximate this strongly-connected component by a recognizable relation, and then to restart the resynchronization algorithm.

For better efficiency, all strongly-connected components whose transmission rate is not $0$, $1$ or $+\infty$ should be approximated this way in a first stage. In the same stage, if a strongly-connected component $C$ whose transmission rate is $1$ follows some strongly-connected components $C_1, \ldots, C_n$ whose transmission rates are $0$ or $+\infty$, then a recognizable approximation $K_C$ of $C$ should be added to the transducer with the same outgoing transitions as $C$, and all paths from $C_1, \ldots, C_n$ to $C$ should now lead to $K_C$. Applying such a first stage guarantees that the resynchronization algorithm will return a left-synchronous approximation of $R$, thanks to Theorem 3.19.

Eventually, when trying to intersect a rational transducer with the lexicographic order, we are looking for a pseudo-left-synchronous approximation. The same technique as before can then be applied, using the extended version of Theorem 3.19 proposed in Section 3.6.

3.7.3 Approximation of Algebraic and Multi-Counter Relations

There are two very different techniques when approximating algebraic relations. The simplest one is used to give conservative results to a few undecidable questions for algebraic transducers that are decidable for rational ones. It consists in taking the underlying rational transducer as a conservative approximation. Precision can be slightly improved when the stack size is bounded: the finite number of possible stack words can be encoded in state names. This may induce a large increase of the number of states. The second technique is used when looking for an intersection with a left-synchronous relation: it consists in approximating the underlying rational transducer with a left-synchronous (or pseudo-left-synchronous) one without modifying the stack operations. In fact, stack operations can be preserved in the resynchronization algorithm (associated with Theorem 3.19), but they are obviously lost when approximating a strongly-connected component with a recognizable relation. Which technique is applied will be stated every time an approximation of an algebraic relation is required.

Eventually, we have seen that composing two rational transductions over $\mathbb{Z}^n$ yields an $n$-counter transduction by Theorem 3.27. Approximation by a one-counter transduction then consists in saving the value of bounded counters into new state names, then removing all unbounded counters but one. Smart choices of the remaining counter and attempts to combine two counters into one have not been studied yet, and are left for future work.
Chapter 4

Instancewise Analysis for Recursive Programs

Even though dependence information is at the core of virtually all modern optimizing compilers, recursive programs have not received much attention. When considering instancewise dependence analysis for recursive data structures, fewer than three papers have been published. Even worse is the state of the art in reaching definition analysis: before our recent results for arrays [CC98], no instancewise reaching definition analysis for recursive programs has been proposed.

Considering the program model proposed in Chapter 2, we now focus on dependence and reaching definition analysis at the run-time instance level. The following presentation is built on our previous work on the subject [CCG96, Coh97, Coh99a, Fea98, CC98], but has been going through several major evolutions. It results in a much more general and mathematically sound framework, with algorithms for automation of the whole analysis process, but also in a more complex presentation. The primary goal of this work is rather theoretical: we look for the highest precision possible. Beyond this important target, we will show in a later chapter (see Section 5.5) how this precise information can be used to outperform current results in parallelization of recursive programs, and also to enable new program transformation techniques.

We start our presentation with a few motivating examples, then discuss induction variable and storage mapping function computation in Section 4.2; the general analysis technique is presented in Section 4.3, with questions specific to particular data structures deferred to the next sections. Eventually, Section 4.7 compares our results with static analyses and with recent works on instancewise analysis for loop nests.

4.1 Motivating Examples

Studying three examples, we present an intuitive flavor of the instancewise dependence and reaching definition analyses for recursive control and data structures.

4.1.1 First Example: Procedure Queens

Our first example is still the procedure Queens, presented in Section 2.3. It is reproduced here in Figure 4.1.a with a partial control tree.

          int A[n];
    P     void Queens (int n, int k) {
    I       if (k < n) {
    A=A=a     for (int i=0; i<n; i++) {
    B=B=b       for (int j=0; j<k; j++)
    r             ... = A[j];
    J           if (...) {
    s             A[k] = ...;
    Q             Queens (n, k+1);
                }
              }
            }
          }
          int main () {
    F       Queens (n, 0);
          }

    Figure 4.1.a. Procedure Queens

    [Figure 4.1.b. Compressed control tree: instances FPIAAJs, FPIAAaAJs and FPIAAaAaAJs (squares) write A[0]; instance FPIAAaAaAJQPIAABBr (star) reads A[0].]

    Figure 4.1. Procedure Queens and control tree

Studying accesses to array A, our purpose is to find dependences between run-time instances of program statements. Let us study instance FPIAAaAaAJQPIAABBr of statement r, depicted as a star in Figure 4.1.b. In order to find some dependences, we would like to know which memory location is accessed. Since j is initialized to 0 in statement B, and incremented by 1 in statement b, we know that the value of variable j at FPIAAaAaAJQPIAABBr is 0, so FPIAAaAaAJQPIAABBr reads A[0].

We now consider instances of s, depicted as squares: since statement s writes into A[k], we are interested in the value of variable k: it is initialized to 0 in main (by the first call Queens(n, 0)), and incremented at each recursive call to procedure Queens in statement Q. Thus, instances such as FPIAAJs, FPIAAaAJs or FPIAAaAaAJs write into A[0], and are therefore in dependence with FPIAAaAaAJQPIAABBr.

Let us now derive which of these definitions reaches FPIAAaAaAJQPIAABBr. Looking again at Figure 4.1.b, we notice that instance FPIAAaAaAJs (denoted by a black square) is, among the three possible reaching definitions that are shown, the last to execute. And it does execute: since we assume that FPIAAaAaAJQPIAABBr executes, then FPIAAaAaAJ (hence FPIAAaAaAJs) has to execute. Therefore, other instances writing in the same array element, such as FPIAAJs and FPIAAaAJs, cannot reach the read instance, since their value is always overwritten by FPIAAaAaAJs.[1] Noticing that no other instance of s could execute after FPIAAaAaAJs, we can ensure that FPIAAaAaAJs is the reaching definition of FPIAAaAaAJQPIAABBr. We will show later how this simple approach to computing reaching definitions can be generalized.

[1] FPIAAaAaAJs is then called an ancestor of FPIAAaAaAJQPIAABBr, to be formally defined later.
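The informal reasoning on variables j and k in the Queens example amounts to scanning a control word from left to right. The sketch below does exactly that; it is an illustration only, with a hypothetical function name, `None` standing for an undefined value, and the entry of loop B distinguished from its body by the preceding letter (an entry B directly follows an A).

```python
def queens_values(word):
    """Return (j, k) right after the last statement of a Queens control word."""
    j = None      # undefined outside the inner loop
    k = None
    arg = None    # second actual argument of the pending call to Queens
    prev = None
    for s in word:
        if s == "F":                   # main call Queens(n, 0)
            arg, j = 0, None
        elif s == "Q":                 # recursive call Queens(n, k+1)
            arg, j = k + 1, None
        elif s == "P":                 # procedure entry: k receives the argument
            k, j = arg, None
        elif s == "B" and prev == "A": # entry of the inner loop: j = 0
            j = 0
        elif s == "b":                 # iteration of the inner loop: j++
            j += 1
        elif s in "IAaJs":             # j is not visible in these statements
            j = None
        # body B (after B or b) and statement r leave j unchanged
        prev = s
    return j, k

read = queens_values("FPIAAaAaAJQPIAABBr")
writes = [queens_values(w)[1] for w in ("FPIAAJs", "FPIAAaAJs", "FPIAAaAaAJs")]
```

On the instances discussed above, the read instance of r gets j = 0 (it accesses A[0]) and the three instances of s get k = 0 (they all write A[0]), which matches the dependences found by hand.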
4.1.2 Second Example: Procedure BST

Let us now look at procedure BST, as shown in Figure 4.2. This procedure swaps node values to convert a binary tree into a binary search tree (BST). Nodes of the tree structure are referenced by pointers; p->l (resp. p->r) denotes the pointer to the left (resp. right) child of the node pointed by p; p->value denotes the integer value of the node.

    P   void BST (tree *p) {
    I1    if (p->l != NULL) {
    L       BST (p->l);
    I2      if (p->value < p->l->value) {
    a         t = p->value;
    b         p->value = p->l->value;
    c         p->l->value = t;
            }
          }
    J1    if (p->r != NULL) {
    R       BST (p->r);
    J2      if (p->value > p->r->value) {
    d         t = p->value;
    e         p->value = p->r->value;
    f         p->r->value = t;
            }
          }
        }
        int main () {
    F     if (root != NULL) BST (root);
        }

    Figure 4.2. Procedure BST and compressed control automaton

There are few dependences on program BST. If u is an instance of block I2, then there are anti-dependences between the first read access in u and instance ub, between the second read access in u and uc, between the read access in instance ua and instance ub, and between the read access in ub and instance uc. It is the same for an instance v of block J2: there is an anti-dependence between the first read access in v and ve, between the second read access in v and vf, between the read access in vd and ve, and between the read access in ve and vf. No other dependences are found. We will show in the following how to compute this result automatically. Eventually, a reaching definition analysis tells that $\bot$ is the unique reaching definition of each read access.

4.1.3 Third Example: Function Count

Our last motivating example is function Count, as shown in Figure 4.3. It operates on the inode structure presented in Section 2.3.3. This function computes the size of a file in blocks, in counting terminal inodes.

    P     int Count (inode *p) {
    I       if (p->terminal)
    a         return p->length;
    E       else {
    b         c = 0;
    L=L=l     for (int i=0; i<p->length; i++)
    c           c += Count (p->n[i]);
    d         return c;
            }
          }
          int main () {
    F       Count (file);
          }

    Figure 4.3. Procedure Count and compressed control automaton

Since there is no write access to the inode structure, there are no dependences on the Count program (not considering the other data structures, such as scalar c). However, an interesting result for cache optimization techniques [TD95] would be that each memory location is read only once. We will show that this information can be computed automatically by our analysis techniques.

4.1.4 What Next?

In the rest of this chapter, we formalize the concepts introduced above. In Section 4.2, we compute maps from instance names to data-element names. Then, the dependence and reaching definition relations are computed in Section 4.3.

4.2 Mapping Instances to Memory Locations

In Section 2.4, we defined storage mappings from accesses (i.e. pairs of a run-time instance and a reference in the statement) to memory locations. To abstract the effect of every statement instance, we need to make these functions explicit. This is done through the use of induction variables.

After a few definitions and additional restrictions of the program model, we show that induction variables are described by systems of recurrence equations, we prove a fundamental resolution theorem for such systems, and finally we apply this theorem in an algorithm to compute storage mappings.

To simplify the notations of variables and values, we write "`v`" for the name of an existing program variable, and "$v$" is an abbreviation for "the value of variable `v`".

4.2.1 Induction Variables

We now extend the classical concept of induction variable (strongly connected with nested loops) to recursive programs. To simplify the exposition, we suppose that every integer or pointer variable that is local to a procedure or global to the program has a unique distinctive name. This allows quick and non-misleading wordings such as "variable i", and has no effect on the generality of the approach. Compared to classical works with nests of loops [Wol92], we have a rather original definition of induction variables:

- integer arguments of a function that are initialized to a constant or to an integer induction variable plus constant (e.g. incremented or decremented by a constant), at each procedure call;
- integer loop counters that are incremented (or decremented) by a constant at each loop iteration;
- pointer arguments that are initialized to a constant or to a possibly dereferenced pointer induction variable, at each procedure call;
- pointer loop variables that are dereferenced at each loop iteration.

For example, suppose i, j and k are integer variables, p and q are pointer variables to a list structure with a member next of type list*, and Compute is some procedure with two arguments. In the code in Figure 4.4, reference 2*i+j appears in a non-recursive function call, hence i, j, p and q are considered induction variables. On the opposite, k is not an induction variable because it retains its last value at the entry of the inner loop.

    void Compute (int i, list *p) {
      int j, k;
      list *q;
      for (q=p, k=0; q!=NULL; q=q->next)
        for (j=0; j<100; j+=2, k++)
          // recursive call
          Compute (j+1, q);
      printf ("%d", 2*i+j);
    }

    Figure 4.4. First example of induction variables

As a kind of syntactic sugar to increase the versatility of induction variables, some cases of direct assignments to induction variables are allowed (i.e. induction variable updates outside of loop iterations and procedure calls). Regarding initialization and increment/decrement/dereference, the rules are the same as for a procedure call, but there are two additional restrictions. These restrictions are those of the code motion [KRS94, Gup98] and symbolic execution techniques [Muc97] used to move each direct assignment to some loop/procedure block surrounding it. After such a transformation, direct assignments can be interpreted as "executed at the entry of that block", the name of the statement being replaced by the actual name of the block.

Of course, symbolic execution techniques cannot convert all cases of direct assignments into legal induction variable updates, as shown by the following examples. Considering the program in Figure 4.5.a, i is an induction variable because the while loop can be converted into a for loop on i, but j is not an induction variable since it is not initialized at the entry of the inner for loop. Considering the other program in Figure 4.5.b, variable i is not an induction variable because s is guarded by a conditional.
        int i=0, j=0, k, A[200];
        while (i<10) {
          for (k=0; k<10; k++) {
            j = j + 2;
    r       A[i] = A[i] + A[j];
          }
    s     i = i + 1;
        }

    Figure 4.5.a. Second example

        int i, A[10,10];
        for (i=0, j=0; i<10; i++) {
          if (...)
    s       i = i + 2;
    r     A[i,j] = ...;
        }

    Figure 4.5.b. Third example

    Figure 4.5. More examples of induction variables

Additional restrictions to the program model. In comparison with the general program model presented in Section 2.2, our analysis requires a few additional hypotheses:

- every data structure subject to dependence or reaching definition analysis must be declared global (notice that local variables can be made global using explicit memory allocations and stacks);
- every array subscript must be an affine function of integer induction variables (not any integer variable) and symbolic constants;
- every tree access must dereference a pointer induction variable (not any pointer variable) or a constant.

4.2.2 Building Recurrence Equations on Induction Variables

Describing conflicts between memory accesses is at the core of dependence analysis. We must be able to associate memory locations to memory references in statement instances (i.e. A[i], *p, etc.) by means of storage mappings. This analysis is done independently on each data structure. For each induction variable, we thus need a function mapping a control word to the associated value of the induction variable. In addition, the next definition introduces a notation for the relation between control words and induction variable values.

Definition 4.1 (value of induction variables) Let $\sigma$ be a program statement or block, and $w$ be an instance of $\sigma$. The value of variable `i` at instance $w$ is defined as the value of `i` immediately after executing (resp. entering) instance $w$ of statement (resp. block) $\sigma$. This value is denoted by $[\mathtt{i}](w)$.

For a program statement $\sigma$ and an induction variable `i`, we call $[\mathtt{i};\sigma]$ the set of all pairs $(u\sigma, i)$ such that $[\mathtt{i}](u\sigma) = i$, for all instances $u\sigma$ of $\sigma$.

We consider pairs of elements in monoids, and to be consistent with the usual notation for rational sets and relations, a pair $(x, y)$ will be denoted by $(x|y)$.

In general, the value of a variable at a given control word depends on the execution. Indeed, an execution trace keeps all the information about variable updates, but a control word does not. However, due to our program model restrictions, induction variables are completely defined by control words:

Lemma 4.1 Let `i` be an induction variable and $u$ a statement instance. If the value $[\mathtt{i}](u)$ depends on the effect of an instance $v$ (i.e. the value depends on whether $v$ executes or not) then $v$ is a prefix of $u$.

Proof: Simply observe that only loop entries, loop iterations and procedure calls may modify an induction variable, and that loop entries are associated with initializations which "kill" the effect of all preceding iterations (associated with non-prefix control words).

For two program executions $e, e' \in E$, the consequence of Lemma 4.1 is that storage mappings $f_e$ and $f_{e'}$ coincide on $A_e \cap A_{e'}$. This strong property allows us to extend the computation of a storage mapping $f_e$ to the whole set $A$ of possible accesses. With this extension, all storage mappings for different executions of a program coincide. We will thus consider in the following a storage mapping $f$ independent of the execution.

The following result states that induction variables are described by recurrence equations:

Lemma 4.2 Let $(M_{data}, \cdot)$ be the monoid abstraction of the considered data structure. Consider a statement $\sigma$ and an induction variable `i`. The effect of statement $\sigma$ on the value of `i` is captured by one of the following equations:

$$\text{either } \exists \alpha \in M_{data}, \mathtt{j} \in induc : \forall u\sigma \in L_{ctrl} : [\mathtt{i}](u\sigma) = [\mathtt{j}](u) \cdot \alpha \qquad (4.1)$$

$$\text{or } \exists \alpha \in M_{data} : \forall u\sigma \in L_{ctrl} : [\mathtt{i}](u\sigma) = \alpha \qquad (4.2)$$

where $induc$ is the set of all induction variables in the program, including `i`.

Proof: Consider an edge $\sigma$ in the control automaton. Due to our syntactical restrictions, edge $\sigma$ corresponds to a statement in the program text that can modify `i` in only two ways:

- either there exists an induction variable `j` whose value is $j \in M_{data}$ just before executing instance $u\sigma$ of statement $\sigma$ and a constant $\alpha \in M_{data}$ such that the value of `i` after executing instance $u\sigma$ is $j \cdot \alpha$ (translation from a possibly identical variable);
- or there exists a constant $\alpha \in M_{data}$ such that the value of `i` after executing instance $u\sigma$ is $\alpha$ (initialization).
Notice that, when accessing arrays, we allow general affine subscripts and not only induction variables. Therefore we also build equations on affine functions a(i,j,...) of the induction variables. For example, if a(i,j,k) = 2*i+j-k then we have to build equations on $[\mathtt{2i+j-k}](u)$ knowing that $[\mathtt{2i+j-k}](u) = 2[\mathtt{i}](u) + [\mathtt{j}](u) - [\mathtt{k}](u)$.[2]

[2] We have indeed to generate new equations, since computing $[\mathtt{2i+j-k}](u)$ from $[\mathtt{i}](u)$, $[\mathtt{j}](u)$ and $[\mathtt{k}](u)$ is not possible in general: variables i, j and k may have different scopes.
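To build systems of recurrence equations automatically, we need two additional notations:

- Undefined is a polymorphic value for induction variables; $[\mathtt{i}](w) = \mathtt{Undefined}$ means that variable `i` has an undefined value at instance $w$; it may also be the case that `i` is not visible at instance $w$;
- Arg(proc, num) stands for the num-th actual argument of procedure proc.

Algorithm Recurrence-Build applies Lemma 4.2 in turn for each statement in the program.

    Recurrence-Build (program)
      program: an intermediate representation of the program
      returns a list of recurrence equations
      sys <- {}
      for each statement σ in program
        do for each induction variable i in σ
          do switch σ
            case σ = for (i=init; ...; ...):        // loop entry
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [i](uσ) = init }
            case σ = for (...; ...; i=i+inc):       // loop iteration
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [i](uσ) = [i](u) · inc }
            case σ = for (...; ...; i=i->inc):      // loop iteration
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [i](uσ) = [i](u) · inc }
            case σ = proc (..., var, ...):          // m-th argument
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [Arg(proc,m)](uσ) = [var](u) }
            case σ = proc (..., var+cst, ...):      // m-th argument
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [Arg(proc,m)](uσ) = [var](u) · cst }
            case σ = proc (..., var->cst, ...):     // m-th argument
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [Arg(proc,m)](uσ) = [var](u) · cst }
            case σ = proc (..., cst, ...):          // m-th argument
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [Arg(proc,m)](uσ) = cst }
            case default:
              sys <- sys ∪ { ∀uσ ∈ L_ctrl : [i](uσ) = [i](u) }
      for each procedure p declared proc (type1 arg1, ..., typen argn) in program
        do for m <- 1 to n
          do sys <- sys ∪ { ∀up ∈ L_ctrl : [argm](up) = [Arg(proc,m)](u) }
      return sys

Now, suppose that there exist a statement $\sigma$, two induction variables `i` and `j`, and a constant $\alpha \in M_{data}$ such that $[\mathtt{i}](u\sigma) = [\mathtt{j}](u) \cdot \alpha$ is an equation generated by Lemma 4.2. Transposed to $[\mathtt{i};\sigma]$ (the set of all pairs $(u\sigma \,|\, [\mathtt{i}](u\sigma))$) it says that

$$(u|j) \in [\mathtt{j};\sigma'] \implies (u\sigma \,|\, j \cdot \alpha) \in [\mathtt{i};\sigma]$$

for all statements $\sigma'$ that may precede $\sigma$ in a valid control word $u\sigma$. Second, suppose that there exist a statement $\sigma$, an induction variable `i`, and a constant $\alpha \in M_{data}$ such that $[\mathtt{i}](u\sigma) = \alpha$ is an equation generated by Lemma 4.2. Transposed to $[\mathtt{i};\sigma]$, it says that

$$(u|i) \in [\mathtt{i};\sigma'] \implies (u\sigma \,|\, \alpha) \in [\mathtt{i};\sigma]$$

for all statements $\sigma'$ that may precede $\sigma$ in a valid control word $u\sigma$. These two observations allow us to build a new system involving equations on sets $[\mathtt{i};\sigma]$ from the result of Recurrence-Build. The algorithm to achieve this is called Recurrence-Rewrite: the two conditionals in Recurrence-Rewrite are associated with $u = \varepsilon$, i.e. with recurrence equations of the form $[\mathtt{i}](\sigma) = [\mathtt{j}](\varepsilon) \cdot \alpha$ ($[\mathtt{j}](\varepsilon)$ is an undefined value) or $[\mathtt{i}](\sigma) = \alpha$, and the two loops on $\sigma'$ consider predecessors of $\sigma$.

    Recurrence-Rewrite (program, system)
      program: an intermediate representation of the program
      system: a system of recurrence equations produced by Recurrence-Build
      returns a rewritten system of recurrence equations
      L_ctrl <- language of control words of program
      new <- {}
      for each equation ∀uσ ∈ L_ctrl : [i](uσ) = [j](u) · α in system
        do if σ ∈ L_ctrl
             then new <- new ∪ { (σ | j · α) ∈ [i;σ] }
           for each σ' such that (Σ*_ctrl σ'σ ∩ L_ctrl) ≠ {}
             do new <- new ∪ { ∀uσ ∈ L_ctrl : (u|j) ∈ [j;σ'] ⇒ (uσ | j · α) ∈ [i;σ] }
      for each equation ∀uσ ∈ L_ctrl : [i](uσ) = α in system
        do if σ ∈ L_ctrl
             then new <- new ∪ { (σ | α) ∈ [i;σ] }
           for each σ' such that (Σ*_ctrl σ'σ ∩ L_ctrl) ≠ {}
             do new <- new ∪ { ∀uσ ∈ L_ctrl : (u|i) ∈ [i;σ'] ⇒ (uσ | α) ∈ [i;σ] }
      return new

Algorithms Recurrence-Build and Recurrence-Rewrite are now applied to procedure Queens. There are three induction variables, i, j and k; but variable i is not useful for computing storage mapping functions. We get the following equations:

- From main call F: $[\mathtt{Arg(Queens,2)}](F) = 0$
- From procedure P: $\forall uP \in L_{ctrl} : [\mathtt{k}](uP) = [\mathtt{Arg(Queens,2)}](u)$
- From recursive call Q: $\forall uQ \in L_{ctrl} : [\mathtt{Arg(Queens,2)}](uQ) = [\mathtt{k}](u) + 1$
- From entry B of loop B=B=b: $\forall uB \in L_{ctrl} : [\mathtt{j}](uB) = 0$
- From iteration b of loop B=B=b: $\forall ub \in L_{ctrl} : [\mathtt{j}](ub) = [\mathtt{j}](u) + 1$

All other statements leave induction variables unchanged or undefined:

$$\begin{array}{l}
[\mathtt{j}](F) = \mathtt{Undefined} \\
\forall uP \in L_{ctrl} : [\mathtt{j}](uP) = \mathtt{Undefined} \\
\forall uI \in L_{ctrl} : [\mathtt{j}](uI) = \mathtt{Undefined} \\
\forall uA \in L_{ctrl} : [\mathtt{j}](uA) = \mathtt{Undefined} \\
\forall ua \in L_{ctrl} : [\mathtt{j}](ua) = \mathtt{Undefined} \\
\forall uB \in L_{ctrl} : [\mathtt{j}](uB) = [\mathtt{j}](u) \\
\forall ur \in L_{ctrl} : [\mathtt{j}](ur) = [\mathtt{j}](u) \\
\forall uJ \in L_{ctrl} : [\mathtt{j}](uJ) = [\mathtt{j}](u) \\
\forall uQ \in L_{ctrl} : [\mathtt{j}](uQ) = \mathtt{Undefined} \\
\forall us \in L_{ctrl} : [\mathtt{j}](us) = \mathtt{Undefined}
\end{array}$$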
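$\llbracket k\rrbracket(F) = \mathit{Undefined}$
$\forall uI \in L_{ctrl} : \llbracket k\rrbracket(uI) = \llbracket k\rrbracket(u)$
$\forall uA \in L_{ctrl} : \llbracket k\rrbracket(uA) = \llbracket k\rrbracket(u)$
$\forall ua \in L_{ctrl} : \llbracket k\rrbracket(ua) = \llbracket k\rrbracket(u)$
$\forall uB \in L_{ctrl} : \llbracket k\rrbracket(uB) = \llbracket k\rrbracket(u)$
$\forall ub \in L_{ctrl} : \llbracket k\rrbracket(ub) = \llbracket k\rrbracket(u)$
$\forall ur \in L_{ctrl} : \llbracket k\rrbracket(ur) = \llbracket k\rrbracket(u)$
$\forall uJ \in L_{ctrl} : \llbracket k\rrbracket(uJ) = \llbracket k\rrbracket(u)$
$\forall uQ \in L_{ctrl} : \llbracket k\rrbracket(uQ) = \llbracket k\rrbracket(u)$
$\forall us \in L_{ctrl} : \llbracket k\rrbracket(us) = \llbracket k\rrbracket(u)$

Now, recall that $\llbracket j, \alpha\rrbracket$ (resp. $\llbracket k, \alpha\rrbracket$) is the set of all pairs $(u\alpha \mid j)$ (resp. $(u\alpha \mid k)$) such that $\llbracket j\rrbracket(u\alpha) = j$ (resp. $\llbracket k\rrbracket(u\alpha) = k$), for all instances $u\alpha$ of a statement $\alpha$. From the equations above, Recurrence-Rewrite yields:

$(F \mid \mathit{Undefined}) \in \llbracket j, F\rrbracket$
$\forall uP \in L_{ctrl} : (u \mid j) \in \llbracket j, F\rrbracket \Rightarrow (uP \mid \mathit{Undefined}) \in \llbracket j, P\rrbracket$
$\forall uP \in L_{ctrl} : (u \mid j) \in \llbracket j, Q\rrbracket \Rightarrow (uP \mid \mathit{Undefined}) \in \llbracket j, P\rrbracket$
$\forall uI \in L_{ctrl} : (u \mid j) \in \llbracket j, P\rrbracket \Rightarrow (uI \mid \mathit{Undefined}) \in \llbracket j, I\rrbracket$
$\forall uA \in L_{ctrl} : (u \mid j) \in \llbracket j, I\rrbracket \Rightarrow (uA \mid \mathit{Undefined}) \in \llbracket j, A\rrbracket$
$\forall uA \in L_{ctrl} : (u \mid j) \in \llbracket j, A\rrbracket \Rightarrow (uA \mid \mathit{Undefined}) \in \llbracket j, A\rrbracket$
$\forall uA \in L_{ctrl} : (u \mid j) \in \llbracket j, a\rrbracket \Rightarrow (uA \mid \mathit{Undefined}) \in \llbracket j, A\rrbracket$
$\forall ua \in L_{ctrl} : (u \mid j) \in \llbracket j, A\rrbracket \Rightarrow (ua \mid \mathit{Undefined}) \in \llbracket j, a\rrbracket$
$\forall uB \in L_{ctrl} : (u \mid j) \in \llbracket j, A\rrbracket \Rightarrow (uB \mid 0) \in \llbracket j, B\rrbracket$
$\forall uB \in L_{ctrl} : (u \mid j) \in \llbracket j, B\rrbracket \Rightarrow (uB \mid j) \in \llbracket j, B\rrbracket$
$\forall uB \in L_{ctrl} : (u \mid j) \in \llbracket j, b\rrbracket \Rightarrow (uB \mid j) \in \llbracket j, B\rrbracket$
$\forall ub \in L_{ctrl} : (u \mid j) \in \llbracket j, B\rrbracket \Rightarrow (ub \mid j + 1) \in \llbracket j, b\rrbracket$
$\forall ur \in L_{ctrl} : (u \mid j) \in \llbracket j, B\rrbracket \Rightarrow (ur \mid j) \in \llbracket j, r\rrbracket$
$\forall uJ \in L_{ctrl} : (u \mid j) \in \llbracket j, A\rrbracket \Rightarrow (uJ \mid \mathit{Undefined}) \in \llbracket j, J\rrbracket$
$\forall uQ \in L_{ctrl} : (u \mid j) \in \llbracket j, J\rrbracket \Rightarrow (uQ \mid \mathit{Undefined}) \in \llbracket j, Q\rrbracket$
$\forall us \in L_{ctrl} : (u \mid j) \in \llbracket j, J\rrbracket \Rightarrow (us \mid \mathit{Undefined}) \in \llbracket j, s\rrbracket$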
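$(F \mid \mathit{Undefined}) \in \llbracket k, F\rrbracket$
$\forall uP \in L_{ctrl} : (u \mid x) \in \llbracket Arg(Queens, 2), F\rrbracket \Rightarrow (uP \mid x) \in \llbracket k, P\rrbracket$
$\forall uP \in L_{ctrl} : (u \mid x) \in \llbracket Arg(Queens, 2), Q\rrbracket \Rightarrow (uP \mid x) \in \llbracket k, P\rrbracket$
$\forall uI \in L_{ctrl} : (u \mid k) \in \llbracket k, P\rrbracket \Rightarrow (uI \mid k) \in \llbracket k, I\rrbracket$
$\forall uA \in L_{ctrl} : (u \mid k) \in \llbracket k, I\rrbracket \Rightarrow (uA \mid k) \in \llbracket k, A\rrbracket$
$\forall uA \in L_{ctrl} : (u \mid k) \in \llbracket k, A\rrbracket \Rightarrow (uA \mid k) \in \llbracket k, A\rrbracket$
$\forall uA \in L_{ctrl} : (u \mid k) \in \llbracket k, a\rrbracket \Rightarrow (uA \mid k) \in \llbracket k, A\rrbracket$
$\forall ua \in L_{ctrl} : (u \mid k) \in \llbracket k, A\rrbracket \Rightarrow (ua \mid k) \in \llbracket k, a\rrbracket$
$\forall uB \in L_{ctrl} : (u \mid k) \in \llbracket k, A\rrbracket \Rightarrow (uB \mid k) \in \llbracket k, B\rrbracket$
$\forall uB \in L_{ctrl} : (u \mid k) \in \llbracket k, B\rrbracket \Rightarrow (uB \mid k) \in \llbracket k, B\rrbracket$
$\forall uB \in L_{ctrl} : (u \mid k) \in \llbracket k, b\rrbracket \Rightarrow (uB \mid k) \in \llbracket k, B\rrbracket$
$\forall ub \in L_{ctrl} : (u \mid k) \in \llbracket k, B\rrbracket \Rightarrow (ub \mid k) \in \llbracket k, b\rrbracket$
$\forall ur \in L_{ctrl} : (u \mid k) \in \llbracket k, B\rrbracket \Rightarrow (ur \mid k) \in \llbracket k, r\rrbracket$
$\forall uJ \in L_{ctrl} : (u \mid k) \in \llbracket k, A\rrbracket \Rightarrow (uJ \mid k) \in \llbracket k, J\rrbracket$
$\forall uQ \in L_{ctrl} : (u \mid k) \in \llbracket k, J\rrbracket \Rightarrow (uQ \mid k) \in \llbracket k, Q\rrbracket$
$\forall us \in L_{ctrl} : (u \mid k) \in \llbracket k, J\rrbracket \Rightarrow (us \mid k) \in \llbracket k, s\rrbracket$
$(F \mid 0) \in \llbracket Arg(Queens, 2), F\rrbracket$
$\forall uQ \in L_{ctrl} : (u \mid k) \in \llbracket k, J\rrbracket \Rightarrow (uQ \mid k + 1) \in \llbracket Arg(Queens, 2), Q\rrbracket$

4.2.3 Solving Recurrence Equations on Induction Variables

The following result is at the core of our analysis technique, but it is not limited to this purpose. It will be applied in the next section to the system of equations returned by Recurrence-Rewrite.

Lemma 4.3 Consider two monoids L and M with respective binary operations $\cdot$ and $\star$. Let R be a subset of $L \times M$ defined by a system of equations of the form

(E1) $\forall l \in L, m_1 \in M : (l \mid m_1) \in R_1 \Longrightarrow (l \cdot \alpha_1 \mid m_1 \star \beta_1) \in R$
(E2) $\forall l \in L, m_2 \in M : (l \mid m_2) \in R_2 \Longrightarrow (l \cdot \alpha_2 \mid \beta_2) \in R$

where $R_1 \subseteq L \times M$ and $R_2 \subseteq L \times M$ are some set variables constrained in the system (possibly equal to R), $\alpha_1, \alpha_2$ are constants in L and $\beta_1, \beta_2$ are constants in M. Then, R is a rational set.

Proof: Our first task is to convert these expressions on unstructured elements of L and M into expressions in the monoid $L \times M$. Then our second task is to derive set expressions in $L \times M$, of the form $set \cdot constant \subseteq set$ or $constant \subseteq set$ (the induced operation is denoted by "$\cdot$"). Indeed, the right-hand side of (E1) can be written

$(l \mid m_1) \cdot (\alpha_1 \mid \beta_1) \in R.$

Thus, (E1) gives

$R_1 \cdot (\alpha_1 \mid \beta_1) \subseteq R.$

The right-hand side of (E2) can also be written

$(l \mid \varepsilon) \cdot (\alpha_2 \mid \beta_2) \in R,$

but $(l \mid \varepsilon)$ is neither a variable nor a constant of $L \times M$.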
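To overcome this difficulty, we call $R^\varepsilon$ the set of all pairs $(l \mid \varepsilon)$ such that $\exists m \in M : (l \mid m) \in R$. It is clear that $R^\varepsilon$ satisfies the same equations as R with all right pair members replaced by $\varepsilon$. Now, (E2) yields two equations:

$R_2^\varepsilon \cdot (\alpha_2 \mid \varepsilon) \subseteq R^\varepsilon$ and $R^\varepsilon \cdot (\varepsilon \mid \beta_2) \subseteq R.$

At last, if the only equations on R are (E1) and (E2), we have

$R^\varepsilon = R_1^\varepsilon \cdot (\alpha_1 \mid \varepsilon) + R_2^\varepsilon \cdot (\alpha_2 \mid \varepsilon)$
$R = R_1 \cdot (\alpha_1 \mid \beta_1) + R^\varepsilon \cdot (\varepsilon \mid \beta_2)$

More generally, applying this process to $R_1$, $R_2$ and to every subset of $L \times M$ described in the system, we get a new system of regular equations defining R. It is well known that such equations define a rational subset of $L \times M$.

Thanks to the classical list operations Insert, Delete and Member (systems are encoded as lists of equations), and to the string operation Concat (equations are encoded as strings), algorithm Recurrence-Solve gives an automatic way to solve systems of equations of the form (E1) or (E2).

Recurrence-Solve(system)
    system: a list of recurrence equations of the form (E1) and (E2)
    returns a list of regular expressions
 1  $sets \leftarrow \emptyset$
 2  for each implication "$(l \mid m) \in A \Rightarrow (l \cdot \alpha \mid m \star \beta) \in B$" in system
 3    do Insert(sets, $\{A \cdot (\alpha \mid \beta) \subseteq B\}$)
 4       Insert(sets, $\{A^\varepsilon \cdot (\alpha \mid \varepsilon) \subseteq B^\varepsilon\}$)
 5  for each implication "$(l \mid m) \in A \Rightarrow (l \cdot \alpha \mid \beta) \in B$" in system
 6    do Insert(sets, $\{B^\varepsilon \cdot (\varepsilon \mid \beta) \subseteq B\}$)
 7       Insert(sets, $\{A^\varepsilon \cdot (\alpha \mid \varepsilon) \subseteq B^\varepsilon\}$)
 8  $variables \leftarrow \emptyset$
 9  for each inclusion "$A \cdot (x \mid y) \subseteq B$" in sets
10    do if Member(variables, B)
11         then $equation \leftarrow$ Delete(variables, B)
12              Insert(variables, Concat(equation, "$+ A \cdot (x \mid y)$"))
13         else Insert(variables, "$B = A \cdot (x \mid y)$")
14  $variables \leftarrow$ Compute-Regular-Expressions(variables)
15  return variables

Algorithm Compute-Regular-Expressions solves a system of regular equations between rational sets, then returns a list of regular expressions defining these sets. The system is seen as a regular grammar and resolution is done through variable substitution (when the variable in the left-hand side does not appear in the right-hand side) or Kleene star insertion (when it does). Well-known heuristics are used to reduce the size of the result; see [HU79] for details.

4.2.4 Computing Storage Mappings

The main result of this section follows: we can solve the recurrence equations of Lemma 4.2 to compute the value of induction variables at control words.

Theorem 4.1 The storage mapping f that maps every possible access in A to the memory location it accesses is a rational function from $\Sigma_{ctrl}^*$ to $M_{data}$.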
Proof: Since array subscripts are affine functions of integer induction variables, and since tree accesses are given by dereferenced induction pointers, one may generate a system of equations according to Lemma 4.2 (or Recurrence-Build) for any read or write access.

The result is a system of equations on induction variables. Thanks to Recurrence-Rewrite, this system is rewritten in terms of equations on sets of pairs $(u\alpha \mid \llbracket i\rrbracket(u\alpha))$, where $u\alpha$ is a control word and i is an induction variable, describing the value of i for any instance of statement $\alpha$. We thus get a new system which inductively describes the subset $\llbracket i, \alpha\rrbracket$ of $\Sigma_{ctrl}^* \times M_{data}$. Because this system satisfies the hypotheses of Lemma 4.3, we have proven that $\llbracket i, \alpha\rrbracket$ is a rational set of $\Sigma_{ctrl}^* \times M_{data}$. Now, for a given memory reference in $\alpha$, we know that the pairs $(w \mid f(w))$, where w is an instance of $\alpha$, build a rational set. Hence f is a rational transduction from $\Sigma_{ctrl}^*$ to $M_{data}$. Because f is also a partial function, it is a rational function from $\Sigma_{ctrl}^*$ to $M_{data}$.

The proof is constructive, thanks to Recurrence-Build and Recurrence-Solve, and Compute-Storage-Mappings is the algorithm to automatically compute storage mappings for a recursive program satisfying the hypotheses of Section 4.2.1. The result is a list of rational transducers (converted by Compute-Rational-Transducer from regular expressions) realizing the rational storage mappings for each reference in the right-hand side.

Compute-Storage-Mappings(program)
    program: an intermediate representation of the program
    returns a list of rational transducers realizing storage mappings
 1  $system \leftarrow$ Recurrence-Build(program)
 2  $new \leftarrow$ Recurrence-Rewrite(program, system)
 3  $list \leftarrow$ Recurrence-Solve(new)
 4  $newlist \leftarrow \emptyset$
 5  for each regular expression reg in list
 6    do $newlist \leftarrow newlist \cup$ Compute-Rational-Transducer(reg)
 7  return newlist

Let us now apply Compute-Storage-Mappings to program Queens. Starting from the result of Recurrence-Rewrite, we apply Recurrence-Solve. Just before calling Compute-Regular-Expressions, we get the following system of regular equations:
136 CHAPTER4.INSTANCEWISEANALYSISFORRECURSIVEPROGRAMS 8><>: [j;f]=(fjundefined) [j;p]=[j;f](pjundefined)+[j;q](pjundefined) [j;i]=[j;p](ijundefined) [j;a]=[j;i](ajundefined) [j;a]=[j;a](ajundefined)+[j;a](ajundefined) [j;a]=[j;a](ajundefined) [j;b]=[j;b]"("j0) [j;b]=[j;b](bj0)+[j;b](bj0) [j;b]=[j;b](bj1) [j;r]=[j;b](rj0) [j;j]=[j;a](jjundefined) [j;q]=[j;j](qjundefined) [j;s]=[j;j](sjundefined) [j;f]"=(fj") [j;p]"=[j;f]"(pj0)+[j;q]"(pj0) [j;i]"=[j;p]"(ij0) [j;a]"=[j;i]"(aj0) [j;a]"=[j;a]"(aj0)+[j;a]"(aj0) [j;a]"=[j;a]"(aj0) [j;b]"=[j;a]"(bj0) [j;b]"=[j;b]"(bj0)+[j;b]"(bj0) [j;b]"=[j;b]"(bj0) [j;j]"=[j;a]"(jj0) [j;q]"=[j;j]"(qj0) 8><>: [k;f]=(fjundefined) [k;p]=[arg(queens;2);f](pj0)+[arg(queens;2);q](pj0) [k;i]=[k;p](ij0) [k;a]=[k;i](aj0) [k;a]=[k;a](aj0)+[k;a](aj0) [k;a]=[k;a](aj0) [k;b]=[k;a](bj0) [k;b]=[k;b](bj0)+[k;b](bj0) [k;b]=[k;b](bj0) [k;r]=[k;b](rj0) [k;j]=[k;a](jj0) [k;q]=[k;j](qj0) [k;s]=[k;j](sj0) [Arg(Queens;2);F]=(Fj0) [Arg(Queens;2);Q]=[k;J](Qj1) Thesesystems seenasregulargrammars canbesolvedwithcompute-regular- Expressions,yieldingregularexpressions.Theseexpressionsdescriberationalfunctions fromctrltoz,butweareonlyinterestedin[j;r]and[k;s](accessestoarraya): [j;r]=(fpiaaj0) (JQPIAAj0)+(aAj0)(BBj0)(bBj1)(rj0) (4.3) [k;s]=(fpiaaj0) (JQPIAAj1)+(aAj0)(Jsj0) (4.4)
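The two resolution steps attributed to Compute-Regular-Expressions (variable substitution, and Kleene star insertion when a variable occurs in its own right-hand side) can be illustrated on a small left-linear system. The following is only a minimal sketch under stated assumptions: the function name `solve_left_linear`, the encoding of equations as lists of `(previous_variable, constant)` terms, and the restriction to the input side of the pairs (control words only, single-character labels, no output component) are hypothetical choices made for illustration, not the thesis's implementation.

```python
import re


def solve_left_linear(system, order):
    """Solve left-linear regular equations by elimination.

    `system` maps each variable X to a list of terms (Y, c), meaning
    "Y followed by the constant word c"; Y is None for a constant-only term.
    `order` lists every variable, in the order they are eliminated.
    Returns a dict mapping each variable to a Python regular expression.
    """
    eqs = {x: [(v, re.escape(c)) for v, c in terms]
           for x, terms in system.items()}
    for x in order:
        loops = [r for v, r in eqs[x] if v == x]
        rest = [(v, r) for v, r in eqs[x] if v != x]
        # Kleene star insertion: X = A + X.B has least solution X = A.B*
        star = "(?:" + "|".join(loops) + ")*" if loops else ""
        eqs[x] = [(v, r + star) for v, r in rest]
        # Variable substitution: replace X in every other equation.
        for y in eqs:
            if y == x:
                continue
            new_terms = []
            for v, r in eqs[y]:
                if v == x:
                    new_terms.extend((v2, r2 + r) for v2, r2 in eqs[x])
                else:
                    new_terms.append((v, r))
            eqs[y] = new_terms
    return {x: "|".join(r for _, r in terms) for x, terms in eqs.items()}


# Toy system shaped like the BST control words (I stands for I1, J for J1):
#   P = FP + I.LP + J.RP      I = P.I      J = P.J
system = {
    "P": [(None, "FP"), ("I", "LP"), ("J", "RP")],
    "I": [("P", "I")],
    "J": [("P", "J")],
}
solution = solve_left_linear(system, ["I", "J", "P"])
```

In this toy run, I and J are removed by plain substitution, and P, which then occurs in its own right-hand side, is solved by star insertion. The sketch makes no attempt at the size-reduction heuristics mentioned in the text.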
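Eventually, we have found the storage mapping function for every reference to the array:

$(ur \mid f(ur, a[j])) = (FPIAA \mid 0)\,\big((JQPIAA \mid 0) + (aA \mid 0)\big)^*\,(BB \mid 0)\,(bB \mid 1)^*\,(r \mid 0)$   (4.5)
$(us \mid f(us, a[k])) = (FPIAA \mid 0)\,\big((JQPIAA \mid 1) + (aA \mid 0)\big)^*\,(Js \mid 0)$   (4.6)

4.2.5 Application to Motivating Examples

We have already applied Compute-Storage-Mappings to program Queens, and we repeat the process for the two other motivating examples.

Procedure BST

Algorithm Compute-Storage-Mappings is now applied to procedure BST in Figure 4.2. The only induction variable is p:

From main call F: $\llbracket Arg(BST, 1)\rrbracket(F) = \varepsilon$
From procedure P: $\forall uP \in L_{ctrl} : \llbracket p\rrbracket(uP) = \llbracket Arg(BST, 1)\rrbracket(u)$
From first recursive call L: $\forall uL \in L_{ctrl} : \llbracket Arg(BST, 1)\rrbracket(uL) = \llbracket p\rrbracket(u) \cdot l$
From second recursive call R: $\forall uR \in L_{ctrl} : \llbracket Arg(BST, 1)\rrbracket(uR) = \llbracket p\rrbracket(u) \cdot r$

All other statements let the induction variable unchanged. Recall that $\llbracket p, \alpha\rrbracket$ is the set of all pairs $(u\alpha \mid p)$ such that $\llbracket p\rrbracket(u\alpha) = p$, for all instances $u\alpha$ of a statement $\alpha$. From the equations above, these sets satisfy the following regular equations:

$\llbracket p, P\rrbracket = (FP \mid \varepsilon) + \llbracket p, I_1\rrbracket \cdot (LP \mid l) + \llbracket p, J_1\rrbracket \cdot (RP \mid r)$
$\llbracket p, I_1\rrbracket = \llbracket p, P\rrbracket \cdot (I_1 \mid \varepsilon)$
$\llbracket p, J_1\rrbracket = \llbracket p, P\rrbracket \cdot (J_1 \mid \varepsilon)$
$\llbracket p, I_2\rrbracket = \llbracket p, I_1\rrbracket \cdot (I_2 \mid \varepsilon)$
$\llbracket p, J_2\rrbracket = \llbracket p, J_1\rrbracket \cdot (J_2 \mid \varepsilon)$
$\llbracket p, a\rrbracket = \llbracket p, I_2\rrbracket \cdot (a \mid \varepsilon)$
$\llbracket p, b\rrbracket = \llbracket p, I_2\rrbracket \cdot (b \mid \varepsilon)$
$\llbracket p, c\rrbracket = \llbracket p, I_2\rrbracket \cdot (c \mid \varepsilon)$
$\llbracket p, d\rrbracket = \llbracket p, J_2\rrbracket \cdot (d \mid \varepsilon)$
$\llbracket p, e\rrbracket = \llbracket p, J_2\rrbracket \cdot (e \mid \varepsilon)$
$\llbracket p, f\rrbracket = \llbracket p, J_2\rrbracket \cdot (f \mid \varepsilon)$

This system describes rational functions from $\Sigma_{ctrl}^*$ to $\{l, r\}^*$, but we are only interested in $\llbracket p, \alpha\rrbracket$ for $\alpha \in \{I_2, a, b, c, J_2, d, e, f\}$ (accesses to node values):

$\forall \alpha \in \{I_2, a, b, c\} : \llbracket p, \alpha\rrbracket = (FP \mid \varepsilon)\,\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*\,(I_1I_2 \mid \varepsilon)$   (4.7)
$\forall \alpha \in \{J_2, d, e, f\} : \llbracket p, \alpha\rrbracket = (FP \mid \varepsilon)\,\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*\,(J_1J_2 \mid \varepsilon)$   (4.8)

Eventually, we can compute the storage mapping function for every reference to the tree:

$\forall \alpha \in \{I_2, a, b\} : (u\alpha \mid f(u\alpha, \texttt{p->value})) = (FP \mid \varepsilon)\,\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*\,(I_1I_2 \mid \varepsilon)$   (4.9)
$\forall \alpha \in \{I_2, b, c\} : (u\alpha \mid f(u\alpha, \texttt{p->l->value})) = (FP \mid \varepsilon)\,\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*\,(I_1I_2 \mid l)$   (4.10)
$\forall \alpha \in \{J_2, d, e\} : (u\alpha \mid f(u\alpha, \texttt{p->value})) = (FP \mid \varepsilon)\,\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*\,(J_1J_2 \mid \varepsilon)$   (4.11)
$\forall \alpha \in \{J_2, e, f\} : (u\alpha \mid f(u\alpha, \texttt{p->r->value})) = (FP \mid \varepsilon)\,\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*\,(J_1J_2 \mid r)$   (4.12)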
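The rational expressions obtained for BST can be read as a small transducer: it consumes a control word block by block and emits the path of the node reached by p. The sketch below evaluates the value of p at an instance of block I2 or J2, following the shape of expressions (4.7) and (4.8). It is only an illustration under assumptions: the function name `bst_storage_mapping`, the encoding of control words as Python lists of label strings, and the use of `None` for words outside the expected shape are hypothetical choices.

```python
# Each recursive descent consumes three labels and emits one tree letter:
# (I1 L P | l) for the first recursive call, (J1 R P | r) for the second.
STEPS = {("I1", "L", "P"): "l", ("J1", "R", "P"): "r"}


def bst_storage_mapping(control_word):
    """Return the path (a word over {l, r}) held by p, or None if the
    control word is not of the form FP ((I1 L P) + (J1 R P))* (I1 I2 + J1 J2).
    """
    labels = list(control_word)
    if labels[:2] != ["F", "P"]:          # prefix (FP | epsilon)
        return None
    path, pos = [], 2
    while tuple(labels[pos:pos + 3]) in STEPS:
        path.append(STEPS[tuple(labels[pos:pos + 3])])
        pos += 3
    if labels[pos:] in (["I1", "I2"], ["J1", "J2"]):   # suffix, emits epsilon
        return "".join(path)
    return None


root = bst_storage_mapping(["F", "P", "I1", "I2"])
deep = bst_storage_mapping(
    ["F", "P", "I1", "L", "P", "J1", "R", "P", "J1", "J2"])
bad = bst_storage_mapping(["F", "P", "L", "P"])
```

The empty string returned for the first word plays the role of $\varepsilon$, i.e. the root of the tree; the second word goes left then right before reaching block J2.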
Function Count

Algorithm Compute-Storage-Mappings is now applied to procedure Count in Figure 4.3. Variable p is a tree index and variable i is an integer index. Indeed, the inode structure is neither a tree nor an array: nodes are named in the language $L_{data} = (\mathbb{Z}\,n)^*\,\mathbb{Z}$. Thus, the effective induction variable should combine both p and i and be interpreted in $L_{data}$, with binary operation $\cdot$ defined in Section 2.3.3. But no such variable appears in the program. The reason is that the code is written in C, in which the inode structure cannot be referenced through a uniform "cursor", like a tree pointer or array subscript.

This would become possible in a higher-level language: we have rewritten the program in a C++-like syntax in Figure 4.6. Now, p is a C++ reference and not a pointer, and operation -> has been redefined to emulate array accesses.(3) References p and q are the two induction variables:

(3) Yes, C++ is both high-level and dirty!

P        int Count (inode &p) {
I          if (p->terminal)
a            return p->length;
E          else {
b            c = 0;
L/L/l        for (int i=0, inode &q=p->n; i<p->length; i++, q=q->1)
c              c += Count (q);
d            return c;
           }
         }
         main () {
F          Count (file);
         }

Figure 4.6. Procedure Count and control automaton

From main call F: $\llbracket Arg(Count, 1)\rrbracket(F) = \varepsilon$
From procedure P: $\forall uP \in L_{ctrl} : \llbracket p\rrbracket(uP) = \llbracket Arg(Count, 1)\rrbracket(u)$
From recursive call c: $\forall uc \in L_{ctrl} : \llbracket Arg(Count, 1)\rrbracket(uc) = \llbracket q\rrbracket(u)$
From entry L of loop L/L/l: $\forall uL \in L_{ctrl} : \llbracket q\rrbracket(uL) = \llbracket p\rrbracket(u) \cdot n$
From iteration l of loop L/L/l: $\forall ul \in L_{ctrl} : \llbracket q\rrbracket(ul) = \llbracket q\rrbracket(u) \cdot 1$

All other statements let induction variables unchanged or undefined. Recall that $\llbracket p, \alpha\rrbracket$ (resp. $\llbracket q, \alpha\rrbracket$) is the set of all pairs $(u\alpha \mid p)$ (resp. $(u\alpha \mid q)$) such that $\llbracket p\rrbracket(u\alpha) = p$ (resp. $\llbracket q\rrbracket(u\alpha) = q$), for all instances $u\alpha$ of a statement $\alpha$. From the equations above, these sets satisfy
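the following regular equations:

$\llbracket p, P\rrbracket = (FP \mid \varepsilon) + \llbracket q, L\rrbracket \cdot (cP \mid \varepsilon)$
$\llbracket p, E\rrbracket = \llbracket p, P\rrbracket \cdot (E \mid \varepsilon)$
$\llbracket p, I\rrbracket = \llbracket p, P\rrbracket \cdot (I \mid \varepsilon)$
$\llbracket p, a\rrbracket = \llbracket p, I\rrbracket \cdot (a \mid \varepsilon)$
$\llbracket p, b\rrbracket = \llbracket p, E\rrbracket \cdot (b \mid \varepsilon)$
$\llbracket p, L\rrbracket = \llbracket p, E\rrbracket \cdot (L \mid \varepsilon)$
$\llbracket p, L\rrbracket = \llbracket p, L\rrbracket \cdot (L \mid \varepsilon) + \llbracket p, L\rrbracket \cdot (lL \mid \varepsilon)$
$\llbracket p, d\rrbracket = \llbracket p, E\rrbracket \cdot (d \mid \varepsilon)$

$\llbracket q, P\rrbracket = (F \mid \mathit{Undefined}) + \llbracket q, L\rrbracket \cdot (cP \mid \mathit{Undefined})$
$\llbracket q, E\rrbracket = \llbracket q, P\rrbracket \cdot (E \mid \mathit{Undefined})$
$\llbracket q, I\rrbracket = \llbracket q, P\rrbracket \cdot (I \mid \mathit{Undefined})$
$\llbracket q, a\rrbracket = \llbracket q, I\rrbracket \cdot (a \mid \mathit{Undefined})$
$\llbracket q, b\rrbracket = \llbracket q, E\rrbracket \cdot (b \mid \mathit{Undefined})$
$\llbracket q, L\rrbracket = \llbracket p, E\rrbracket \cdot (L \mid n)$
$\llbracket q, L\rrbracket = \llbracket q, L\rrbracket \cdot (L \mid 0) + \llbracket q, L\rrbracket \cdot (lL \mid 1)$
$\llbracket q, d\rrbracket = \llbracket q, E\rrbracket \cdot (d \mid \mathit{Undefined})$

These systems describe rational functions from $\Sigma_{ctrl}^*$ to $(\mathbb{Z}\,n)^*\,\mathbb{Z}$, but we are only interested in $\llbracket p, I\rrbracket$, $\llbracket p, a\rrbracket$ and $\llbracket p, L\rrbracket$ (accesses to inode values):

$\llbracket p, I\rrbracket = (uI \mid f(uI, \texttt{p->terminal})) = (FP \mid \varepsilon)\,\big((ELL \mid n)\,(lL \mid 1)^*\,(cP \mid \varepsilon)\big)^*\,(I \mid \varepsilon)$   (4.13)
$\llbracket p, a\rrbracket = (ua \mid f(ua, \texttt{p->length})) = (FP \mid \varepsilon)\,\big((ELL \mid n)\,(lL \mid 1)^*\,(cP \mid \varepsilon)\big)^*\,(Ia \mid \varepsilon)$   (4.14)
$\llbracket p, L\rrbracket = (uL \mid f(uL, \texttt{p->length})) = (FP \mid \varepsilon)\,\big((ELL \mid n)\,(lL \mid 1)^*\,(cP \mid \varepsilon)\big)^*\,(EL \mid \varepsilon)$   (4.15)

4.3 Dependence and Reaching Definition Analysis

When all program model restrictions are satisfied, we have shown in the previous section that storage mappings are rational transductions. Based on this result, we will now present a general dependence and reaching definition analysis scheme for recursive programs. Both classical results and recent contributions to formal language theory will be useful; definitions and details can be found in Chapter 3.

This section tackles the general dependence and reaching definition analysis problem in our program model. See Sections 4.4 (trees), 4.5 (arrays) and 4.6 (nested trees and arrays) for technical questions depending on the data structure context.

4.3.1 Building the Conflict Transducer

In Section 2.4.1, we have seen that analysis of conflicting accesses is one of the first problems arising when computing dependence relations. We thus present a general computation scheme for the conflict relation, but technical issues and precise study are left for the next sections.

We consider a program whose set of statement labels is $\Sigma_{ctrl}$. Let $L_{ctrl} \subseteq \Sigma_{ctrl}^*$ be the rational language of control words. Let $M_{data}$ be the monoid abstraction for a given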
data structure used in the program, and $L_{data} \subseteq M_{data}$ be the rational language of valid data structure elements.

Now because f is used instead of $f_e$ (it is independent of the execution), the exact conflict relation $\kappa_e$ is defined by

$\forall e \in E, \forall u, v \in L_{ctrl} : u \mathrel{\kappa_e} v \iff (u, v \in A_e) \wedge f(u) = f(v),$

which is equivalent to

$\forall e \in E, \forall u, v \in L_{ctrl} : u \mathrel{\kappa_e} v \iff (u, v \in A_e) \wedge v \in f^{-1}(f(u)).$

Because f is a rational transduction from $\Sigma_{ctrl}^*$ to $M_{data}$, $f^{-1}$ is a rational transduction from $M_{data}$ to $\Sigma_{ctrl}^*$, and $M_{data}$ is either a free monoid, or a free commutative monoid, or a free partially commutative monoid, we know from Theorems 3.5, 3.27 and 3.28 that $f^{-1} \circ f$ is either a rational or a multi-counter transduction. The result will thus be exact in almost all cases: only multi-counter transductions must be approximated by one-counter transductions.

We cannot compute an exact relation $\kappa_e$, since $A_e$ depends on the execution e. Moreover, guards of conditionals and loop bounds are not taken into account for the moment, and the only approximation of $A_e$ we can use is the full language $A = L_{ctrl}$ of control words. Eventually, the approximate conflict relation $\kappa$ we compute is the following:

$\forall u, v \in L_{ctrl} : u \mathrel{\kappa} v \stackrel{\mathrm{def}}{\iff} v \in f^{-1}(f(u)).$   (4.16)

In all cases, we get a transducer realization (rational or one-counter) of transduction $\kappa$. This realization is often unapproximate on pairs of control words which are effectively executed.

One may immediately notice that testing for emptiness of $\kappa$ is equivalent to testing whether two pointers are aliased [Deu94, Ste96], and emptiness is decidable for rational and algebraic transductions (see Chapter 3). This is an important application of our analysis, considering the fact that $\kappa$ is often unapproximate in practice.

Notice also that this computation of $\kappa$ does not require access functions to be rational functions: if a rational transduction approximation of f was available, one could still compute relation $\kappa$ using the same techniques. However, a general approximation scheme for function f has not been designed, and further study is left for future work.

4.3.2 Building the Dependence Transducer

To build the dependence transducer, we need first to restrict relation $\kappa_e$ to pairs of write accesses or read and write accesses, and then to intersect the result with the lexicographic order $<_{lex}$:

$\forall e \in E, \forall u, v \in L_{ctrl} : u \mathrel{\delta_e} v \iff u \,\big(\kappa_e \cap ((W \times W) \cup (W \times R) \cup (R \times W)) \cap <_{lex}\big)\, v.$

Thanks to techniques described in Section 3.6.2, we can always compute a conservative approximation $\delta$ of $\delta_e$. Relation $\delta$ is realized by a rational transducer in the case of trees and by a one-counter transducer in the case of arrays or nested trees and arrays.

Approximations may either come from the previous approximation $\kappa$ of $\kappa_e$ or from the intersection itself. The intersection may indeed be approximate in the case of trees and nested trees and arrays, because rational relations are not closed under intersection
(see Section 3.3). But thanks to Proposition 3.13 it will always be exact for arrays. More details in each data structure case can be found in Sections 4.4, 4.5 and 4.6. We can now give a general dependence analysis algorithm for our program model. The Dependence-Analysis algorithm is exactly the same for every kind of data structure, but individual steps may be implemented differently.

Dependence-Analysis(program)
    program: an intermediate representation of the program
    returns a dependence relation between all accesses
 1  $f \leftarrow$ Compute-Storage-Mappings(program)
 2  $\kappa \leftarrow (f^{-1} \circ f)$
 3  if $\kappa$ is a multi-counter transduction
 4    then $\kappa \leftarrow$ one-counter approximation of $\kappa$
 5  if the underlying rational transducer of $\kappa$ is not left-synchronous
 6    then $\kappa \leftarrow$ resynchronization with or without approximation of $\kappa$
 7  $\delta \leftarrow \kappa \cap ((W \times W) \cup (W \times R) \cup (R \times W))$
 8  $\delta \leftarrow \delta \cap <_{lex}$
 9  return $\delta$

The result of Dependence-Analysis is limited to dependences on a specific data structure. To get the full dependence relation of the program, it is necessary to compute the union for all the data structures involved.

4.3.3 From Dependences to Reaching Definitions

Remember the formal definition in Section 2.4.2: the exact reaching definition relation is defined as a lexicographic selection of the last write access in dependence with a given read access, i.e.

$\forall e \in E, \forall u \in R_e : \sigma_e(u) = \max_{<_{lex}} \{v \in W_e : v \mathrel{\delta_e} u\}.$

Clearly, this maximum is unique for each read access u in the course of execution.

In the case of an exact knowledge of $\delta_e$, and when this relation is left-synchronous, one may easily compute an exact reaching definition relation, using lexicographic selection, see Section 3.4.3.

The problem is that $\delta_e$ is not known precisely in general, and the above solution is rarely applicable. Moreover, using the computation scheme above, conditionals and loop bounds have not been taken into account: the result is that many non-existing accesses are considered dependent for relation $\delta$. We should thus be looking for a conservative approximation $\sigma$ of $\sigma_e$, built on the available approximate dependence relation $\delta$. Relying on $\delta$ makes computation of $\sigma$ from (4.17) almost impossible, for two reasons: first, a write v may be in dependence with u without being executed by the program, and second, all writes which are not effectively in conflict with u may be considered as possible dependences.
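The set operations performed by Dependence-Analysis (Section 4.3.2) can be mimicked on a finite, explicitly enumerated set of instances, which may help to read the algorithm. This is only a sketch under strong assumptions: the thesis works on transducers over infinite languages, whereas here relations are plain Python sets of pairs; the names `dependence_analysis`, `location`, `rank` and `lex_key`, the toy control words, and the encoding of the lexicographic order as a comparison of label ranks are all hypothetical.

```python
from itertools import product


def dependence_analysis(instances, f, is_write, lex_key):
    """Finite-set analogue of Dependence-Analysis.

    instances: iterable of accesses (here, control words as strings)
    f:         storage mapping, instance -> memory location
    is_write:  predicate telling whether an access is a write
    lex_key:   key function whose ordering stands for <lex
    """
    instances = list(instances)
    # kappa = f^-1 o f: pairs of accesses hitting the same location
    kappa = {(u, v) for u, v in product(instances, repeat=2)
             if f(u) == f(v)}
    # restriction to (W x W) u (W x R) u (R x W): at least one write
    delta = {(u, v) for u, v in kappa if is_write(u) or is_write(v)}
    # intersection with the lexicographic order
    return {(u, v) for u, v in delta if lex_key(u) < lex_key(v)}


# Toy data: four accesses, two memory cells, labels ordered a < b < c < d.
location = {"Fa": 0, "Fb": 0, "Fc": 1, "Fd": 1}
writes = {"Fa", "Fd"}
rank = {"F": 0, "a": 1, "b": 2, "c": 3, "d": 4}


def lex_key(word):
    return tuple(rank[label] for label in word)


deps = dependence_analysis(location, location.get,
                           lambda w: w in writes, lex_key)
```

With this toy data the result holds one flow dependence (write `Fa` before read `Fb`) and one anti-dependence (read `Fc` before write `Fd`); the two cells never interfere.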
However, we know we can compute an approximate reaching definition relation $\sigma$ from $\delta$ when at least one of the following conditions is satisfied.

- Suppose we can prove that some statement instance does not execute, and that this information can be inserted in the original transduction: some flow dependences can be removed. The remaining instances are described by predicate $e_{may}(w)$ (instances that may execute).
- On the opposite, if we can prove that some instance w does execute, and if this information can be inserted in the original transduction, then writes executing before w are "killed": they cannot reach an instance u such that $w \mathrel{\delta} u$. Instances that are effectively executed are described by predicate $e_{must}(w)$ (instances that must execute).
- Eventually, one may have some information $e_{conditional}(v, w)$ about an instance w that does execute whenever another instance v does: this "conditional" information is used the same way as the former predicate $e_{must}$.

The more precise the predicates $e_{may}$, $e_{must}$ and $e_{conditional}$, the more precise the reaching definition relation. In some cases, one may even compute an exact reaching definition relation.

Now, remember that all our work since Section 4.2 has completely ignored guards in conditional statements and loop bounds. This information is of course critical when trying to build predicates $e_{may}$, $e_{must}$ and $e_{conditional}$. Retrieving this information can be done using both the results of induction variable analysis (see Section 4.2) and additional analyses of the value of variables [CH78, Mas93, MP94, TP95]. Such external analyses would for example compute loop and recursion invariants.

Another source of information, mostly for predicate $e_{conditional}$, is provided by a simple structural analysis of the program, which consists in exploiting every piece of information hidden in the program syntax:

- in an if ... then ... else construct, either the then or the else branch is executed;
- in a while construct, assuming some instance of a statement does execute, all instances preceding it in the while loop also execute;
- in a sequence of non-guarded statements, all instances of these statements are simultaneously executed or not.

Notice this kind of structural analysis was already critical for nested loops [BCF97, Bar98, Won95].
Another very important structural property is described with the following additional definition:

Definition 4.2 (ancestor) Consider an alphabet $\Sigma_{ctrl}$ of statement labels and a language $L_{ctrl}$ of control words. We define $\Sigma_{unco}$: a subset of $\Sigma_{ctrl}$ made of all block labels which are not conditionals or loop blocks, and all (unguarded) procedure call labels, i.e. blocks whose execution is unconditional.

Let r and s be two statements in $\Sigma_{ctrl}$, and let u be a strict prefix of a control word $wr \in L_{ctrl}$ (an instance of r). If $v \in \Sigma_{unco}^*$ (without labels of conditional statements) is such that $uvs \in L_{ctrl}$, then uvs is called an ancestor of wr.

The set of ancestors of an instance u is denoted by Ancestors(u).

This definition is best understood on a control tree, such as the one in Figure 4.1.b page 124: black square FPIAAaAaAJs is an ancestor of FPIAAaAaAJQPIAABBr, but not gray squares FPIAAaAJs and FPIAAJs. Now, observe the formal ancestor definition:

1. execution of wr implies execution of u, because it is in the path from the root of the control tree to node wr;
2. execution of u implies execution of uvs, because v is made of declaration blocks only, without conditional statements.

We thus have the following result:

Proposition 4.1 If an instance u executes, then all ancestors of u also execute. This can be written using predicates $e_{must}$ and $e_{conditional}$:

$\forall u \in L_{ctrl} : e_{must}(u) \Longrightarrow e_{must}(\mathrm{Ancestors}(u)); \quad e_{conditional}(u, \mathrm{Ancestors}(u)).$

At last, we can define a conservative approximation $\sigma$ of the reaching definition relation, built on $\delta$, $e_{may}$, $e_{must}$ and $e_{conditional}$:

$\forall u \in R : \sigma(u) = \big\{ v \in \delta(u) : e_{may}(v) \wedge \nexists w \in \delta(u) : v <_{lex} w \wedge \big(e_{must}(w) \vee e_{conditional}(v, w) \vee e_{conditional}(u, w)\big) \big\}.$   (4.17)

Predicates $e_{may}$, $e_{must}$ and $e_{conditional}$ should define rational sets, in order to compute the algebraic operations involved in (4.17). When, in addition, relation $\delta$ is left-synchronous, closure under union, intersection, complementation, and composition allows unapproximate computation of $\sigma$ with (4.17).

However, designing a general computation framework for these predicates is left for future work, and we will only consider a few "rules" useful in our practical examples.

4.3.4 Practical Approximation of Reaching Definitions

Instead of building automata for predicates $e_{may}$, $e_{must}$ and $e_{conditional}$ then computing $\sigma$ from (4.17), we present a few rewriting rules to refine the sets of possible reaching definitions, starting from a very conservative approximation of the reaching definition relation: the restriction of dependence relation $\delta$ to flow dependences (i.e. from a write to a read access). This technique is less general than solving (4.17), but it avoids complex (and approximate) algebraic operations.

Applicability of the rewriting rules is governed by the compile-time knowledge extracted by external analyses, such as analysis of conditional expressions, detection of invariants, or structural analysis. In Section 4.5, we will demonstrate practical usage of these rules when applying our reaching definition analysis framework to program Queens.

For the moment, we choose a statement s with a write reference to memory, and try to refine sets of possible reaching definitions among instances of s. Refining sets of possible reaching definitions which are instances of several statements will be discussed at the end of this section.

The vpa Property (Values are Produced by Ancestors)

This property comes from the common observation about recursive programs that "values are produced by ancestors". Indeed, a lot of sort, tree, or graph-based algorithms perform in-depth explorations where values are produced by ancestors. This behavior is also strongly assessed by scope rules of local variables.

$\mathrm{vpa} \iff \forall e \in E, u \in R_e, v \in W_e : v = \sigma_e(u) \Longrightarrow v \in \mathrm{Ancestors}(u).$
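Equation (4.17) can be read operationally when the candidate writes of a read u are given as a finite set: a write v is kept if it may execute and if no later write w, known to execute (unconditionally, or whenever v or u executes), kills it. The sketch below is only an illustration under assumptions: the thesis evaluates (4.17) with operations on rational relations, whereas here predicates are plain Python functions over a finite set; the names `reaching_definitions`, `candidates` and `lex_key` are hypothetical.

```python
def reaching_definitions(u, candidates, lex_key, e_may, e_must, e_cond):
    """Finite-set reading of equation (4.17).

    u:          the read instance
    candidates: writes in dependence with u (the set written delta(u))
    lex_key:    key function whose ordering stands for <lex
    e_may(v), e_must(w), e_cond(x, w): the three execution predicates
    """
    result = set()
    for v in candidates:
        if not e_may(v):
            continue  # v is known never to execute
        killed = any(
            lex_key(v) < lex_key(w)
            and (e_must(w) or e_cond(v, w) or e_cond(u, w))
            for w in candidates
        )
        if not killed:
            result.add(v)
    return result


# Toy data: three writes a < b < c reach the read "u"; only b is known
# to execute for sure, and no conditional information is available.
order = {"a": 0, "b": 1, "c": 2}
sigma_u = reaching_definitions(
    "u", {"a", "b", "c"},
    lex_key=order.get,
    e_may=lambda v: True,
    e_must=lambda w: w == "b",
    e_cond=lambda x, w: False,
)
```

In this toy run, write a is killed by b, while b and c both remain possible reaching definitions because c is not known to execute. More precise predicates would shrink the result further, as the text points out.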
Since all possible reaching definitions are ancestors of the use, rule vpa consists in removing all transitions producing non-ancestors. Formally, all transitions $\alpha' \mid \alpha$ s.t. $\alpha' <_{txt} \alpha$ and $\alpha' \neq s$ are removed.

We may define one other interesting property useful to automatic property checking; its associated rewriting rule is not given.

The oka Property (One Killing Ancestor)

If it can be proven that at least one ancestor vs of a read u is in dependence with u, it kills all previous writes since it does execute when u does.

$\mathrm{oka} \iff \forall u \in R : \big(\delta(u) \neq \emptyset \Longrightarrow (\exists v \in \mathrm{Ancestors}(u) : v \in \delta(u))\big).$

Property Checking

Property oka can be discovered using invariant properties on induction variables. Checking for property vpa is difficult, but we may rely on the following result: when property oka holds, checking vpa is equivalent to checking whether an ancestor vs in dependence with us may be followed (according to the lexicographic order) by a non-ancestor instance w in dependence with us.

Other properties can be obtained by more involved analyses: the problem is to find a relevant rewriting rule for each one.

Now, remember we restricted ourselves to one assignment statement s when presenting the rewriting rules. Designing rules which handle the global flow of the program is a bit more difficult. When comparing possible reaching definition instances of two writes $s_1$ and $s_2$, it is not possible in general to decide whether one may "kill" the other without a specific transducer (rational or one-counter, depending on the data structure). The problem is thus to intersect two rational or algebraic relations, which cannot be done without approximations in general, see Sections 3.6 and 3.7. In many cases, however, storage mappings for $s_1$ and $s_2$ are very similar, and exact results can be easily computed.

The Reaching-Definition-Analysis algorithm is a general algorithm for reaching definition analysis inside our program model. Algebraic operations on sets and relations in the second loop of the algorithm may yield approximate results, see Sections 3.4, 3.6 and 3.7. The intersection with $R \times \{w : e_{may}(w)\}$ in the third line serves the purpose of restricting the domain to read accesses and the image to writes which may execute; it can be computed exactly since $R \times \{w : e_{may}(w)\}$ is a recognizable relation. The Reaching-Definition-Analysis algorithm is applied to program Queens in Section 4.5.

Notice that all output and anti-dependences are removed by the algorithm, but some spurious flow dependences may remain when the result is approximate.

Now, there is something missing in this presentation of reaching definition analysis: what about the $\bot$ instance? When predicates $e_{must}(v)$ or $e_{conditional}(u, v)$ are empty for all possible reaching definitions v of a read instance u, it means that an uninitialized value may be read by u, hence that $\bot$ is a possible reaching definition; and the reciprocal is true. In terms of our "practical properties", oka can be used to determine whether $\bot$ is a possible reaching definition or not. This gives an automatic way to insert $\bot$ when needed in the result of Reaching-Definition-Analysis.

To conclude this section, we have shown a very clean and powerful framework for instancewise dependence analysis of recursive programs, but we should also recognize the limits of relying on a list of refinement rules to compute an approximate reaching
definition relation from an approximate dependence relation. Now that the feasibility of instancewise reaching definition analysis for recursive programs has been proven, it is time to work on a formal framework to compute predicates $e_{may}$, $e_{must}$ and $e_{conditional}$, from which we could expect a powerful reaching definition analysis algorithm.

Reaching-Definition-Analysis(program)
    program: an intermediate representation of the program
    returns a reaching definition relation between all accesses
 1  compute $e_{may}$, $e_{must}$ and $e_{conditional}$ using structural and external analyses
 2  $\delta \leftarrow$ Dependence-Analysis(program)
 3  $\sigma \leftarrow \delta \cap (R \times \{w : e_{may}(w)\})$
 4  for each assignment statement s in program
 5    do check for properties oka, vpa, and other properties
 6       using external static analyses or asking the user
 7       apply refinement rules on $\sigma$ accordingly
 8  for each pair of assignment statements (s, t) in program
 9    do $kill \leftarrow \{(us, w) \in W \times R : (\exists vt \in W : us \mathrel{\sigma} w \wedge vt \mathrel{\sigma} w \wedge us <_{lex} vt$
10                $\wedge\, (e_{must}(vt) \vee e_{conditional}(us, vt) \vee e_{conditional}(w, vt)))\}$
11       $\sigma \leftarrow \sigma - kill$
12  return $\sigma$

4.4 The Case of Trees

We will now make precise the dependence and reaching definition analysis in the case of a tree structure. Practical computations will be performed on program BST presented in Figure 4.2.

The first part of the Dependence-Analysis algorithm consists in computing the storage mapping. When the underlying data structure is a tree, its abstraction is the free monoid $M_{data} = \{l, r\}^*$ and the storage mapping is a rational transduction between free monoids. Computation of function f for program BST has already been done in Section 4.2.5. Figure 4.7 shows a rational transducer realizing rational function f. Following the lines of Section 2.3.1 page 68, the alphabet of statement labels has been extended to distinguish between distinct references in $I_2$, $J_2$, b and e, yielding new labels $I_2^{p}$, $I_2^{p\to l}$, $J_2^{p}$, $J_2^{p\to r}$, $b^{p}$, $b^{p\to l}$, $e^{p}$ and $e^{p\to r}$ (these new labels may only appear as the last letter in a control word).

Computation of $\kappa$ is done thanks to Elgot and Mezei's theorem, and yields a rational transduction. The result for program BST is given by the transducer in Figure 4.8.

When $\kappa$ is realized by a left-synchronous transducer, the last part of the Dependence-Analysis algorithm does not require any approximation: dependence relation $\delta = \kappa \cap <_{lex}$ can be computed exactly (after removing conflicts between reads in $\kappa$). It is the case for program BST, and the exact dependence analysis result is shown in Figure 4.9. In the general case, a conservative left-synchronous approximation of $\kappa$ must be computed, see Section 3.7.

One may immediately notice that every pair (u, v) accepted by the dependence transducer is of the form $u = wu'$ and $v = wv'$ where $w \in \{F, P, L, R, I_1, J_1\}^*$ and $u'$, $v'$ do not hold any recursive call, i.e. L or R. That means that all dependences lie between instances of the same block $I_1$ or $J_1$. We will show in Section 5.5 that this result can be used to run the first if block (statement $I_1$) in parallel with the second (statement $J_1$).

Eventually, it appears that dependence transduction $\delta$ is a rational function, and the
restriction of $\delta$ to pairs (u, v) of a read u and a write v yields the empty relation! Indeed, the only dependences on program BST are anti-dependences.

Figure 4.7. Rational transducer for storage mapping f of program BST

Figure 4.8. Rational transducer for conflict relation $\kappa$ of program BST
Figure 4.9. Rational transducer for dependence relation $\delta$ of program BST

4.5 The Case of Arrays

We will now make precise the dependence and reaching definition analysis in the case of an array structure. Practical computations will be performed on program Queens presented in Figure 4.1.

Figure 4.10. Rational transducer for storage mapping f of program Queens

The first part of the Dependence-Analysis algorithm consists in computing the storage mapping. When the underlying data structure is an array, its abstraction is the free commutative monoid $M_{data} = \mathbb{Z}$. Computation of function f for program Queens has already been done in Section 4.2.5. Figure 4.10 shows a rational transducer realizing
rational function $f : \Sigma_{ctrl}^* \to \mathbb{Z}$. It reflects the combination of regular expressions (4.5) and (4.6).

Computation of $\kappa$ is done thanks to Theorem 3.27, and yields a one-counter transduction. The result for program Queens is given by the transducer in Figure 4.11, with four initial states.

To compute a dependence relation, one first restricts $\kappa$ to pairs of accesses with at least one write, then intersects the result with the lexicographic order. From Proposition 3.13, the underlying rational transducer of $\kappa$ is recognizable, hence left-synchronous (from Theorem 3.12), and can thus be resynchronized with the constructive proof of Theorem 3.19 to get a one-counter transducer whose underlying rational transducer is left-synchronous.

Resynchronization of $\kappa$ has been applied to program Queens in Figure 4.12: it is limited to conflicts of the form (us, vr), with $us, vr \in L_{ctrl}$. The missing three fourths of the transducer have not been represented because they are very similar to the first fourth and are not used for reaching definition analysis. The underlying rational transducer is only pseudo-left-synchronous because resynchronization has not been applied completely, see Section 3.6 and Definition 3.28.

Intersection with $<_{lex}$ is done with Theorem 3.14. As a result, the dependence relation can be computed exactly and is realized by a one-counter transducer whose underlying rational transducer is left-synchronous.

This is applied to program Queens in Figure 4.13, starting from the pseudo-left-synchronous transducer in Figure 4.12. Knowing that $B <_{txt} J <_{txt} a$ and $s <_{txt} Q$, transitions J|a and s|Q are kept but transitions a|J, a|B and J|B are removed (and the transducer is trimmed). This time, only one third of the actual transducer is shown: the transducer realizing flow dependences. Anti and output dependences are realized by very similar transducers, and are not used for reaching definition analysis.

We now demonstrate the Reaching-Definition-Analysis algorithm on program Queens. A simple analysis of the inner loop shows that j is always less than k. This proves that for any instance w of r, there exist $u, v \in \Sigma_{ctrl}^*$ s.t. $w = uQvr$ and $us \mathrel{\delta} uQvr$. Because us is an ancestor of uQvr, property oka is satisfied. The dependence transducer in Figure 4.13 shows that all instances of s executing after us are of the form $uQv's$, and it also shows that reading Q increases the counter: the result is that no instance executing after us may be in dependence with w. In combination with oka, property vpa thus holds. Applying rule vpa, we can remove transition J|aA, which does not yield ancestors. We get the one-counter transducer in Figure 4.14. Notice that the $\bot$ instance (associated with uninitialized values) is not accepted as a possible reaching definition: this is because property oka ensures that at least an ancestor of every read instance defined a value.

The transducer is "compressed" in Figure 4.15 to increase readability. It is easy to prove that this result is exact: a unique reaching definition is computed for every read instance. However, the general problem of the functionality of an algebraic transduction is "probably" undecidable. As a result, we achieved (in a semi-automated way) the best precision possible. This precise result will be used in Section 5.5 to parallelize program Queens.

4.6 The Case of Composite Data Structures

We will now make precise the dependence and reaching definition analysis in the case of a nested list and array structure. Practical computations will be performed on program Count presented in Figure 4.3.
[Figure 4.11. One-counter transducer for conflict relation κ of program Queens (transducer diagram not reproduced)]

The first part of the Dependence-Analysis algorithm consists in computing the storage mapping. When the underlying data structure is built of nested trees and arrays, its abstraction is a free partially commutative monoid Mdata. Computation of function f for program Count has already been done in Section 4.2.5.
[Figure 4.12. Pseudo-left-synchronous transducer for the restriction of κ to W × R (transducer diagram not reproduced)]

Computation of κ is done thanks to Theorem 3.28, and yields a one-counter transduction. On program Count, there are no write accesses to the inode structure. Now, we could be interested in an analysis of conflict-misses for cache optimization [TD95]. The result f⁻¹ ∘ f for program Count is thus interesting, and it is the identity relation! This proves that the same memory location is never accessed twice during program execution.

Now, when computing a dependence relation in general, Proposition 3.13 does not apply: it is necessary in general to approximate the underlying rational transducer by a left-synchronous one. Eventually, the Reaching-Definition-Analysis algorithm has no technical issues specific to nested trees and arrays.

4.7 Comparison with Other Analyses

Before evaluating our analysis for recursive programs, we summarize its program model restrictions. First of all, some restrictions are required to simplify algorithms and should be considered harmless thanks to previous code transformations (see Sections 2.2 and 4.2 for details):

- no function pointers (i.e. higher-order control structures) and no gotos are allowed;
- a loop variable is initialized at the loop entry and used only inside this loop;
- expressions in right-hand side may hold conditionals but no function calls and no loops;
- every data structure subject to dependence or reaching definition analysis must be declared global.

Now, some restrictions on the program model cannot be avoided with preliminary program transformations, but should be removed in further versions of the analysis, thanks to appropriate approximation techniques (induction variables are defined in Section 4.2):

- only scalars, arrays, trees and nested trees and arrays are allowed as data structures;
- induction variables must follow very strong rules regarding initialization and update;
- every array subscript must be an affine function of integer induction variables and symbolic constants;
- every tree access must dereference a pointer induction variable or a constant.

[Figure 4.13. One-counter transducer for the restriction of dependence relation δ to flow dependences (transducer diagram not reproduced)]
[Figure 4.14. One-counter transducer for reaching definition relation σ of program Queens (transducer diagram not reproduced)]

[Figure 4.15. Simplified one-counter transducer for σ (transducer diagram not reproduced)]

Eventually, one restriction is very deeply rooted in the monoid abstraction for tree structures, and we expect no general way to avoid it:

- random insertions and deletions in trees are forbidden (allowed only at trees' leaves).

We are now able to compare the results of our analysis technique with those of classical static analyses (some of which also handle our full program model) and with those of the existing instancewise analyses for loop nests.

Static dependence and reaching definition analyses generally compute the same kind of results, whether they are based on abstract interpretation [Cou81, JM82, Har89, Deu94] or other data-flow analysis techniques [LRZ93, BE95, HHN94, KSV96]. A comprehensive study of static analysis useful to parallelization of recursive programs can be found in [RR99]. Comparison of the results is rather easy: none of these static analyses is instancewise.^4 None of these static analyses is able to tell which instance of which statement is in conflict, in dependence, or a possible reaching definition. However, these analyses are very useful to remove a few restrictions in our program model, and they also compute properties useful to instancewise reaching definition analysis. Remember that our own instancewise reaching definition analysis technique makes heavy use of so-called "external" analyses, which precisely are classical static analyses. A short comparison between parallelization from the results of our analysis and parallelization from static analyses will be proposed in Section 5.5, along with some practical examples.

^4 We think that building an instancewise analysis of practical interest in the data-flow or abstract interpretation framework is indeed possible, but very few works have been made in this direction, see [DGS93, Tzo97, CK98].

Comparison with instancewise analyses for loop nests is more topical, since our technique was clearly intended to extend such analyses to recursive programs. A simple method to get a fair evaluation consists in running both analyses on their common program model subset. The general result is not surprising: today's most powerful reaching definition analyses for loop nests, such as fuzzy array dataflow analysis (FADA) [BCF97, Bar98] and constraint-based array dependence analysis [WP95, Won95], are far more precise than our analysis for recursive programs. There are many reasons for that:

- we do not use conditionals and loop bounds to establish our results, or when it is the case, it is through "external" static analyses;
- multi-dimensional arrays are roughly approximated by one-dimensional ones;
- rational and algebraic transducers have a limited expressive power when dealing with integer parameters (only one counter can be described);
- some critical algebraic operations such as intersection and complementation are not decidable and thus require further approximations.

A major difference between FADA and our analysis for recursive programs is deeply rooted in the philosophy of each technique.

- FADA is a fully exact process with symbolic computations and "dummy" parameters associated with unpredictable constraints, and only one approximation is performed at the end; this ensures that no precious data-flow information is lost during the computation process (see Section 2.4.3).
- Our technique is not as clever, since many approximation stages can be involved. It is more similar to iterative methods in that sense, and hence it is far from being optimal: some approximations are made even if the mathematical abstraction could have enough expressive power to avoid them.

But the comparison also reveals very positive aspects, in terms of all the information available in the result of our analysis:

- exactness of the result is equivalent to deciding the functionality of a transduction, and is thus polynomial for rational transductions; but it is unknown for algebraic ones, and decidability of the finiteness of a set of reaching definitions can help in some cases;
- emptiness of a set of reaching definitions is decidable, which allows automatic detection of read accesses to uninitialized variables;
- in the case of rational transductions, dependence testing can be extended to rational languages of control words, because of Nivat's Theorem 3.6 and the fact that rational languages are closed under intersection; this is very useful for parallelization;
- in the case of algebraic transductions, dependence testing is equivalent to the intersection of an algebraic language and a rational one, because of Nivat's Theorem 3.21 for algebraic transductions and Evey's Theorem 3.24; this is still very useful for parallelization.

We refer to Section 5.5 for additional comparisons between the applicability of our analysis and loop nest analyses to parallelization.

4.8 Conclusion

We presented an application of formal language theory to the automatic discovery of some semantic properties of programs: instancewise dependences and reaching definitions. When programs are recursive and nothing is known about recursion guards, only conservative approximations can be hoped for. In our case, we approximate the relation between reads and their reaching definitions by a rational (for trees) or algebraic (for arrays) transduction. The result of the reaching definition analysis is a transducer mapping control words of read instances to control words of write instances. Two algorithms for dependence and reaching definition analysis of recursive programs were designed. Incidentally, these results showed the use of the new class of left-synchronous transductions over free monoids.

We have applied our techniques on several practical examples, showing excellent approximations and sometimes even exact results. Some problems obviously remain. First, some strong restrictions on the program model limit the practical use of our technique. We should thus work on a graceful degradation of our analyses to encompass a larger set of recursive programs: for example, restrictions on induction variable operations could perhaps be removed by allowing computation of approximate storage mappings. Second, reaching definition analysis is not quite mature now, since it relies on rather ad-hoc techniques whose general applicability is unknown. More theoretical studies are needed to decide whether precise instancewise reaching definition information can be captured by rational and algebraic transducers.

We will show in the next chapters that decidability properties on rational and algebraic transductions allow several applications of our framework, especially in automatic parallelization of recursive programs. These applications include array expansion and parallelism extraction.
Chapter 5

Parallelization via Memory Expansion

The design of program transformations dedicated to dependence removal is a well studied topic, as far as nested loops are concerned. Techniques such as conversion to single-assignment form [Fea91, GC95, Col98], privatization [MAL93, TP93, Cre96, Li92], and many optimizations for efficient memory management [LF98, CFH95, CDRV97, QR99] have been proven useful for practical parallelization of programs (automatically or not). However, these works have mostly targeted affine loop nests and few techniques have been extended to dynamic control flow and general array subscripts. Very interesting issues arise when trying to expand data structures in unrestricted nests of loops, and because of the necessary data-flow restoration, confluent interests with the SSA (static single-assignment) [CFR+91] framework become obvious.

Motivation for memory expansion and introduction of the fundamental concepts is the first goal of Section 5.1; then, we study specific problems related with non-affine nests of loops and we design practical solutions for a general single-assignment form transformation. Novel expansion techniques presented in Sections 5.2, 5.3 and 5.4 are contributions to bridging the gap between the rich applications of memory expansion techniques for affine loop nests and the few results with irregular codes.

When extending the program model to recursive procedures, the problem is of another nature: principles of parallel processing are then very different from the well mastered data parallel model for nested loops. Applicable algorithms have been mostly designed for statementwise dependence tests, when our analysis computes an extensive instancewise description of the dependence relation! There is of course a large gap between the two approaches and we should now demonstrate that using such precise information brings practical improvements over existing parallelization techniques. These issues are addressed by Section 5.5, starting with an investigation of memory expansion techniques for recursive programs. Because this last section addresses a new topic, several negative or disappointing answers are mixed with successful results.

5.1 Motivations and Tradeoffs

To point out the most important issues related with memory expansion, and to motivate the following sections of this chapter, we start with a study of the well-known expansion technique called conversion to single-assignment form. Both abstract and practical points of view are discussed. Several results presented here have been already presented by
many authors, with their formalism and their program model, but we preferred to rewrite most of this work in our syntax to fix the notations and to show how memory expansion also makes sense out of the loop nest programming model.

5.1.1 Conversion to Single-Assignment Form

One of the most usual and simplest expansion schemes is conversion to single-assignment (SA) form. It is the extreme case where each memory location is written at most once during execution. This is slightly different from static single-assignment form (SSA) [CFR+91, KS98], where each variable is written at most in one statement in the program, and expansion is limited to variable renaming.

The idea of conversion to SA-form is to replace every assignment to a data structure D by an assignment to a new data structure Dexp whose elements have the same type as elements of D, and are in one-to-one mapping with the set W of all possible write accesses during any program execution. Each element of Dexp is associated to a single write access. This aggressive transformation ensures that the same memory location is never written twice in the expanded program. The second step is to transform the read references accordingly, and is called restoration of the flow of data. Instancewise reaching definition information is of great help to achieve this: for a given program execution e ∈ E, the value read by some access ⟨ı, ref⟩ to D in right-hand side of a statement is precisely stored in the element of Dexp associated with σe(⟨ı, ref⟩) (see Section 2.4 for notations and definitions). In general, an exact knowledge of σe for each execution e is not available at compile time: the result of instancewise reaching definition analysis is an approximate relation σ. The compile-time data-flow restoration scheme above is thus inapplicable when σ(⟨ı, ref⟩) is a non-singleton set: the idea is then to generate a run-time data-flow restoration code, which tracks what is the last instance executed in σ(⟨ı, ref⟩). As we have seen for general expansion schemes in Section 1.2, this run-time restoration code is hidden in a φ function whose argument is the set σ(⟨ı, ref⟩) of possible reaching definitions.

A few notations are required to simplify the syntax of expanded programs.

- CurIns holds the run-time instance value, encoded as a control word or iteration
vector, for any statement in the program. It is supposed to be updated on-line in function calls, loop iterations and every block entry. More details about this topic are given in Section 5.1.3 and Section 5.5.3.
- φ has the syntax of a function from sets of run-time instances to untyped values, but its semantics is to summarize a piece of data-flow restoration code. It is very similar to φ functions in the SSA framework [CFR+91, KS98]. Code generation for φ functions is the purpose of Section 5.1.2.
- Dexp is the expanded data structure associated with some original data structure D. Its "abstract" syntax is inherited from arrays: Dexp[set of element names] for the declaration and Dexp[element name] for the read or write access. In practice, element names are either integer vectors or words, and Dexp is an array, a tree, or a nest of trees and arrays. Its "concrete" syntax is then implemented as an array or as a pointer to a tree structure. See Sections 5.1.3 and 5.5.1 for details.

We now present Abstract-SA: a very general algorithm to compute the single-assignment form. This algorithm is neither really new nor really practical, but it defines a general transformation scheme for SA programs, independently of the control and data
structures. It takes as input the sequential program and the result of an instancewise reaching definition analysis σ, seen as a function. Control structures are left unchanged. This algorithm is very "abstract" since data structures are not defined precisely and some parts of the generated code have been encapsulated in high-level notations: CurIns and φ.

Abstract-SA (program, W, σ)
    program: an intermediate representation of the program
    W: a conservative approximation of the set of write accesses
    σ: a reaching definition relation, seen as a function
    returns an intermediate representation of the expanded program

    for each data structure D in program
        do declare a data structure Dexp[W]
           for each statement s assigning D in program
               do left-hand side of s ← Dexp[CurIns]
           for each reference ref to D in program
               do ref ← if (σ(CurIns, ref) == {⊥}) ref
                        else if (σ(CurIns, ref) == {ı}) Dexp[ı]
                        else φ(σ(CurIns, ref))
    return program

We will show in the following that several "abstract" parts of the algorithm can be implemented when dealing with "concrete" data structures. Generating code for the φ function is the purpose of the next section.

5.1.2 Run-Time Overhead

When generating code for φ functions, the common idea is to compute at run-time the last instance that may possibly be a reaching definition of some use. In general, for each expanded data structure Dexp one needs an additional structure in one-to-one mapping with Dexp. In the static single-assignment framework for arrays [KS98], these additional structures are called @-structures and store statement instances. Dealing with a more general single-assignment form, we propose another semantics for additional structures, hence another notation: the data structure in one-to-one mapping with Dexp is a Φ-structure denoted by ΦDexp.

To ensure that run-time restoration of the flow of data is possible, elements of ΦDexp should store two pieces of information: the memory location assigned in the original program and the identity of the last instance which assigned this memory location. Because we are dealing with single-assignment programs, the identity of the last instance is already captured by the element itself (i.e. the subscript of ΦDexp).^1 Elements of ΦDexp should thus store memory locations.

^1 This run-time restoration technique is thus specific to SA-form. Other expansions require different types and/or semantics of Φ-structures.

- ΦDexp is initialized to NULL before the expanded program;
- Every time Dexp is modified, the associated element of ΦDexp is set to the value of the memory location that would have been written in the original program.
- When a read access to D in the original program is expanded into a call of the form φ(set), the φ function is implemented as the maximum (according to the sequential execution order) of all ı ∈ set such that ΦDexp[ı] is equal to the memory location read in the original program.

Abstract-Implement-Phi (expanded)
    expanded: an intermediate representation of the expanded program
    returns an intermediate representation with run-time restoration code

    for each data structure Dexp in expanded
        do if there are φ functions accessing Dexp
           then declare a structure ΦDexp with the same shape as Dexp, initialized to NULL
                for each read reference ref to Dexp whose expanded form is φ(set)
                    do for each statement s involved in set
                           do refs ← write reference in s
                              if not already done for s
                              then following s insert ΦDexp[CurIns] = fe(CurIns, refs)
                       φ(set) ← Dexp[max<seq {ı ∈ set : ΦDexp[ı] = fe(CurIns, ref)}]
    return expanded

Abstract-Implement-Phi is the abstract algorithm to generate the code for φ functions. In this algorithm, the syntax fe(CurIns, ref) means that we are interested in the memory location accessed by reference ref, and not that some compile-time knowledge of fe is required. Of course, practical details and optimizations depend on the control structures, see Section 5.1.4. Notice that the generated code is still in SA form: each element of a new Φ-structure is written at most once.

An important remark at this point is that instancewise reaching definition analysis is the key to run-time overhead optimization. Indeed, as shown by our code generation algorithm, SA-transformed programs are more efficient when φ functions are sparse. Thus, a parallelizing compiler has many reasons to perform a precise instancewise reaching definition analysis: it improves parallelism detection, allows choosing between a larger scope of parallel execution orders (depending on the "grain size" and architecture), and reduces run-time overhead. An example borrowed from program sjs in [Col98] is presented in Figure 5.1. The most precise reaching definition relation for reference A[i+j-1] in right-hand side of R is

$$
\sigma(\langle R, i, j, \mathtt{A[i+j-1]} \rangle) =
\begin{cases}
\langle S, i, j-1 \rangle & \text{if } j \geq 1 \\
\langle S, i-1, j \rangle & \text{else if } i \geq 1 \\
\langle T \rangle & \text{else.}
\end{cases}
$$

This exact result shows that definitions associated with the reference in left-hand side of R never reach any use. Expanding the program with a less precise reaching definition relation induces a spurious φ function, as in Figure 5.1.b. One may notice that the quast implementation in Figure 5.1.c is not really efficient and may be rather costly; but using classical optimizations such as loop peeling, or general polyhedron scanning techniques [AI91], can significantly reduce this overhead, see Figure 5.1.d. This remark advocates once more for further studies about integrating optimization techniques.
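The quast given above for reference A[i+j-1] can be read directly as a nested conditional. The following is only a minimal sketch in C, under stated assumptions: the names `Stmt`, `Instance` and `sigma_R` are hypothetical (they do not appear in the original text), and instance ⟨T⟩ is encoded with a dummy iteration vector (0, 0).

```c
#include <assert.h>

/* Hypothetical encoding of a write instance <stmt, i, j>.
   These type and function names are illustration-only assumptions. */
typedef enum { STMT_T, STMT_S } Stmt;

typedef struct {
    Stmt stmt;
    int i, j;   /* meaningful only when stmt == STMT_S */
} Instance;

/* Reaching definition of the read A[i+j-1] performed by instance <R, i, j>,
   written as the nested conditional (quast) stated in the text. */
Instance sigma_R(int i, int j)
{
    Instance def;
    assert(i >= 0 && j >= 0);   /* loop counters start at 0 in Figure 5.1 */
    if (j >= 1) {
        def.stmt = STMT_S; def.i = i;     def.j = j - 1;
    } else if (i >= 1) {
        def.stmt = STMT_S; def.i = i - 1; def.j = j;
    } else {
        def.stmt = STMT_T; def.i = 0;     def.j = 0;   /* dummy vector for <T> */
    }
    return def;
}
```

Because every branch returns a single instance, no φ function and no Φ-structure would be needed for this read: the expanded code can select AS[i,j-1], AS[i-1,j] or AT with the same two tests.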
Figure 5.1.a. Original program

          double A[N];
    T     A[0] = 0;
          for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
    S         A[i+j] = ...;
    R         A[i] = ... A[i+j-1] ...;
            }

Figure 5.1.b. SA without reaching definition analysis

          double A[N], AT, AS[N,N], AR[N,N];
    T     AT = 0;
          for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
    S         AS[i,j] = ...;
    R         AR[i,j] = ... φ({⟨T⟩} ∪ {⟨S,i',j'⟩ : (i',j') <lex (i,j)}) ...;
            }

Figure 5.1.c. SA with precise reaching definition analysis

          double A[N], AT, AS[N,N], AR[N,N];
    T     AT = 0;
          for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
    S         AS[i,j] = ...;
    R         AR[i,j] = ... if (j==0) if (i==0) AT else AS[i-1,j]
                            else AS[i,j-1] ...;
            }

Figure 5.1.d. Precise reaching definition analysis plus loop peeling

          double A[N], AT, AS[N,N], AR[N,N];
          AT = 0;
          AS[1,1] = ...;
          AR[1,1] = ... AT ...;
          for (i=0; i<N; i++) {
            AS[i,1] = ...;
            AR[i,1] = ... AS[i-1,1] ...;
            for (j=0; j<N; j++) {
              AS[i,j] = ...;
              AR[i,j] = ... AS[i,j-1] ...;
            }
          }

Figure 5.1. Interaction of reaching definition analysis and run-time overhead

Eventually, one should notice that φ functions are not the only source of run-time overhead: computing reaching definitions using σ at run-time may also be costly, even when it is a function (i.e. it is exact). But there is a big difference between the two sources of overhead: run-time computation of σ can be costly because of the lack of expressiveness of control structures and algebraic operations in the language or because of the mathematical abstraction. For example, transductions generally induce more overhead than quasts. On the opposite, the overhead of φ functions is due to the approximative
knowledge of the flow of data and its non-deterministic impact on the generated code; it is thus intrinsic to the expanded program, no matter how it is implemented. In many cases, indeed, the run-time overhead to compute σ can be significantly reduced by classical optimization techniques (an example is presented in Figure 5.1) but it is not the case for φ functions.

5.1.3 Single-Assignment for Loop Nests

In this section, we only consider intra-procedural expansion of programs operating on scalars and arrays. An extension to function calls, recursive programs and recursive data structures is studied at the end of this chapter, in Section 5.5. These restrictions simplify the exposition of a "concrete" SA algorithm in the classical loop nest framework.

When dealing with nests of loops, instancewise reaching definitions are described by an affine relation (see [BCF97, Bar98] and Section 2.4.3). We pointed out in Section 3.1.1 that, seeing an affine relation as a function, it can be written as a nested conditional called a quast [Fea91]. This representation of relation σ is especially interesting for expansion purposes since it can be easily and efficiently implemented in a programming language. Algorithm Make-Quast introduced in Section 3.1.1 builds a quast representation for any affine relation.

We use the following notations:

- Stmt(⟨S,x⟩) = S (the statement),
- Iter(⟨S,x⟩) = x (the iteration vector),
- and Array(S) is the name of the original data structure assigned by statement S.

Given a quast representation of reaching definitions, Convert-Quast generates an efficient code to retrieve the value read by some reference. This code is more or less a compile-time implementation of the conditional generated at the end of Abstract-SA. A φ function is generated when a non-singleton set is encountered. Eventually, because statements partition the set of memory locations in the single-assignment program, we use an array AS[x] instead of the proposed Aexp[⟨S,x⟩] in the abstract SA algorithm.

Convert-Quast (quast, ref)
    quast: the quast representation of the reaching definition function
    ref: the original reference, used when ⊥ is encountered
    returns the implementation of quast as a value retrieval code for reference ref

    switch
        case quast = {⊥}:
            return ref
        case quast = {ı}:
            A ← Array(ı)
            S ← Stmt(ı)
            x ← Iter(ı)
            return AS[x]
        case quast = {ı1, ı2, ...}:
            return φ({ı1, ı2, ...})
        case quast = if predicate then quast1 else quast2:
            return if predicate Convert-Quast(quast1, ref)
                   else Convert-Quast(quast2, ref)

Thanks to Convert-Quast, we are ready to specialize Abstract-SA for loop nests. The new algorithm is Loop-Nests-SA. Current instance CurIns is implemented by its iteration vector (built from the surrounding loop variables). To simplify the exposition, scalars are seen as one-dimensional arrays of a single element. All memory accesses are thus performed through array subscripts.

Loop-Nests-SA (program, σ)
    program: an intermediate representation of the program
    σ: a reaching definition relation, seen as a function
    returns an intermediate representation of the expanded program

    for each array A in program
        do for each statement S assigning A in program
               do declare an array AS
                  left-hand side of S is replaced by AS[Iter(CurIns)]
           for each read reference ref to A in program
               do σ/ref ← σ ∩ (I × ref)
                  quast ← Make-Quast(σ/ref)
                  map ← Convert-Quast(quast, ref)
                  ref ← map(CurIns)
    return program

The abstract code generation algorithm for φ functions can also be made more precise when dealing with loop nests and arrays only. For the same reason as before, run-time instances are stored in a distinct structure for each statement: we use ΦAS[x] instead of ΦAexp[⟨S,x⟩]. The new algorithm is Loop-Nests-Implement-Phi. Efficient computation of the lexicographic maximum can be done thanks to parallel reduction techniques [RF94].

One part of the code is still unimplemented: the array declaration. The main problem regarding array declaration is to get a compile-time evaluation of its size. In many cases, loop bounds are not easily predictable at compile-time. One may thus have to consider some expanded arrays as dynamic arrays whose size is updated at run-time. Such structures are very usual in high-level languages, but may result in poor performance when the compiler is unable to remove the run-time verification code. Another solution proposed by Collard [Col94b, Col95b] is to prefer a storage mapping optimization technique (such as the one presented in Section 5.3) to single-assignment form, and to fold the unbounded array into a bounded one when the associated memory reuse does not impair parallelization. Two examples of code generation for φ functions are proposed in the next section.

5.1.4 Optimization of the Run-Time Overhead

Most of the run-time overhead comes from dynamic restoration of the data flow, using φ functions; and this cost is critical for non-scalar data structures distributed across processors. The technique presented in Section 5.2 (maximal static expansion) eradicates such run-time computations, at the cost of some loss in parallelism extraction. Indeed, φ functions may sometimes be a necessary condition for parallelization. This justifies the design of optimization techniques for φ function computation, which is the second purpose of this section.

We now present three optimizations to the code-generation algorithm in Section 5.1.2. The first method groups several basic optimizations for loop nests, the second one is based
on a new instancewise analysis, and the last one avoids redundant computations during the propagation of "live" definitions. The second and third methods apply to loop nests and recursive programs as well.

Loop-Nests-Implement-Phi (expanded)
    expanded: an intermediate representation of the expanded program
    returns an intermediate representation with run-time restoration code

    for each array AS in expanded
        do dA ← dimension of array AS
           refS ← write reference in S
           if there are φ functions accessing AS
           then declare an array of dA-dimensional vectors ΦAS
                initialize ΦAS to NULL
                for each read access to AS of the form φ(set) in expanded
                    do if not already done for S
                       then insert
                                ΦAS[Iter(CurIns)] = fe(CurIns, refS)
                            immediately after S
    for each original array A in expanded
        do for each read access φ(set) associated with A in expanded
               do φ(set) ← parallel for (each S in Stmt(set))
                               vector[S] = max<lex {x : ⟨S,x⟩ ∈ set ∧ ΦAS[x] = fe(CurIns, ref)}
                           instance = max<seq {⟨S, vector[S]⟩ : S ∈ Stmt(set)}
                           AStmt(instance)[Iter(instance)]
    return expanded

First Method: Basic Optimizations for Loop Nests

When dealing with nests of loops, the Φ-structures are Φ-arrays indexed by iteration vectors (see Loop-Nests-Implement-Phi). Because of the hierarchical structure of loop nests, accesses in a set σ(u) are very likely to share a few iteration vector components. This allows the removal of the associated dimensions in Φ-arrays and reduces the complexity of lexicographic maximum computations. Another consequence is the applicability of up-motion techniques for invariant assignments. An example of Φ-array simplification and up-motion is described in Figure 5.2, where function max computes the maximum of a set of iteration vectors, and where the maximum of an empty set is the vector (−1, ..., −1).

Another interesting optimization is only applicable to while loops and for loops whose termination condition is complex: non-affine bounds, break statements or exceptions. When a loop assigns the same memory location an unbounded number of times, conversion to single-assignment form often requires a φ function but the last defining write can be computed without using Φ-arrays: its iteration vector is associated with the last value of the loop counter.^2 An example is described in Figure 5.3.

^2 The semantics of the resulting code is correct, but rather dirty: a loop variable is used outside of the loop block.
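The while-loop optimization described above can be checked on a small executable model. The following is a hedged sketch in C, not the thesis's generated code: the bound `MAXW`, the helper names `read_standard`, `read_optimized` and `run_sa_loop`, the ASCII spelling `PhixS` for the Φ-array, and the use of an iteration count `n` to stand for the unpredictable loop condition are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAXW 64   /* assumed capacity of the expanded arrays, for this sketch only */

/* Standard implementation: scan the Phi-array for the last iteration
   (largest w) whose write targeted the memory location `loc`. */
double read_standard(const double *xS, double **PhixS, int w_end,
                     double *loc, double x_init)
{
    int maxS = -1;                       /* maximum of the empty set */
    for (int w = 1; w < w_end; w++)
        if (PhixS[w] == loc)
            maxS = w;
    return (maxS != -1) ? xS[maxS] : x_init;
}

/* Optimized implementation: the last defining write is iteration w_end - 1,
   so the loop counter alone identifies it and no Phi-array is needed. */
double read_optimized(const double *xS, int w_end, double x_init)
{
    return (w_end > 1) ? xS[w_end - 1] : x_init;
}

/* Runs n iterations of the expanded loop, then evaluates the read in R. */
double run_sa_loop(int n, int optimized)
{
    double x = -1.0;                     /* value of x before the loop */
    double xS[MAXW];
    double *PhixS[MAXW] = { NULL };
    int w = 1;

    assert(n >= 0 && n < MAXW);
    while (w <= n) {                     /* stands for the unpredictable condition */
        xS[w] = 10.0 * w;                /* statement S, expanded */
        PhixS[w] = &x;                   /* location S would have written */
        w++;
    }
    return optimized ? read_optimized(xS, w, x)
                     : read_standard(xS, PhixS, w, &x, x);
}
```

Both variants return the same value for any iteration count within the assumed bound; the optimized one replaces a linear scan of the Φ-array by a single test on the loop counter, at the price of reading the counter outside the loop.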
Figure 5.2.a. Original program

          double x;
          for (i=1; i<=N; i++) {
            for (j=1; j<=N; j++)
              if (...)
                for (k=1; k<=N; k++)
    S             x = ...;
    R       ... = ... x ...;
          }

Figure 5.2.b. SA program

          double x, xS[N+1, N+1, N+1];
          for (i=1; i<=N; i++) {
            for (j=1; j<=N; j++)
              if (...)
                for (k=1; k<=N; k++)
    S             xS[i,j,k] = ...;
    R       ... = φ({⟨S,i,j',N⟩ : 1 ≤ j' ≤ N} ∪ {⊥});
          }

Figure 5.2.c. Standard implementation

          double x, xS[N+1, N+1, N+1], ΦxS[N+1, N+1, N+1] = {NULL};
          for (i=1; i<=N; i++) {
            for (j=1; j<=N; j++)
              if (...)
                for (k=1; k<=N; k++) {
    S             xS[i,j,k] = ...;
                  ΦxS[i,j,k] = &x;
                }
    R       ... = {
              maxS = max {(i,j',k') : 1 ≤ j' ≤ N ∧ k' = N ∧ ΦxS[i,j',k'] = &x};
              if (maxS != (−1,−1,−1)) xS[maxS] else x;
            }
          }

Figure 5.2.d. Optimized implementation

          double x, xS[N+1, N+1, N+1], ΦxS[N+1] = {NULL};
          for (i=1; i<=N; i++) {
            for (j=1; j<=N; j++) {
              if (...) {
                ΦxS[j] = &x;
                for (k=1; k<=N; k++) {
    S             xS[i,j,k] = ...;
                }
              }
            }
    R       ... = {
              maxS = max {j' : 1 ≤ j' ≤ N ∧ ΦxS[j'] = &x};
              if (maxS != −1) xS[maxS] else x;
            }
          }

Figure 5.2. Basic optimizations of the generated code for φ functions

Second Method: Improving the Single-Assignment Form Algorithm

In some cases, φ functions can be computed without Φ-arrays to store possible reaching definitions. When the read statement is too complex to be analyzed at compile-time,
the set of possible reaching definitions can be very large. However, if we could compute the very memory location accessed by the read statement, the set of possible reaching definitions would be much smaller, sometimes reduced to a singleton. This shows the need for an additional instancewise information, called reaching definition of a memory location: the exact function which depends on an execution e ∈ E of the program is denoted by σml_e and its conservative approximation by σml. Here are the formal definitions:

$$
\forall e \in E,\ \forall u \in R_e,\ c \in f_e(W_e):\quad
\sigma^{ml}_e(u, c) = \max_{<_{seq}} \{ v \in W_e : v <_{seq} u \wedge f_e(v) = c \};
$$

$$
\forall e \in E,\ \forall u \in R_e,\ c \in f_e(W_e):\quad
v = \sigma^{ml}_e(u, c) \implies v \in \sigma^{ml}(u, c).
$$

Figure 5.3.a. Original program

          double x;
          while (...)
    S       x = ...;
    R     ... = ... x ...;

Figure 5.3.b. SA program

          double x, xS[...];
          w = 1;
          while (...) {
    S       xS[w] = ...;
            w++;
          }
    R     ... = φ({⟨S,w⟩ : 1 ≤ w} ∪ {⊥});

Figure 5.3.c. Standard implementation

          double x, xS[...], ΦxS[...] = {NULL};
          w = 1;
          while (...) {
    S       xS[w] = ...;
            ΦxS[w] = &x;
            w++;
          }
    R     ... = {
            maxS = max {w : ΦxS[w] = &x};
            if (maxS != −1) xS[maxS] else x;
          }

Figure 5.3.d. Optimized implementation

          double x, xS[...];
          w = 1;
          while (...) {
    S       xS[w] = ...;
            w++;
          }
    R     ... = if (w>1) xS[w-1] else x;

Figure 5.3. Repeated assignments to the same memory location

Computing relation σml is not really different from reaching definition analysis. To compute σml for a reference r in right-hand side of a statement, r is replaced by a read access to a new symbolic memory location c, then classical instancewise reaching definition analysis is performed. The result is a reaching definition relation parameterized by c. Seeing c as an argument, it yields the expected approximate relation σml. In some rare cases, this computation scheme yields unnecessarily complex results:^3 the general solution is then to intersect the result with σ.

^3 Consider an array A, an assignment to A[foo] and a read reference to A[foo], where foo is some complex subscript. A precise reaching definition analysis would compute an exact result because the subscript is the same in the two statements. However, the reaching definition of a given memory location is not known precisely, because foo in the assignment statement is not known at compile time.

Algorithm Abstract-ML-SA is an improved single-assignment form conversion algorithm based on reaching definitions of memory locations. It is based on the exact
5.1.MOTIVATIONSANDTRADEOFFS run-timecomputationofthesymbolicmemorylocationwithstoragemappingfe.this 165 referencecode possiblycomplex tobesubstitutedtothesymbolicmemorylocationc. Inbothcases,thevalueoffeshouldnotbeinterpreted,itmustbeusedastheoriginal bythecurrentinstanceandthesymbolicmemorylocation,seeloop-nests-ml-sa. algorithmcanalsobeenspecializedforloopnestsandarrays,usingquastsparameterized AnexampleisdescribedinFigure5.4.... Sfor(i=1;i<=N;i++) doublea[n+1]; Figure5.4.a.Originalprogram for(j=1;j<=n;j++) A[j]=A[j]+A[foo]; SdoubleA[N+1],AS[N+1,N+1]; for(i=1;i<=n;i++) for(j=1;j<=n;j++) AS[j]=if(i>1)AS[i-1,j]elseA[j] Figure5.4.b.SAprogram +if(i>1 j>1)(f?g[fhs;i0;j0i:1i0;j0n^(i0;j0)<lex(i;j)g) elsea[foo]; SdoubleA[N+1],AS[N+1,N+1]; for(i=1;i<=n;i++) for(j=1;j<=n;j++) AS[j]=if(i>1)AS[i-1,j]elseA[j] Figure5.4.c.SAprogramwithreachingdenitionsofmemorylocations +if(foo<j)as[i,foo] elseif(i>1)as[i-1,foo]elsea[foo]; ThirdMethod:CheatingwithSingle-Assignment...Figure5.4.ImprovingtheSAalgorithm... Ageneralproblemwithimplementationsoffunctionsbasedon-structuresisthelarge redundancyoflexicographicmaximumcomputations.indeed,eachtimeafunction isencountered,themaximumofthefullsetofpossiblereachingdenitionsmustbe recomputethemaximumofthesameset.thesetechniquesarewellsuitedtothevariable computed.inthestaticsingle-assignmentframework(ssa)[cfr+91,ks98],alarge renaminginvolvedinssa,butareunabletosupportthedatastructurereconstruction partoftheworkisdevotedtooptimizedplacementoffunctions,inordertonever performedbysaalgorithms.nevertheless,foranotherexpansionschemepresentedin Section5.4.7,weareabletoavoidredundanciesandtooptimizetheplacementof functions,butthealgorithmisrathercomplex. removesredundantcomputations,butcomputationisnotmadewith-structuresinsa ThemethodweproposeherehasbeenstudiedwiththehelpofLaurentVibert.It
form: it is based on @-structures whose semantics is similar to @-arrays in the static single-assignment (SSA) framework [KS98]. This is a simple compromise between dependence removal and efficient computation of φ functions, based on the commutativity and associativity of the lexicographic maximum. The idea is to use @-structures in one-to-one mapping with the original data structures instead of the expanded ones. Notice that @-structures are not in single-assignment form, and maximum computation must be done in a critical section. Both the write instance and the memory location should be stored, but the memory location is now encoded in the subscript: @-structures are thus storing instances instead of memory locations, see Abstract-Implement-Phi-Not-SA.

Abstract-ML-SA(program, W, σ^ml)
  program: an intermediate representation of the program
  W: a conservative approximation of the set of write accesses
  σ^ml: reaching definitions of memory locations
  returns an intermediate representation of the expanded program

    for each data structure D in program
    do declare a data structure Dexp[W]
       for each statement s assigning D in program
       do left-hand side of s ← Dexp[CurIns]
       for each reference ref to D in program
       do ref ← if (σ^ml((CurIns, ref), fe(CurIns, ref)) == {⊥}) ref
                else if (σ^ml((CurIns, ref), fe(CurIns, ref)) == {ι}) Dexp[ι]
                else φ(σ^ml((CurIns, ref), fe(CurIns, ref)))
    return program

Loop-Nests-ML-SA(program, σ^ml)
  program: an intermediate representation of the program
  σ^ml: reaching definitions of memory locations
  returns an intermediate representation of the expanded program

    for each array A in program
    do for each statement S assigning A in program
       do declare an array AS
          left-hand side of S ← AS[Iter(CurIns)]
       for each reference ref to A in program
       do σ^ml_/ref ← σ^ml ∩ (I × ref)
          u ← symbolic access associated with reference ref
          quast ← Make-Quast(σ^ml_/ref(u, fe(u)))
          map ← Convert-Quast(quast, ref)
          ref ← map(CurIns)
    return program

The original memory-based dependences are displaced from the original data structures to their @-structures: they have not disappeared! However, thanks to the properties of the lexicographic maximum, output dependences can be ignored without violating the original program semantics. Spurious anti-dependences remain, and must be taken into account for parallelization purposes. The first example in Figure 5.5 can be parallelized with this technique, but not the second.

In the case of loop nests and arrays, a simple extension to the technique can be helpful. It is sufficient, for example, to parallelize the second example in Figure 5.5. Consider a call of the form φ(set). If the component value of some dimensions is constant for all
iteration vectors of instances in set, then it is legal to expand the @-array along these dimensions. Applied to the second example in Figure 5.5, @x is replaced by @x[i], which makes the outer loop parallel.

Abstract-Implement-Phi-Not-SA(expanded)
  expanded: an intermediate representation of the expanded program
  returns an intermediate representation with run-time restoration code

    for each original data structure D[shape] in expanded
    do if there are φ functions accessing Dexp
       then declare a data structure @D[shape] initialized to ⊥
            for each read reference ref to D whose expanded form is φ(set)
            do sub ← subscript of reference ref
               for each statement s involved in set
               do subs ← subscript of the write reference to D in s
                  if not already done for s
                  then following s insert @D[subs] = max(@D[subs], CurIns)
               φ(set) ← if (@D[sub] != ⊥) Dexp[@D[sub]] else D[sub]
    return expanded

    double x;
    for (i=1; i<=N; i++)
    S   if (...) x = ...;
    R ... = ... x ...;

Figure 5.5.a. First example

    double x, xS[N+1], @x = -1;
    parallel for (i=1; i<=N; i++)
      if (...) {
    S   xS[i] = ...;
        @x = max(@x, i);
      }
    R ... = if (@x != -1) xS[@x] else x;

Figure 5.5.b. First example: parallel expansion

    double x;
    for (i=1; i<=N; i++) {
    T   x = ...;
        for (j=1; j<=N; j++)
    S     if (...) x = ... x ...;
    R   ... = ... x ...;
    }

Figure 5.5.c. Second example

    double x, xT[N+1], xS[N+1, N+1];
    double @x = (-1, -1);
    for (i=1; i<=N; i++) {
    T   xT[i] = ...;
        for (j=1; j<=N; j++)
    S     if (...) {
            xS[i,j] = if (j>1) xS[i,j-1] else xT[i];
            @x = max(@x, (i,j));
          }
    R   ... = if (@x != (-1, -1)) xS[@x] else xT[i];
    }

Figure 5.5.d. Second example: not parallelizable expansion

Figure 5.5. Parallelism extraction versus run-time overhead

In practice, this technique is both very easy to implement and very efficient for run-time restoration of the data flow, but it can often hamper parallelism extraction. It is a first and simple attempt to find a trade-off between parallelism and overhead.

5.1.5 Trade-off between Parallelism and Overhead

All the single-assignment form algorithms described and most techniques for run-time restoration of the data flow share the same major drawback: run-time overhead. By essence, SA form requires a huge memory usage, and is not practical for real programs. Moreover, some φ functions cannot be implemented efficiently with the optimizations proposed. To avoid or reduce these sources of run-time overhead, it is thus necessary to design more pragmatic expansion schemes: both memory usage and run-time data-flow restoration code should be handled with care. This is the purpose of the three following sections.

5.2 Maximal Static Expansion

The present section studies a novel memory expansion paradigm: its motivation is to stick with the compile-time restoration of the flow of data while keeping in mind the approximate nature of the compile-time information. More precisely, we would like to remove as many memory-based dependences as possible, without the need of any φ function (associated with run-time restoration of the data flow). We will show that this goal requires a change in the way expanded data structures are accessed, to take into account the approximate knowledge of storage mappings.

An expansion of data structures that does not need a φ function is called a static expansion [BCC98, BCC00] (footnote 4). The goal is to find automatically a static way to expand all data structures as much as possible, i.e. the maximal static expansion. Maximal static expansion may be considered as a trade-off between parallelism and memory usage.

We present an algorithm to derive the maximal static expansion; its input is the (perhaps conservative) output of a reaching definition analysis, so our method is "optimal" with respect to the precision of this analysis. Our framework is valid for any imperative program, without restriction (the only restrictions being those of your favorite reaching definition analysis). We then present an intra-procedural algorithm to construct the maximal static expansion for programs with arrays and scalars only, but where subscripts and control structures are unrestricted.

5.2.1 Motivation

The three following examples introduce the main issues and advocate for a maximal static expansion technique.

First Example: Dynamic Control Flow

We first study the pseudo-code shown in Figure 5.6; this kernel appears in several convolution codes (footnote 5). Parts denoted by ... are supposed to have no side effect.

Footnote 4: Notice that according to our definition, an expansion in the static single-assignment framework [CFR+91, KS98] may not be static.

Footnote 5: For instance, Horn and Schunck's algorithm to perform 3D Gaussian smoothing by separable convolution.
    double x;
    for (i=1; i<=N; i++) {
    T   x = ...;
        while (...)
    S     x = ... x ...;
    R   ... = ... x ...;
    }

Figure 5.6. First example

Each instance $\langle T,i\rangle$ assigns a new value to variable x. In turn, statement S assigns x an undefined number of times (possibly zero). The value read in x by statement R is thus defined either by T, or by some instance of S, in the same iteration of the for loop (the same i). Therefore, if the expansion assigns distinct memory locations to $\langle T,i\rangle$ and to instances of $\langle S,i,w\rangle$ (footnote 6), how could instance $\langle R,i\rangle$ "know" which memory location to read from?

We have already seen that this problem is solved with an instancewise reaching definition analysis, which describes where values are defined and where they are used. We may thus call $\sigma$ the mapping from a read instance to its set of possible reaching definitions. Applied to the example in Figure 5.6, it tells us that the set $\sigma(\langle S,i,w\rangle)$ of definitions reaching instance $\langle S,i,w\rangle$ is:

$$\sigma(\langle S,i,w\rangle) = \text{if } w > 1 \text{ then } \{\langle S,i,w-1\rangle\} \text{ else } \{\langle T,i\rangle\} \tag{5.1}$$

And the set $\sigma(\langle R,i\rangle)$ of definitions reaching instance $\langle R,i\rangle$ is:

$$\sigma(\langle R,i\rangle) = \{\langle T,i\rangle\} \cup \{\langle S,i,w\rangle : w \ge 1\}, \tag{5.2}$$

where w is an artificial counter of the while loop.

Let us try to expand scalar x. One way is to convert the program into SA, making T write into xT[i] and S into xS[i,w]: then, each memory location is assigned at most once, complying with the definition of SA. However, what should right-hand sides look like now? A brute-force application of (5.2) yields the program in Figure 5.7. While the right-hand side of S only depends on w, the right-hand side of R depends on the control flow, thus needing a φ function.

The aim of maximal static expansion is to expand x as much as possible in this program but without having to insert φ functions.

A possible static expansion is to uniformly expand x into x[i] and to avoid output dependences between distinct iterations of the for loop. Figure 5.8 shows the resulting maximal static expansion of this example. It has the same degree of parallelism and is simpler than the program in single-assignment form.

Notice that it should be easy to adapt the array privatization techniques by Maydan et al. [MAL93] to handle the program in Figure 5.6; this would tell us that x can be privatized along i. However, we want to do more than privatization along loops, as illustrated in the following examples.

Footnote 6: We need a virtual loop variable w to track iterations of the while loop.
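As a sanity check of the expansion of x into x[i] discussed above, here is a minimal Python sketch. It is an illustration only, not part of the original algorithm: the elided right-hand sides are replaced by hypothetical arithmetic (`2 * x + 1`), the while loop by a fixed trip count per iteration (`trips`), and the names `run_original` and `run_expanded` are assumptions. It shows that, once x is expanded along i, the outer iterations may execute in any order without changing the values read by R.

```python
def run_original(n, trips, init):
    """Sequential program of Figure 5.6 with a single scalar x."""
    x = None
    reads = []
    for i in range(1, n + 1):
        x = init[i]                  # T: x = ...
        for _ in range(trips[i]):    # while (...)
            x = 2 * x + 1            # S: x = ... x ...   (hypothetical body)
        reads.append(x)              # R: ... = ... x ...
    return reads


def run_expanded(n, trips, init, order):
    """Expanded program (x becomes x[i]); `order` is any permutation of 1..n."""
    x = [None] * (n + 1)
    reads = {}
    for i in order:
        x[i] = init[i]               # T: x[i] = ...
        for _ in range(trips[i]):
            x[i] = 2 * x[i] + 1      # S: x[i] = ... x[i] ...
        reads[i] = x[i]              # R: ... = ... x[i] ...
    return [reads[i] for i in range(1, n + 1)]


n = 4
trips = [0, 2, 0, 3, 1]   # index 0 unused; trip count of the while loop per i
init = [0, 5, 7, 1, 2]    # index 0 unused; value assigned by T per i
print(run_original(n, trips, init))
print(run_expanded(n, trips, init, order=[3, 1, 4, 2]))
```

Both calls print the same list, since no value flows between distinct iterations of the for loop once output dependences on x are removed.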
    for (i=1; i<=N; i++) {
    T   xT[i] = ...;
        w = 1;
        while (...) {
    S     xS[i,w] = if (w==1) xT[i] else xS[i,w-1] ...;
          w++;
        }
    R   ... = φ({⟨T,i⟩} ∪ {⟨S,i,w⟩ : w ≥ 1}) ...;
    }

Figure 5.7. First example, continued

    for (i=1; i<=N; i++) {
    T   x[i] = ...;
        while (...)
    S     x[i] = ... x[i] ...;
    R   ... = ... x[i] ...;
    }

Figure 5.8. Expanded version of the first example

Second Example: Array Expansion

Let us give a more complex example; we would like to expand array A in the program in Figure 5.9.

Since T always executes when j equals N, a value read by $\langle S,i,j\rangle$, $j > N$, is never defined by an instance $\langle S,i',j'\rangle$ of S with $j' \le N$. Figure 5.9 describes the data-flow relations between S instances: an arrow from $(i',j')$ to $(i,j)$ means that instance $(i',j')$ defines a value that may reach $(i,j)$.

    double A[4*N];
    for (i=1; i<=2*N; i++)
      for (j=1; j<=2*N; j++) {
    S   if (...) A[i-j+2*N] = ... A[i-j+2*N] ...;
    T   if (j==N) A[i+N] = ...;
      }

[Data-flow graph over the iteration domain: horizontal axis i, vertical axis j, ticks at N and 2N.]

Figure 5.9. Second example
Formally, the definition reaching an instance of statement S is (footnote 7):

$$\sigma(\langle S,i,j\rangle) = \begin{cases} \{\langle S,i',j'\rangle : 1 \le i' \le 2N \wedge 1 \le j' < j \wedge i'-j' = i-j\} & \text{if } j \le N \\ \{\langle S,i',j'\rangle : 1 \le i' \le 2N \wedge N < j' < j \wedge i'-j' = i-j\} \cup \{\langle T,i',N\rangle : 1 \le i' < i \wedge i' = i-j+N\} & \text{otherwise} \end{cases} \tag{5.3}$$

Because reaching definitions are non-singleton sets, converting this program to SA form would require run-time computation of the memory location read by S.

[Figure 5.10.a. Instances involved in the same data flow. Figure 5.10.b. Counting groups per memory location. Both panels: horizontal axis i, vertical axis j, ticks at N and 2N.]

Figure 5.10. Partition of the iteration domain (N = 4)

However, we notice that the iteration domain of S may be split into disjoint subsets by grouping together instances involved in the same data flow. These subsets build a partition of the iteration domain. Each subset may have its own memory space that will be neither written nor read by instances outside the subset. The partition is given in Figure 5.10.a.

Using this property, we can duplicate only those elements of A that appear in two distinct subsets. These are all the array elements A[c], $1+N \le c \le 3N-1$. They are accessed by instances in the large central set in Figure 5.10.b. Let us label with 1 the subsets in the lower half of this area, and with 2 the subsets in the top half. We add one dimension to array A, subscripted with 1 and 2 in statements S2 and S3 in Figure 5.11, respectively. Elements A[c], $1 \le c \le N$, are accessed by instances in the upper left triangle in Figure 5.10.b and have only one subset each (one subset in the corresponding diagonal in Figure 5.10.a), which we label with 1. The same labeling holds for sets corresponding to instances in the lower right triangle.

The maximal static expansion is shown in Figure 5.11. Notice that this program has the same degree of parallelism as the corresponding single-assignment program, without the run-time overhead.
    double A[4*N, 2];
    for (i=1; i<=2*N; i++)
      for (j=1; j<=2*N; j++) {
        // expansion of statement S
        if (-2*N+1 <= i-j && i-j <= -N) {
    S1    if (...) A[i-j+2*N, 1] = ... A[i-j+2*N, 1] ...;
        } else if (-N+1 <= i-j && i-j <= N-1) {
          if (j <= N) {
    S2      if (...) A[i-j+2*N, 1] = ... A[i-j+2*N, 1] ...;
          } else
    S3      if (...) A[i-j+2*N, 2] = ... A[i-j+2*N, 2] ...;
        } else
    S4    if (...) A[i-j+2*N, 1] = ... A[i-j+2*N, 1] ...;
        // expansion of statement T
    T   if (j==N) A[i+N, 2] = ...;
      }

Figure 5.11. Maximal static expansion for the second example

    double A[N+1];
    for (i=1; i<=N; i++) {
      for (j=1; j<=N; j++)
    T   A[j] = ...;
    S A[foo(i)] = ...;
    R ... = ... A[bar(i)] ...;
    }

Figure 5.12.a. Source program

    double A[N+1, N+1];
    for (i=1; i<=N; i++) {
      for (j=1; j<=N; j++)
    T   A[j,i] = ...;
    S A[foo(i),i] = ...;
    R ... = ... A[bar(i),i] ...;
    }

Figure 5.12.b. Expanded version

Figure 5.12. Third example

Third Example: Non-Affine Array Subscripts

Consider the program in Figure 5.12.a, where foo and bar are arbitrary subscripting functions (footnote 8). Since all array elements are assigned by T, the value read by R at the i-th iteration must have been produced by S or T at the same iteration. The data-flow graph is similar to the first example:

$$\sigma(\langle R,i\rangle) = \{\langle S,i\rangle\} \cup \{\langle T,i,j\rangle : 1 \le j \le N\}. \tag{5.4}$$

The maximal static expansion adds a new dimension to A subscripted by i. It is sufficient to make the first loop parallel.

What Next?

These examples show the need for an automatic static expansion technique. We present in the following section a formal definition of expansion and a general framework for maximal static expansion. We then describe an expansion algorithm for arrays that yields the expanded programs shown above. Notice that it is easy to recognize the original programs in their expanded counterparts, which is a convenient property of our algorithm.

It is natural to compare array privatization [MAL93, TP93, Cre96, Li92] and maximal static expansion: both methods expose parallelism in programs at a lower cost than single-assignment form transformation. However, privatization generally resorts to dynamic restoration of the data flow, and it only detects parallelism along the enclosing loops; it is thus less powerful than general array expansion techniques. Indeed, the example in Section 5.2.1 shows that our method not only may expand along diagonals in the iteration space but may also do some "blocking" along these diagonals.

5.2.2 Problem Statement

We assume an instancewise reaching definition analysis is performed previously, yielding a conservative approximation $\sigma$ of the relation between uses and reaching definitions.

The definition of static expansion was first introduced in [BCC98]: the idea is to avoid dynamic restoration of the data flow. Let us consider two writes v and w belonging to the same set of reaching definitions of some read u. Suppose they both write in the same memory location. If we assign two distinct memory locations to v and w in the expanded program, then a φ function is needed to restore the data flow, since we do not know which of the two locations has the value needed by u. Using the notations introduced in Sections 2.4 and 2.5, "v and w write in the same memory location" is denoted by $f_e(v) = f_e(w)$, and "v and w are assigned distinct memory locations in the expanded program" is denoted by $f^{exp}_e(v) \ne f^{exp}_e(w)$.

We introduce relation $\mathcal{R}$ between definitions that possibly reach the same read (recall that we do not require the reaching definition analysis to give exact results):

$$\forall v, w \in \mathbf{W} : v \mathrel{\mathcal{R}} w \iff \exists u \in \mathbf{R} : v \in \sigma(u) \wedge w \in \sigma(u).$$

Whenever two definitions possibly reaching the same read assign the same memory location in the original program, they must still assign the same memory location in the expanded program. Since "writing in the same memory location" is an equivalence relation, we actually use $\mathcal{R}^*$, the transitive closure of $\mathcal{R}$ (see Section 5.2.4 for computation details). Relation $\mathcal{R}^*$, therefore, generalizes webs [Muc97] to instances of references, and the rest of this work shows how to compute $\mathcal{R}^*$ in the presence of arrays (footnote 9).

Footnote 7: Some instances of S read uninitialized values (e.g. when j = 1) and they have no reaching definition. As a consequence, the expanded program in Figure 5.11 should begin with a copy-in code from the original array to the expanded one.

Footnote 8: A[foo(i)] stands for an array subscript between 1 and N, "too complex" to be analyzed at compile time.

Footnote 9: Strictly speaking, webs include definitions and uses, whereas $\mathcal{R}^*$ applies to definitions only.
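The construction of the relation between definitions that may reach the same read can be illustrated on the first example. The following Python sketch is only an illustration under stated assumptions: relations are plain finite sets of pairs rather than affine relations, the while-loop counter is bounded by a hypothetical `wmax`, and the function names `reaching_defs` and `same_read_relation` are not from the original text.

```python
from itertools import product


def reaching_defs(n, wmax):
    """Reaching definitions of the first example, following (5.1) and (5.2).

    Returns a dict mapping each read instance to its set of possible
    reaching definitions; the while-loop counter w is bounded by wmax.
    """
    sigma = {}
    for i in range(1, n + 1):
        for w in range(1, wmax + 1):
            sigma[("S", i, w)] = {("S", i, w - 1)} if w > 1 else {("T", i)}
        sigma[("R", i)] = {("T", i)} | {("S", i, w) for w in range(1, wmax + 1)}
    return sigma


def same_read_relation(sigma):
    """v R w iff some read u has both v and w among its reaching definitions."""
    rel = set()
    for defs in sigma.values():
        rel |= set(product(defs, repeat=2))
    return rel


rel = same_read_relation(reaching_defs(n=2, wmax=3))
print((("T", 1), ("S", 1, 3)) in rel)   # both may reach the read <R,1>
print((("T", 1), ("T", 2)) in rel)      # different iterations never meet
```

On this bounded instance, two writes are related exactly when they belong to the same iteration i of the for loop, which is the grouping used by the expansion of x into x[i].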
Relation $\mathcal{R}$ holds between definitions that reach the same use. Therefore, mapping these writes to different memory locations is precisely the case where φ functions would be necessary, a case a static expansion is designed to avoid:

Definition 5.1 (static expansion) For an execution $e \in \mathbf{E}$ of the program, an expansion from storage mapping $f_e$ to storage mapping $f^{exp}_e$ is static if

$$\forall v, w \in \mathbf{W}_e : v \mathrel{\mathcal{R}^*} w \wedge f_e(v) = f_e(w) \Longrightarrow f^{exp}_e(v) = f^{exp}_e(w). \tag{5.5}$$

When clear from the context, we say "static expansion $f^{exp}_e$" instead of "static expansion from $f_e$ to $f^{exp}_e$". Now, we are interested in removing as many dependences as possible, without introducing φ functions. We are looking for the maximal static expansion (MSE), assigning the largest number of memory locations while verifying (5.5):

Definition 5.2 (maximal static expansion) For an execution e, a static expansion $f^{exp}_e$ is maximal on the set $\mathbf{W}_e$ of writes, if for any static expansion $f'_e$,

$$\forall v, w \in \mathbf{W}_e : f^{exp}_e(v) = f^{exp}_e(w) \Longrightarrow f'_e(v) = f'_e(w). \tag{5.6}$$

Intuitively, if $f^{exp}_e$ is maximal, then $f'_e$ cannot do better: it maps two writes to the same memory location when $f^{exp}_e$ does.

We need to characterize the sets of statement instances on which a maximal static expansion $f^{exp}_e$ is constant, i.e. equivalence classes of relation $\{u, v \in \mathbf{W}_e : f^{exp}_e(u) = f^{exp}_e(v)\}$. However, this hardly gives us an expansion scheme, because this result does not tell us how much each individual memory location should be expanded. The purpose of Section 5.2.3 is to design a practical expansion algorithm for each memory location used in the original program.

5.2.3 Formal Solution

Following the lines of [BCC00], we are interested in the static expansion which removes the largest number of dependences.

Proposition 5.1 (maximal static expansion) Given a program execution e, a storage mapping $f^{exp}_e$ is both a maximal static expansion of $f_e$ and finer than $f_e$ if and only if

$$\forall v, w \in \mathbf{W}_e : v \mathrel{\mathcal{R}^*} w \wedge f_e(v) = f_e(w) \iff f^{exp}_e(v) = f^{exp}_e(w). \tag{5.7}$$

Proof: Sufficient condition (the "if" part).

Let $f^{exp}_e$ be a mapping s.t. $\forall u, v \in \mathbf{W} : f^{exp}_e(u) = f^{exp}_e(v) \iff u \mathrel{\mathcal{R}^*} v \wedge f_e(u) = f_e(v)$. By definition, $f^{exp}_e$ is a static expansion and $f^{exp}_e$ is finer than $f_e$.

Let us show that $f^{exp}_e$ is maximal. Suppose that for $u, v \in \mathbf{W}$: $f^{exp}_e(u) = f^{exp}_e(v)$. (5.7) implies $u \mathrel{\mathcal{R}^*} v$ and $f_e(u) = f_e(v)$. Thus, from (5.5), any other static expansion $f'_e$ satisfies $f'_e(u) = f'_e(v)$ too. Hence, $f^{exp}_e(u) = f^{exp}_e(v) \Rightarrow f'_e(u) = f'_e(v)$, so $f^{exp}_e$ is maximal.

Necessary condition (the "only if" part).

Let $f^{exp}_e$ be a maximal static expansion finer than $f_e$. Because $f^{exp}_e$ is a static expansion, we only have to prove that

$$\forall u, v \in \mathbf{W} : f^{exp}_e(u) = f^{exp}_e(v) \Longrightarrow u \mathrel{\mathcal{R}^*} v \wedge f_e(u) = f_e(v).$$

On the one hand, $f^{exp}_e(u) = f^{exp}_e(v) \Rightarrow f_e(u) = f_e(v)$ because $f^{exp}_e$ is finer than $f_e$. On the other hand, for some u and v in $\mathbf{W}$, assume $f^{exp}_e(u) = f^{exp}_e(v)$ and $\neg(u \mathrel{\mathcal{R}^*} v)$. We show that it contradicts the maximality of $f^{exp}_e$: for any w in $\mathbf{W}$, let $f'_e(w) = f^{exp}_e(w)$ when $\neg(u \mathrel{\mathcal{R}^*} w)$, and $f'_e(w) = c$ when $u \mathrel{\mathcal{R}^*} w$, for some $c \ne f^{exp}_e(u)$. $f'_e$ is a static expansion: by construction, $f'_e(u') = f'_e(v')$ for any $u'$ and $v'$ such that $u' \mathrel{\mathcal{R}^*} v'$. The contradiction comes from the fact that $f'_e(u) \ne f'_e(v)$.

Results above make use of a general memory expansion $f^{exp}_e$. However, constructing it from scratch is another issue. To see why, consider a memory location c and two accesses v and w writing into c. Assume that $v \mathrel{\mathcal{R}^*} w$: these accesses must assign the same memory location in the expanded program. Now assume the contrary: if $\neg(v \mathrel{\mathcal{R}^*} w)$, then the expansion should make them assign two distinct memory locations.

We are thus strongly encouraged to choose an expansion $f^{exp}_e$ of the form $(f_e, \nu)$ where function $\nu$ is constructed by the analysis and must be constant on equivalence classes of $\mathcal{R}^*$. Notation $(f_e, \nu)$ is merely abstract. A concrete method for code generation involves adding dimensions to arrays, and extending array subscripts with $\nu$, see Section 5.2.4.

Now, a storage mapping $f^{exp}_e = (f_e, \nu)$ is finer than $f_e$ by construction, and it is a maximal static expansion if function $\nu$ satisfies the following equation:

$$\forall e \in \mathbf{E}, \forall v, w \in \mathbf{W}_e, f_e(v) = f_e(w) : v \mathrel{\mathcal{R}^*} w \iff \nu(v) = \nu(w).$$

In practice, $f_e(v) = f_e(w)$ can only be decided when $f_e$ is affine. In general, we have to approximate $f_e$ with the conflict relation $\asymp$ and derive two constraints from the previous equation:

$$\text{Expansion must be static: } \forall v, w \in \mathbf{W} : v \asymp w \wedge v \mathrel{\mathcal{R}^*} w \Longrightarrow \nu(v) = \nu(w); \tag{5.8}$$

$$\text{Expansion must be maximal: } \forall v, w \in \mathbf{W} : v \asymp w \wedge \neg(v \mathrel{\mathcal{R}^*} w) \Longrightarrow \nu(v) \ne \nu(w). \tag{5.9}$$

First, notice that changing $\asymp$ into its transitive closure $\asymp^*$ has no impact on (5.8), and that the transformed equation yields an equivalence class enumeration problem. Second, (5.9) is a graph coloring problem: it says that two writes cannot "share the same color" if related. Direct methods exist to address these two problems simultaneously (see [Coh99b] or Section 5.4), but they seem much too complicated for our purpose.

Now, the only purpose of relation $\asymp$ is to avoid unnecessary memory allocation, and using a conservative approximation harms neither the maximality nor the static property of the expansion. Actually, we found that relation $\asymp^*$ differs from $\asymp$ (meaning $\asymp$ is not transitive) only in contrived examples, e.g. with tricky combinations of affine and non-affine array subscripts. Therefore, consider the following maximal static expansion criterion:

$$\forall v, w \in \mathbf{W}, v \asymp^* w : v \mathrel{\mathcal{R}^*} w \iff \nu(v) = \nu(w). \tag{5.10}$$

Now, given an equivalence class of $\asymp^*$, classes of $\mathcal{R}^*$ are exactly the sets where storage mapping $f^{exp}_e$ is constant:

Theorem 5.1 A storage mapping $f^{exp}_e = (f_e, \nu)$ is a maximal static expansion for all executions $e \in \mathbf{E}$ iff for each equivalence class $C \in \mathbf{W}/\asymp^*$, $\nu$ is constant on each class in $C/\mathcal{R}^*$ and takes distinct values between different classes: $\forall v, w \in C : v \mathrel{\mathcal{R}^*} w \iff \nu(v) = \nu(w)$.

Proof: $C \in \mathbf{W}/\asymp^*$ denotes a set of writes which may assign the same memory cell, and $C/\mathcal{R}^*$ is the set of equivalence classes for relation $\mathcal{R}^*$ on writes in C. A straightforward application of (5.10) concludes the proof.
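The two constraints (5.8) and (5.9) can be checked mechanically on a finite set of writes. The sketch below is an illustration under explicit assumptions: relations are finite Python sets and predicates, the first example is bounded by hypothetical constants `N` and `WMAX`, the conflict relation holds between all writes (a single scalar), and the helper names `is_static` and `is_maximal` are not from the original text.

```python
def is_static(nu, writes, conflict, rstar):
    """(5.8): conflicting writes related by R* must get the same label."""
    return all(nu[v] == nu[w]
               for v in writes for w in writes
               if conflict(v, w) and (v, w) in rstar)


def is_maximal(nu, writes, conflict, rstar):
    """(5.9): conflicting writes not related by R* must get distinct labels."""
    return all(nu[v] != nu[w]
               for v in writes for w in writes
               if conflict(v, w) and (v, w) not in rstar)


N, WMAX = 3, 2
writes = ([("T", i) for i in range(1, N + 1)]
          + [("S", i, w) for i in range(1, N + 1) for w in range(1, WMAX + 1)])


def conflict(v, w):
    """Single scalar x: every pair of writes may hit the same location."""
    return True


# R* for the first example: writes of the same iteration i are related.
rstar = {(v, w) for v in writes for w in writes if v[1] == w[1]}

by_iteration = {u: u[1] for u in writes}     # x expanded into x[i]
no_expansion = {u: 0 for u in writes}        # x left as a scalar
single_assignment = {u: u for u in writes}   # one location per write

for name, nu in [("x[i]", by_iteration), ("x", no_expansion),
                 ("SA", single_assignment)]:
    print(name, is_static(nu, writes, conflict, rstar),
          is_maximal(nu, writes, conflict, rstar))
```

Only the labeling by iteration satisfies both constraints: leaving x unexpanded is static but not maximal, while the single-assignment labeling separates writes that reach the same read and is therefore not static.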
Notice that $\nu$ is only supposed to take different values between classes in the same C: if $C_1, C_2 \in \mathbf{W}/\asymp^*$ with $C_1 \ne C_2$, $u_1 \in C_1$ and $u_2 \in C_2$, nothing prevents that $\nu(u_1) = \nu(u_2)$.

As a consequence, two maximal static expansions $f^{exp}_e$ and $f'_e$ are identical on a class of $\mathbf{W}/\asymp^*$, up to a one-to-one mapping between constant values. An interesting result follows:

Lemma 5.1 The expansion factor for each memory location assigned by writes in C is $\mathrm{Card}(C/\mathcal{R}^*)$.

Let C be an equivalence class in $\mathbf{W}/\asymp^*$ (statement instances that may hit the same memory location). Suppose we have a function $\rho$ mapping each write u in C to a representative of its equivalence class in $C/\mathcal{R}^*$ (see Section 5.2.4 for details). One may label each class in $C/\mathcal{R}^*$, or equivalently, label each element of $\rho(C)$. Such a labeling scheme is obviously arbitrary, but all programs transformed using our method are equivalent up to a permutation of these labels. Labeling boils down to scanning exactly once all the integer points in the set of representatives $\rho(C)$, see Section 5.2.5 for details. Now, remember that function $f^{exp}_e$ is of the form $(f_e, \nu)$. From Theorem 5.1, we can take for $\nu(u)$ the label we choose for $\rho(u)$; then storage mapping $f^{exp}_e$ is a maximal static expansion for our program.

Eventually, one has to generate code for the expanded program, using storage mapping $f^{exp}_e$. It is done in Section 5.2.4.

5.2.4 Algorithm

The maximal static expansion scheme given above works for any imperative program. More precisely, you may expand any imperative program using maximal static expansion, provided that a reaching definition analysis technique can handle it (at the instance level) and that transitive closure computation, relation composition, intersection and union are feasible in your framework.

In the sequel, since we use FADA (see [BCF97, Bar98] and Section 2.4.3) as reaching definition analysis, we inherit its syntactical restrictions: data structures are scalars and arrays; pointers are not allowed. Loops, conditionals and array subscripts are unrestricted.
Therefore, Maximal-Static-Expansion and MSE-Convert-Quast are based on the classical single-assignment algorithms for loop nests, see Section 5.1. They rely on Omega [KPRS96] and PIP [Fea88b] for symbolic computations. Additional algorithms and technical points are studied in Section 5.2.5. In Maximal-Static-Expansion, the function $\rho$ mapping instances to their representatives is encoded as an affine relation between iteration vectors (augmented with the statement label), and labeling function $\nu$ is encoded as an affine relation between the same iteration vectors and a "compressed" vector space found by Enumerate-Representatives, see Section 5.2.5.

An interesting but technical remark is that, by construction of function $\nu$ (seen as a parameterized vector), a few components may take a finite (and hopefully small) number of values. Indeed, such components may represent the "statement part" of an instance. In such a case, splitting array A into several (renamed) data structures (footnote 10) should improve performance and decrease memory usage (avoiding convex hulls of disjoint polyhedra). Consider for instance MSE of the second example: expanding A into A1 and A2 would require $6N-2$ array elements instead of $8N-2$ in Figure 5.11. Other techniques reducing the number of useless memory locations allocated by our algorithm are not described in this paper.

Footnote 10: Recall that in single-assignment form, statements assign disjoint (renamed) data structures.

Maximal-Static-Expansion(program, ≍, σ)
  program: an intermediate representation of the program
  ≍: the conflict relation
  σ: the reaching definition relation, seen as a function
  returns an intermediate representation of the expanded program

    ≍* ← Transitive-Closure(≍)
    R* ← Transitive-Closure(σ ∘ σ⁻¹)
    ρ ← Compute-Representatives(≍* ∩ R*)
    ν ← Enumerate-Representatives(≍*, ρ)
    for each array A in program
    do ν_A ← component-wise maximum of ν(u) for all write accesses u to A
       declaration A[shape] is replaced by Aexp[shape, ν_A]
       for each statement S assigning A in program
       do left-hand side A[subscript] of S is replaced by Aexp[subscript, ν(CurIns)]
       for each read reference ref to A in program
       do σ_/ref ← restriction of σ to accesses of the form (ι, ref)
          quast ← Make-Quast(σ_/ref)
          map ← MSE-Convert-Quast(quast, ref)
          ref ← map(CurIns)
    return program

MSE-Convert-Quast(quast, ref)
  quast: the quast representation of the reaching definition function
  ref: the original reference
  returns the implementation of quast as a value retrieval code for reference ref

    switch
      case quast = {⊥}:
        return ref
      case quast = {ι}:
        A ← Array(ι)
        S ← Stmt(ι)
        x ← Iter(ι)
        subscript ← original array subscript in ref
        return Aexp[subscript, x]
      case quast = {ι1, ι2, ...}:
        error "this case should never happen with static expansion!"
      case quast = if predicate then quast1 else quast2:
        return if predicate MSE-Convert-Quast(quast1, ref)
               else MSE-Convert-Quast(quast2, ref)

5.2.5 Detailed Review of the Algorithm

A few technical points and computational issues are raised in the previous algorithm. This section is devoted to their analysis and resolution.
Finding Representatives for Equivalence Classes

Finding a "good" canonical representative in a set is not a simple matter. We choose the lexicographic minimum because it can be computed using classical techniques, and our first experiments gave good results.

Notice also that representatives must be described by a function $\rho$ on write instances. Therefore, the good "parametric" properties of lexicographical minimum computations [Fea91, Pug92] are well suited to our purpose.

A general technique to compute the lexicographical minimum follows. Let $\equiv$ be an equivalence relation, and C an equivalence class for $\equiv$. The lexicographical minimum of C is:

$$\min_{<_{lex}}(C) = v \in C \ \text{s.t.}\ \nexists u \in C,\ u <_{lex} v.$$

Since $<_{lex}$ is a relation, we can rewrite the definition using algebraic operations:

$$\min_{<_{lex}}(C) = \bigl(\equiv \setminus (<_{lex} \circ \equiv)\bigr)(C). \tag{5.11}$$

This is applied in our framework to classes of $\mathcal{R}^*$ and with order $<_{seq}$.

Compute-Representatives(equivalence)
  equivalence: an affine equivalence relation over instances
  returns an affine function mapping instances to a canonical representative

    repres ← equivalence \ (<seq ∘ equivalence)
    return repres

Applying Algorithm Compute-Representatives to relation $\mathcal{R}^*$ yields an affine function $\rho$, but this does not readily provide the labeling function $\nu$. The last step consists in enumerating the image of $\rho$ inside classes of equivalence relation $\asymp^*$.

Computing a Dense Labeling

To label each memory location, we associate each location to an integer point in the affine polyhedron of representatives, i.e. the image of function $\rho$ whose range is restricted to a class of equivalence relation $\asymp^*$. Labeling boils down to scanning exactly once all the integer points in the set of representatives. This can be done using classical polyhedron scanning techniques [AI91, CFR95] or simply by considering a "part" of the representative function in one-to-one mapping with this set. It is thus easy to compute a labeling function from the shape of function $\rho$. But computing a "good" labeling function is much more difficult: a "good" labeling should be as dense as possible, meaning that the number of memory locations accessed by the program must be as near as possible to the number of memory locations allocated from the shape of function $\nu$.
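The two steps described above, picking the least instance of each class as its representative and then numbering the representatives found within each group of possibly conflicting writes, can be sketched on finite sets. This Python sketch is an illustration only: it enumerates instances explicitly instead of manipulating affine relations, the bounds `N` and `WMAX` are hypothetical, and the names `seq_key`, `representatives` and `dense_labels` are assumptions rather than the algorithms of the text.

```python
def seq_key(u):
    """Sequential order on instances of the first example, encoded like the
    vectors [i, w, s] with T numbered 1 and S numbered 2."""
    stmt, i = u[0], u[1]
    w = u[2] if len(u) == 3 else 0
    return (i, w, 1 if stmt == "T" else 2)


def representatives(writes, related, key):
    """Map each write to the least element (w.r.t. key) of its class.
    `related` is assumed to be an equivalence relation given as a predicate."""
    return {u: min((v for v in writes if related(u, v)), key=key)
            for u in writes}


def dense_labels(rho, location_class, key):
    """Number the representatives 0, 1, 2, ... separately inside each group
    of writes that may hit the same memory location."""
    reps_by_loc = {}
    for u, r in rho.items():
        reps_by_loc.setdefault(location_class(u), set()).add(r)
    index = {}
    for loc, reps in reps_by_loc.items():
        for k, r in enumerate(sorted(reps, key=key)):
            index[(loc, r)] = k
    return {u: index[(location_class(u), r)] for u, r in rho.items()}


N, WMAX = 3, 3
writes = ([("T", i) for i in range(1, N + 1)]
          + [("S", i, w) for i in range(1, N + 1) for w in range(1, WMAX + 1)])

rho = representatives(writes, lambda u, v: u[1] == v[1], seq_key)
nu = dense_labels(rho, lambda u: "x", seq_key)
print(rho[("S", 2, 3)], nu[("S", 2, 3)])
```

With this numbering the labels start at 0 instead of 1, which is harmless since labelings are only defined up to a permutation.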
A possible idea would be to count the number of integer points in the image of function $\rho$, thanks to Ehrhart polynomials [Cla96], and to build a labeling (non-affine in general) from this computation. But this would be extremely costly in practice and would sometimes generate very intricate subscripts; moreover, most compile-time properties on $\nu$ would be lost, due to the possible non-affine form. As a result, the "dense labeling problem" is mostly open at the moment. We have found an interesting partial result by Wilde and Rajopadhye [WR93], but studying applicability of their technique to our more general case is left for future work.
Many simple transformations can be applied to $\rho$ to compress its image. Thanks to the regularity of iteration spaces of practical loop nests, techniques such as global translation, division by an integer constant (when a constant stride is discovered) and projection gave excellent results on every example we studied. Algorithm Enumerate-Representatives implements these simple transformations to enumerate the image of a function whose range is restricted to a class of some equivalence relation.

Enumerate-Representatives(rel, fun)
  rel: equivalence relation whose classes define enumeration domains
  fun: the affine function whose image should be enumerated
  returns a dense labeling of the image of fun restricted to a class of rel

    repres ← Compute-Representatives(rel)
    enum ← Symbolic-Vector-Subtract(fun, repres ∘ fun)
    apply appropriate translations, divisions and projections to iteration vectors in enum
    return enum

What about Complexity and Practical Use?

For each array in the source program, the algorithm proceeds as follows:

- Compute the reciprocal relation $\sigma^{-1}$ of $\sigma$. This is different from computing the inverse of a function and consists only in a swap of the two arguments of $\sigma$.

- Composing two relations $\rho$ and $\rho'$ boils down to eliminating y in $x \mathrel{\rho} y \wedge y \mathrel{\rho'} z$.

- Computing the exact transitive closure of $\mathcal{R}$ or $\asymp$ is impossible in general: Presburger arithmetic is not closed under transitive closure. However, very precise conservative approximations (if not exact results) can be computed. Kelly et al. [KPRS96] do not give a formal bound on the complexity of their algorithm, but their implementation in the Omega toolkit proved to be efficient if not concise. A short review of their algorithm is presented in Section 3.1.2. Notice again that the exact transitive closure is not necessary for our expansion scheme to be correct. Moreover, $\mathcal{R}$ and $\asymp$ happen to be transitive in most practical cases. In our implementation, the Transitive-Closure algorithm first checks whether the difference $(\mathcal{R} \circ \mathcal{R}) \setminus \mathcal{R}$ is empty, before triggering the computation. In all three examples, both relations $\mathcal{R}$ and $\asymp$ are already transitive.

- In the algorithm above, $\rho$ is a lexicographical minimum. The expansion scheme just needs a way to pick one element per equivalence class. Computing the lexicographical minimum is expensive a priori, but was easy to implement.

- Finally, numbering classes becomes costly only when we have to scan a polyhedral set of representatives in dimension greater than 1. In practice, we only had intervals on our benchmark examples.

Is our Result Maximal?

Our expansion scheme depends on the transitive closure calculator, and of course on the accuracy of input information: instancewise reaching definitions $\sigma$ and approximation $\asymp$ of the original program storage mapping. We would like to stress the fact that the expansion produced is static and maximal with respect to the results yielded by these parts, whatever their accuracy:

- The exact transitive closure may not be available (for computability or complexity reasons) and may therefore be over-approximated. The expansion factor of a memory location c is then lower than $\mathrm{Card}(\{u \in \mathbf{W} : f_e(u) = c\}/\mathcal{R}^*)$. However, the expansion remains static and is maximal with respect to the transitive closure given to the algorithm.

- Relation $\asymp$ approximating the storage mapping of the original program may be more or less precise, but we required it to be pessimistic (a.k.a. conservative). This point does not interfere with the staticity or maximality of the expansion; but the more accurate the relation $\asymp$, the less unused memory is allocated by the expanded program.

5.2.6 Application to Real Codes

Despite good performance results on small kernels (see following sections), it is obvious that reaching definition analysis and MSE will become unacceptably expensive on larger codes. When addressing real programs, it is therefore necessary to apply the MSE algorithm independently to several loop nests. A parallelizing compiler (or a profiler) can isolate loop nests that are critical program parts and where spending time in powerful optimization techniques is valuable. Such techniques have been investigated by Berthou in [Ber93], and also in the Polaris [BEF+96] and SUIF [H+96] projects.

However, some values may be initialized outside of the analyzed code. When the set of possible reaching definitions for some read accesses is not a singleton and includes ⊥, it is necessary to perform some copy-in at the beginning of the code. Each array holding values that may be read by such accesses must be copied into the appropriate expanded arrays. In practice this is expensive when expanded arrays hold many copies of original values. However, the process is fully parallel and can hopefully not cost more than the loop nest itself.

There is a simple way to avoid copy-in, at the cost of some loss in the expansion degree. It consists in adding "virtual write accesses" for every memory location and replacing ⊥s in the reaching definition relation by the appropriate virtual access (accesses indeed, when the memory location accessed is unknown). Since all ⊥s have been removed, computing the maximal static expansion from this modified reaching definition relation requires no copy-in; but additional constraints due to the "virtual accesses" may forbid some array expansions. This technique is especially useful when many temporary arrays are involved in a loop nest. But its application to the second motivating example (Figure 5.9) would forbid all expansion since almost all reads may access values defined outside the nest.

Moreover, the data structures created by MSE on each loop nest may be different, and the accesses to the same original array may now be inconsistent. Consider for instance the original pseudo-code in Figure 5.13.a. We assume the first nest was processed separately by MSE, and the second nest by any technique. The code appears in Figure 5.13.b. Clearly, references to A may be inconsistent: a read reference in the second nest does not know which $\nu_1$ to read from.

A simple solution is then to insert, between the two loop nests, a copy-out code in which the original structure is restored (see Figure 5.13). Doing this only requires adding, at the end of the first nest, "virtual accesses" that read every memory location written in the nest. The reaching definitions within the nest give the identity of the memory location to read from. Notice that no φ functions are necessary in the copy code: the opposite would lead to a non-static expansion. More precisely, if we call V(c) the "virtual access" to memory location c after the loop nest, we can compute the maximal static expansion for the nest and the additional virtual accesses, and the value to copy back into c is located in $(c, \nu(\sigma(V(c))))$.

    for i
      A[f1(i)] = ...
    end for
    for i
      ... = A[f2(i)]
    end for

Figure 5.13.a. Original code

    for i
      A1[f1(i), ν1(i)] = ...
    end for
    for i
      ... = A1[f2(i), /* unknown */]
    end for

Figure 5.13.b. MSE version

    for i
      A1[f1(i), ν1(i)] = ...
    end for
    for c   // copy-out code
      A[c] = A1[c, ν1(σ(V(c)))]
    end for
    for i
      ... = A[f2(i)]
    end for

Figure 5.13.c. MSE with copy-out

Figure 5.13. Inserting copy-out code

Fortunately, with some knowledge of the program-wide flow of data, several optimizations can remove the copy-out code (footnote 11). The simplest optimization is to remove the copy-out code for some data structure when no read access executing after the nest uses a value produced inside this nest. The copy-out code can also be removed when no φ functions are needed in read accesses executing after the nest. Eventually, it is always possible to remove the copy-out code by performing a forward substitution of $(c, \nu(\sigma(V(c))))$ into read accesses to a memory location c in following nests.

5.2.7 Back to the Examples

This section applies our algorithm to the motivating examples, using the Omega Calculator [Pug92] as a tool to manipulate affine relations.

Footnote 11: Let us notice that, if MSE is used in codesign, the intermediate copy-code and associated data structures would correspond to additional logic and buffers, respectively. Both should be minimized in complexity and/or size.
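First Example

Consider again the program in Figure 5.6. Using the Omega Calculator text-based interface, we describe a step-by-step execution of the expansion algorithm. We have to code instances as integer-valued vectors. An instance $\langle S_s, i\rangle$ is denoted by vector [i,..,s], where [..] possibly pads the vector with zeroes. We number T, S, R with 1, 2, 3 in this order, so $\langle T,i\rangle$, $\langle S,i,j\rangle$ and $\langle R,i\rangle$ are written [i,0,1], [i,j,2] and [i,0,3], respectively.

From (5.1) and (5.2), we construct the relation S of reaching definitions:

    S := {[i,1,2] -> [i,0,1] : 1<=i<=N}
      union {[i,w,2] -> [i,w-1,2] : 1<=i<=N && 2<=w}
      union {[i,0,3] -> [i,0,1] : 1<=i<=N}
      union {[i,0,3] -> [i,w,2] : 1<=i<=N && 1<=w};

Since we have only one memory location, relation $\asymp$ tells us that all instances are related together, and can be omitted.

Computing $\mathcal{R}$ is straightforward:

    S' := inverse S;
    R := S(S');
    R;

    {[i,0,1] -> [i,0,1] : 1<=i<=N} union
    {[i,w,2] -> [i,0,1] : 1<=i<=N && 1<=w} union
    {[i,0,1] -> [i,w',2] : 1<=i<=N && 1<=w'} union
    {[i,w,2] -> [i,w',2] : 1<=i<=N && 1<=w' && 1<=w}

In mathematical terms, we get:

$$\begin{array}{rcl}
\langle T,i\rangle \;\mathcal{R}\; \langle T,i\rangle &\iff& 1 \le i \le N \\
\langle S,i,w\rangle \;\mathcal{R}\; \langle T,i\rangle &\iff& 1 \le i \le N \wedge w \ge 1 \\
\langle T,i\rangle \;\mathcal{R}\; \langle S,i,w'\rangle &\iff& 1 \le i \le N \wedge w' \ge 1 \\
\langle S,i,w\rangle \;\mathcal{R}\; \langle S,i,w'\rangle &\iff& 1 \le i \le N,\; w \ge 1,\; w' \ge 1
\end{array} \qquad (5.12)$$

Relation $\mathcal{R}$ is already transitive, no closure computation is necessary: $\mathcal{R} = \mathcal{R}^*$.

There is only one equivalence class for $\asymp^*$.

Let us choose $\nu(u)$ as the first executed instance in the equivalence class of $u$ for $\mathcal{R}^*$ (the least instance according to the sequential order): $\nu(u) = \min_{<_{seq}}(\{u' : u' \,\mathcal{R}^*\, u\})$. We may compute this expression using (5.11):

$$\forall i, w,\; 1 \le i \le N,\; w \ge 1 : \quad \nu(\langle T,i\rangle) = \langle T,i\rangle, \quad \nu(\langle S,i,w\rangle) = \langle T,i\rangle.$$

Computing $\nu(W)$ yields $N$ instances of the form $\langle T,i\rangle$. Maximal static expansion of accesses to variable x requires $N$ memory locations. Here, $i$ is an obvious label:

$$\forall i, w,\; 1 \le i \le N,\; w \ge 1 : \quad \rho(\langle S,i,w\rangle) = \rho(\langle T,i\rangle) = i. \qquad (5.13)$$

All left-hand side references to x are transformed into x[i]; all references to x in the right-hand side are transformed into x[i] too, since their reaching definitions are instances of S or T for the same $i$. The expanded code is thus exactly the one found intuitively in Figure 5.8.

The size declaration of the new array is x[1..N].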
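The computation above can be replayed on a finite instance of the problem. The following is only a sketch, not the Omega-based implementation: it assumes small concrete bounds `N` and `M` (the while-loop counter `w` is unbounded in the text, so `M` is a hypothetical cap), encodes instances as the same integer triples, and uses helper names (`reaching_definitions`, `share_a_read`, `representatives`) that are purely illustrative.

```python
def reaching_definitions(n, m):
    """Finite version of relation S: pairs (read instance, write instance)."""
    s = set()
    for i in range(1, n + 1):
        s.add(((i, 1, 2), (i, 0, 1)))            # <S,i,1> reads the value of <T,i>
        for w in range(2, m + 1):
            s.add(((i, w, 2), (i, w - 1, 2)))    # <S,i,w> reads the value of <S,i,w-1>
        s.add(((i, 0, 3), (i, 0, 1)))            # <R,i> may read the value of <T,i>
        for w in range(1, m + 1):
            s.add(((i, 0, 3), (i, w, 2)))        # <R,i> may read the value of <S,i,w>
    return s


def share_a_read(s):
    """R = S(S'): two writes are related when they may reach the same read."""
    writes_of = {}
    for read, write in s:
        writes_of.setdefault(read, set()).add(write)
    return {(v, w) for ws in writes_of.values() for v in ws for w in ws}


def is_transitive(r):
    return all((a, d) in r for (a, b) in r for (c, d) in r if b == c)


def representatives(r):
    """nu(u): least related instance in sequential order.

    Only write instances occur here; <T,i> is padded with 0 in the second
    component, so it precedes every <S,i,w> of the same iteration i.
    """
    def seq_key(inst):
        return (inst[0], inst[1])

    nu = {}
    for v, w in r:
        if w not in nu or seq_key(v) < seq_key(nu[w]):
            nu[w] = v
    return nu


N, M = 3, 4  # assumed bounds, chosen only to keep the example finite
R = share_a_read(reaching_definitions(N, M))
nu = representatives(R)
labels = {inst: rep[0] for inst, rep in nu.items()}  # the label is the iteration i
```

On this finite instance the relation is already transitive, every write of iteration `i` is represented by `(i, 0, 1)`, and `N` labels are enough, which matches the x[1..N] declaration.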
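Second Example

We now consider the program in Figure 5.9. Instances $\langle S,i,j\rangle$ and $\langle T,i,N\rangle$ are denoted by [i,j,1] and [i,N,2], respectively.

From (5.3), the relation S of reaching definitions is defined as:

    S := {[i,j,1] -> [i',j',1] : 1<=i,i'<=2N && 1<=j'<j<=N && i'-j'=i-j}
      union {[i,j,1] -> [i',j',1] : 1<=i,i'<=2N && N<j'<j<=2N && i'-j'=i-j}
      union {[i,j,1] -> [i',N,2] : 1<=i,i'<=2N && N<j<=2N && i'=i-j+N};

It is easy to compute relation $\asymp$ since all array subscripts are affine: two instances of S or T, whose iteration vectors are $(i,j)$ and $(i',j')$, write in the same memory location iff $i - j = i' - j'$. This relation is transitive, hence $\asymp \,=\, \asymp^*$. We call it May in Omega's syntax:

    May := {[i,j,s] -> [i',j',s'] : 1<=i,j,i',j'<=2N && i-j=i'-j' &&
            (s=1 || (s=2 && j=N)) && (s'=1 || (s'=2 && j'=N))};

As in the first example, we compute relation $\mathcal{R}$ using Omega:

    S' := inverse S;
    R := S(S');
    R;

    {[i,j,1] -> [i',j-i+i',1] : 1<=i<=2N-1 && 1<=j<N && 1<=i'<=2N-1
                                && i<j+i' && j+i'<N+i} union
    {[i,j,1] -> [i',j-i+i',1] : N<j<=2N-1 && 1<=i<=2N-1 && 1<=i'<=2N-1
                                && N+i<j+i' && j+i'<2N+i} union
    {[i,N,2] -> [i',N-i+i',1] : 1<=i<i'<=2N-1 && i'<N+i} union
    {[i,j,1] -> [N+i-j,N,2] : N<j<=2N-1 && i<=2N-1 && j<N+i} union
    {[i,N,2] -> [i,N,2] : 1<=i<=2N-1}

That is:

$$\begin{array}{rcl}
\langle T,i,N\rangle \;\mathcal{R}\; \langle T,i,N\rangle &\iff& 1 \le i \le 2N-1 \\
\langle S,i,j\rangle \;\mathcal{R}\; \langle S,i',j'\rangle &\iff& (1 \le i,i' \le 2N-1) \wedge (i-j = i'-j') \wedge \big(1 \le j,j' < N \;\vee\; N < j,j' \le 2N-1\big) \\
\langle T,i,N\rangle \;\mathcal{R}\; \langle S,i',N-i+i'\rangle &\iff& 1 \le i < i' \le 2N-1 \wedge i' < N+i \\
\langle S,i,j\rangle \;\mathcal{R}\; \langle T,N+i-j,N\rangle &\iff& (1 \le i \le 2N-1) \wedge (N < j \le 2N-1) \wedge (j < N+i)
\end{array}$$

Relation $\mathcal{R}$ is already transitive: $\mathcal{R} = \mathcal{R}^*$. Figure 5.10.a shows the equivalence classes of $\mathcal{R}^*$.

Let $C$ be an equivalence class for relation $\asymp^*$. There is an integer $k$ s.t. $C = \{\langle S,i,j\rangle : i - j = k\} \cup \{\langle T,k+N,N\rangle\}$. Now, for $u \in C$, $\nu(u) = \min_{<_{seq}}(\{u' \in W : u' \asymp^* u \wedge u' \,\mathcal{R}^*\, u\})$.

Then, we compute $\nu(u)$ using Omega:

$$\begin{array}{rl}
1-N \le i-j \le N-1 \wedge j \ge N : & \nu(\langle S,i,j\rangle) = \langle T,i,N\rangle \\
1-N \le i-j \le N-1 \wedge j < N : & \nu(\langle S,i,j\rangle) = \langle S,i-j+1,1\rangle \\
1-2N \le i-j \le -N : & \nu(\langle S,i,j\rangle) = \langle S,1,1-i+j\rangle \\
N \le i-j \le 2N-1 : & \nu(\langle S,i,j\rangle) = \langle S,i-j+1,1\rangle \\
1 \le i \le 2N-1 : & \nu(\langle T,i,N\rangle) = \langle T,i,N\rangle
\end{array}$$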
The result shows three intervals of constant cardinality of $C/\mathcal{R}^*$; they are described in Figure 5.10.b. A labeling can be found mechanically. If $i - j \le -N$ or $i - j \ge N$, there is only one representative, thus $\rho(\langle S,i,j\rangle) = 1$. If $1-N \le i-j \le N-1$, there are two representatives; then we define $\rho(\langle S,i,j\rangle) = 1$ if $j \le N$, $\rho(\langle S,i,j\rangle) = 2$ if $j > N$, and $\rho(\langle T,i,N\rangle) = 2$.

The static expansion code appears in Figure 5.11. As hinted in Section 5.2.4, conditionals in $\rho$ have been taken out of array subscripts.

Array A is allocated as A[4*N,2]. Note that some memory could have been spared by defining two different arrays: A1 standing for A[·,0] and holding $4N-1$ elements, and A2 standing for A[·,1] and holding only $2N-1$ elements. This idea was pointed out in Section 5.2.4.

Third Example: Non-Affine Array Subscripts

We come back to the program in Figure 5.12.a. Instances $\langle T,i,j\rangle$, $\langle S,i\rangle$ and $\langle R,i\rangle$ are written [i,j,1], [i,0,2] and [i,0,3].

From (5.4), we build the relation of reaching definitions:

    S := {[i,0,3] -> [i,j,1] : 1<=i,j<=N}
      union {[i,0,3] -> [i,0,2] : 1<=i<=N};

Since some subscripts are non-affine, we cannot compute at compile-time the exact relation between instances writing in some location A[x]. We can only make the following pessimistic approximation of $\asymp$: all instances are related together (because they may assign the same memory location).

    S' := inverse S;
    R := S(S');
    R;

    {[i,j,1] -> [i,j',1] : 1<=i<=N && 1<=j<=N && 1<=j'<=N} union
    {[i,j,1] -> [i,0,2] : 1<=i<=N && 1<=j<=N} union
    {[i,0,2] -> [i,j',1] : 1<=i<=N && 1<=j'<=N} union
    {[i,0,2] -> [i,0,2] : 1<=i<=N}

$\mathcal{R}$ is already transitive: $\mathcal{R} = \mathcal{R}^*$.

There is only one equivalence class for $\asymp^*$.

We compute $\nu(u)$ using Omega:

$$\begin{array}{rl}
\forall i,j,\; 1 \le i \le N,\; 1 \le j \le N : & \nu(\langle T,i,j\rangle) = \langle T,i,1\rangle \\
\forall i,\; 1 \le i \le N : & \nu(\langle S,i\rangle) = \langle T,i,1\rangle
\end{array}$$

Note that every $\langle T,i,j\rangle$ instance is in relation with $\langle T,i,1\rangle$.

Computing $\nu(W)$ yields $N$ instances of the form $\langle T,i,1\rangle$. Maximal static expansion of accesses to array A requires $N$ memory locations for each original location. We can use $i$ to label these representatives; thus the resulting $\rho$ function is:

$$\rho(\langle S,i\rangle) = \rho(\langle T,i,j\rangle) = i.$$
Using this labeling, all left-hand side references to A[·] become A[·,i] in the expanded code. Since the source of $\langle R,i\rangle$ is an instance of S or T at the same iteration $i$, the right-hand side of R is expanded the same way. Expanding the code thus leads to the intuitive result given in Figure 5.12.b.

The size declaration of A is now A[N+1,N+1].

5.2.8 Experiments

We ran a few experiments on an SGI Origin 2000, using the mp library. Implementation issues are discussed in Section 5.2.9.

Performance Results for the First Example

For the first example, the parallel SA and MSE programs are given in Figure 5.14. Remember that w is an artificial counter of the while-loop, and M is the maximum number of iterations of this loop. We have seen that a $\phi$ function is necessary for SA form, but it can be computed at low cost: it represents the last iteration of the inner loop.

      double xT[N], xS[N, M];
      parallel for (i=1; i<=N; i++) {
    T   xT[i] = ...;
        w = 1;
        while (...) {
    S     xS[i, w] = if (w==1) ... xT[i] ...;
                     else ... xS[i, w-1] ...;
          w++;
        }
    R   ... = if (w==1) ... xT[i] ...;
              else ... xS[i, w-1] ...;
        // the last two lines implement
        // phi({<T,i>} U {<S,i,w> : 1 <= w <= M})
      }
Figure 5.14.a. Single-assignment

      double x[N+1];
      parallel for (i=1; i<=N; i++) {
    T   x[i] = ...;
        while (...)
    S     x[i] = ... x[i] ...;
    R   ... = ... x[i] ...;
      }
Figure 5.14.b. Maximal static expansion

Figure 5.14. Parallelization of the first example

The table in Figure 5.15 first describes speed-ups for the maximal static expansion relative to the original sequential program, then speed-ups for the MSE version relative to the single-assignment form. As expected, MSE shows a better scaling, and the relative speed-up quickly approaches or exceeds 2. Moreover, for larger memory sizes, the SA program may swap or fail for lack of memory.

5.2.9 Implementation

The maximal static expansion is implemented in C++ on top of the Omega library. Figure 5.16 summarizes the computation times for our examples (on a 32MB Sun SparcStation 5). These results do not include the computation times for reaching definition analysis and code generation.
    Configuration M x N      200x250   200x500   200x1000   200x2000   200x4000

    Speed-ups for MSE versus original program
      16 processors           6.72      9.79      12.8       13.4       14.7
      32 processors           5.75      9.87      15.3       21.1       24.8

    Speed-ups for MSE versus SA
      16 processors           1.43      1.63      1.79       1.96       1.99
      32 processors           1.16      1.33      1.52       1.80       2.07

Figure 5.15. Experimental results for the first example

                                            1st example   2nd example   3rd example
    transitive closure (check)                  100           100           110
    picking the representatives (function nu)   110           160           110
    other                                       130           150            70
    total                                       340           410           290

Figure 5.16. Computation times, in milliseconds

The intuition behind these results is that the computation time mainly depends on the number of affine constraints in the data-flow analysis relation.

Moreover, computing the class representatives is relatively fast; it validates our choice to compute function $\nu$ (mapping instances to their representatives) using a lexicographical minimum.

Our only concern, so far, would be to find a way to approximate the expressions of transitive closures when they become large.

5.3 Storage Mapping Optimization

Memory expansion techniques have two main drawbacks: high memory usage and run-time overhead. Parallelization via memory expansion thus requires both moderation in the expansion degree and efficiency in the run-time computation of data-flow restoration code.

Moderation in the expansion degree can be addressed in two ways: either with "hard constraints" such as the one presented in Section 5.2 or with optimization techniques that do not interfere with parallelism extraction. This section addresses such optimization techniques, and presents the main results of a collaboration with Vincent Lefebvre. It can be seen as an extension of a work by Feautrier and Lefebvre [LF98] and also by Strout et al. [SCFS98].

Our contributions are the following: we formalize the correctness of a storage mapping, according to a given parallel execution order, for any nest of loops with unrestricted conditional expressions and array subscripts; we show that schedule-independent storage mappings defined in [SCFS98] correspond to correct storage mappings according to the data-flow execution order; and we present an algorithm for storage mapping optimization, applicable to any nest of loops and to all parallelization techniques based on polyhedral dependence graphs (i.e. captured by Presburger arithmetics).

5.3.1 Motivation

First Example: Dynamic Control Flow

We first study the kernel in Figure 5.17.a, which was already the first motivating example in Section 5.2. Parts denoted by "..." have no side-effect. Each loop iteration spawns instances of statements included in the loop body.

      double x;
      for (i=1; i<=N; i++) {
    T   x = ...;
        while (...) {
    S     x = ... x ...;
        }
    R   ... = ... x ...;
      }
Figure 5.17.a. Original program

      double xT[N+1], xS[N+1, M+1]
      parallel for (i=1; i<=N; i++) {
    T   xT[i] = ...;
        w = 1;
        while (...) {
    S     xS[i, w] = if (w==1) ... xT[i] ...;
                     else ... xS[i, w-1] ...;
          w++;
        }
    R   ... = if (w==1) ... xT[i] ...;
              else ... xS[i, w-1] ...;
        // the last two lines implement
        // phi({<T,i>} U {<S,i,w> : 1 <= w <= M})
      }
Figure 5.17.b. Single-assignment

      double xTS[N+1]
      parallel for (i=1; i<=N; i++) {
    T   xTS[i] = ...;
        while (...) {
    S     xTS[i] = ... xTS[i] ...;
        }
    R   ... = ... xTS[i] ...;
      }
Figure 5.17.c. Partial expansion

Figure 5.17. Convolution example

Any instancewise reaching definition analysis is suitable to our purpose, but FADA [BCF97] is preferred since it handles any loop nest and achieves today's best precision. Value-based dependence analysis [Won95] is also a good choice. In the following, the results for references x in the right-hand side of R and S are nested conditionals:

$$\sigma(\langle S,i,w\rangle, \mathtt{x}) = \textbf{if } w = 1 \textbf{ then } \{\langle T,i\rangle\} \textbf{ else } \{\langle S,i,w-1\rangle\}$$
$$\sigma(\langle R,i\rangle, \mathtt{x}) = \{\langle S,i,w\rangle : 1 \le w\}.$$

Here, memory-based dependences hamper direct parallelization via scheduling or tiling. We need to expand scalar x and remove as many output, flow and anti-dependences as possible. Reaching definition analysis is at the core of single-assignment (SA) algorithms, since it records the location of values in expanded data structures. However, when the flow of data is unknown at compile-time, $\phi$ functions are introduced for run-time restoration of values [CFR+91, Col98]. Figure 5.17.b shows our program converted to SA form, with the outer loop marked parallel (M is the maximum number of iterations of the inner loop). A $\phi$ function is necessary but can be computed at low cost since it represents the last iteration of the inner loop.

SA programs suffer from high memory requirements: S now assigns a huge $N \times M$ array. Optimizing memory usage is thus a critical point when applying memory expansion techniques to parallelization.

Figure 5.17.c shows the parallel program after partial expansion. Since T executes before the inner loop in the parallel version, S and T may assign the same array. Moreover, a one-dimensional array is sufficient since the inner loop is not parallel. As a side-effect, no $\phi$ function is needed any more. Storage requirement is $N$, to be compared with $NM + N$ in the SA version, and with 1 in the original program (allowing no legal parallel reordering).

This partial expansion has been designed for a particular parallel execution order. However, it is easy to show that it is also compatible with all other execution orders, since the inner loop cannot be parallelized. We have thus built a schedule-independent (a.k.a. universal) storage mapping, in the sense of [SCFS98]. On many programs, a more memory-economical technique consists in computing a legal storage mapping according to a given parallel execution order, instead of finding a schedule-independent storage compatible with any legal execution order. This is done in [LF98] for affine loop nests only.

Second Example: a More Complex Parallelization

We now consider the program in Figure 5.18, which solves the well-known knapsack problem (KP). This kernel naturally models several optimization problems [MT90]. Intuitively: M is the number of objects, C is the "knapsack" capacity, W[k] (resp. P[k]) is the weight (resp. profit) of object number k; the problem is to maximize the profit without exceeding the capacity. Instances of S are denoted by $\langle S,k,W[k]\rangle, \ldots, \langle S,k,C\rangle$, for $1 \le k \le M$.

      int A[C+1], W[M+1], P[M+1];
      for (k=1; k<=M; k++)
        for (j=W[k]; j<=C; j++)
    S     A[j] = max(A[j], P[k] + A[j-W[k]]);
Figure 5.18. Knapsack program
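To make the instance numbering concrete, the kernel of Figure 5.18 can be transcribed directly. This is a sketch under two assumptions that the figure does not state: array A is taken to be initialized to zero, and index 0 of the weight and profit lists is unused padding so that objects are numbered from 1 as in the text. The function name `knapsack` and the returned list of executed instances are illustrative only.

```python
def knapsack(weights, profits, capacity):
    """Transcription of the loop nest of Figure 5.18.

    weights[0] and profits[0] are padding; objects are numbered 1..M.
    Returns the final array A and the executed instances <S,k,j> in
    sequential order.
    """
    m = len(weights) - 1
    a = [0] * (capacity + 1)          # assumption: A starts at zero
    instances = []
    for k in range(1, m + 1):
        for j in range(weights[k], capacity + 1):
            instances.append((k, j))  # instance <S,k,j>
            # Reads A[j] and A[j-W[k]], then overwrites A[j]: the reuse of
            # A[j] across iterations of k is what creates the memory-based
            # dependences discussed in the text.
            a[j] = max(a[j], profits[k] + a[j - weights[k]])
    return a, instances


A, executed = knapsack([0, 2, 3], [0, 3, 4], 7)
```

With two objects of weights 2 and 3 and capacity 7, the instances of S range over j = 2..7 for k = 1 and j = 3..7 for k = 2, as described by the notation $\langle S,k,W[k]\rangle, \ldots, \langle S,k,C\rangle$.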
We suppose (from additional static analyses) that W[k] is always positive and less than or equal to an integer K. The results for references A[j] and A[j-W[k]] in the right-hand side of S are conditionals:

$$\sigma(\langle S,k,j\rangle, \mathtt{A[j]}) = \textbf{if } k = 1 \textbf{ then } \{\bot\} \textbf{ else } \{\langle S,k-1,j\rangle\}$$
$$\sigma(\langle S,k,j\rangle, \mathtt{A[j-W[k]]}) = \{\langle S,k',j'\rangle : 1 \le k' \le k \wedge \max(0, j-K) < j' < j-1\}$$

First notice that program KP does not have any parallel loops, and that memory-based dependences hamper direct parallelization. Therefore, parallelizing KP requires the application of preliminary program transformations.

Thanks to the reaching definition information, Figure 5.19 shows program KP converted to SA form. The unique $\phi$ function implements a run-time choice between values produced by $\{\langle S,k',j'\rangle : 1 \le k' \le k \wedge \max(0, j-K) < j' < j-1\}$, for some read access $\langle S,k,j,\mathtt{A[j-W[k]]}\rangle$.

      int A[C+1], W[M+1], P[M+1]
      int AS[M+1, C+1]
      for (k=1; k<=M; k++)
        for (j=W[k]; j<=C; j++)
    S     AS[k, j] = if (k==1)
                       max(A[j], P[1] + A[j-W[1]]);
                     else
                       max(AS[k-1, j],
                           P[k] + phi({<S,k',j'> : 1 <= k' <= k && max(0,j-K) < j' < j-1}));
Figure 5.19. KP in single-assignment form

Finally, in this particular case, the $\phi$ function is really easy to compute: the value of A[j-W[k]] has been "moved" by the SA form transformation "to" AS[k,j-W[k]]. Then $\phi(\{\langle S,k',j'\rangle : 1 \le k' \le k \wedge \max(0, j-K) < j' < j-1\})$ is equal to AS[k,j-W[k]]. This optimization avoids the use of temporary arrays. It can be performed automatically, along with other interesting optimizations, see Section 5.1.4.

The good thing with SA-transformed programs is that the only remaining dependences are true dependences between a reaching definition instance and its use instances. Thus a legal parallel schedule for program KP is: "execute instance $\langle S,k,j\rangle$ at step $k+j$", see Figure 5.20 (see Section 2.5.2 for schedule computation).

Since KP is a perfectly nested loop, it is also possible to apply tiling techniques to single-assignment KP, based on instancewise reaching definition information. Tiling techniques improve data locality and reduce communications by grouping together computations affecting the same part of a data structure (see Section 2.5.2). Rectangular $m \times c$ tiles seem appropriate in our case; the height $m$ and width $c$ can be tuned thanks to theoretical models [IT88, CFH95, BDRR94] or profiling techniques. The knapsack problem has been much studied and very efficient parallelizations have been crafted by Andonov and Rajopadhye [AR94], see also [BBA98] for additional information on tiling the knapsack algorithm. The third graph in Figure 5.20 represents $2 \times 2$ tiles, but larger sizes are used in practice, see Section 5.3.10.

[Figure 5.20. Instancewise reaching definitions, schedule, and tiling for KP: three graphs over axes k and j]

Consider the dependences in Figure 5.20. The value produced by instance $\langle S,k,j\rangle$ may be used by $\langle S,k,j+1\rangle, \ldots, \langle S,k,\min(C,j+K)\rangle$ or by $\langle S,k+1,j\rangle$. Using the schedule or the tiling proposed in Figure 5.20, we can prove that some value produced during the execution stops being useful after a given delay: if $1 \le k,k' \le M$ and $1 \le j,j' \le C$ are such that $k + j + K < k' + j'$, the value produced by $\langle S,k,j\rangle$ is not used by $\langle S,k',j'\rangle$. This allows a cyclic folding of the storage mapping: every access of the form AS[k,j] can be safely replaced by AS[k%(K+1),j]. The result is shown in Figure 5.21.

      int A[C+1], W[M+1], P[M+1]
      int AS[K+2, C+1]
      for (k=1; k<=M; k++)
        for (j=W[k]; j<=C; j++)
    S     AS[k%(K+1), j] = if (k==1)
                             max(A[j], P[1] + A[j-W[1]]);
                           else
                             max(AS[(k-1)%(K+1), j],
                                 P[k] + phi({<S,k',j'> : 1 <= k' <= k && max(0,j-K) < j' < j-1}));
Figure 5.21. Partial expansion for KP

Storage requirement for array AS is $(K+1)C$, to be compared with $MC$ in the SA version, and with $C$ in the original program (where no legal parallel reordering was possible). This suggests two observations:

- first, the gain is only significant when K is much smaller than M, which may not be the case in practice;
- second, the expanded subscript in the left-hand side is not affine any more, since K is a symbolic constant.

In general, when the cyclic folding is based on a symbolic constant (like K), it becomes difficult both to measure the effectiveness of the optimization and to reuse the generated code in subsequent analyses. In [Lef98], Lefebvre proposed to forbid such symbolic foldings, but we believe they can still be useful when some compile-time information on the symbolic bounds (like K) is available.

Finally, this partial expansion is not schedule-independent, because it highly depends on the "parallel front" direction associated with the proposed schedule and tiling.

5.3.2 Problem Statement and Formal Solution

Given an original program $(<_{seq}, f_e)$, we suppose that an instancewise reaching definition analysis has already been performed, yielding relation $\sigma$, and that a parallel execution order $<_{par}$ has been computed using some suitable technique (see Section 2.5.2). Our problem is here to compute a new storage mapping $f^{exp}_e$ such that $(<_{par}, f^{exp}_e)$ preserves the original semantics of $(<_{seq}, f_e)$.

Given a parallel execution order $<_{par}$, we have to characterize correct expansions allowing parallel execution to preserve the program semantics. In addition to the conflict relation $\asymp_e$, we use the no-conflict relation $\not\asymp_e$, which is the complement of $\asymp_e$. As in Section 2.4.1, we build a conservative approximation $\not\asymp$ of this relation:

$$\forall e \in E,\; \forall v, w \in A_e : \quad f_e(v) \ne f_e(w) \implies v \not\asymp w.$$

Since both approximations $\asymp$ and $\not\asymp$ are conservative, we have to be very careful: they are not complementary in general. Indeed, $\asymp_e$ and $\not\asymp_e$ are complementary for the same execution $e \in E$, but $\asymp$ is defined as a "may conflict" approximation for all executions, and $\not\asymp$ is the negation of the "must conflict" approximation.
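Our first task is to formalize the memory reuse constraints enforced by the partial order $<_{par}$. We introduce $\sigma'_e$: the exact reaching definition function for a given execution $e$ of the parallelized program $(<_{par}, f^{exp}_e)$.12 The expansion is correct iff, for every program execution, the source of every access is the same in the sequential and in the parallel program:

$$\forall e \in E,\; \forall u \in R_e,\; \forall v \in W_e : \quad v = \sigma_e(u) \implies v = \sigma'_e(u). \qquad (5.14)$$

We are looking for a correctness criterion telling whether two writes may use the same memory location or not. To do this, we return to the definition of $\sigma'_e$:

$$\forall e \in E : \quad v = \sigma'_e(u) \iff v <_{par} u \;\wedge\; f^{exp}_e(u) = f^{exp}_e(v) \;\wedge\; \big(\forall w \in W_e : u <_{par} w \vee w <_{par} v \vee f^{exp}_e(v) \ne f^{exp}_e(w)\big). \qquad (5.15)$$

Plugging (5.15) in (5.14), we get

$$\forall e \in E,\; \forall u \in R_e,\; \forall v, w \in W_e : \quad v = \sigma_e(u) \wedge u \not<_{par} w \wedge w \not<_{par} v \implies v <_{par} u \wedge f^{exp}_e(u) = f^{exp}_e(v) \wedge f^{exp}_e(v) \ne f^{exp}_e(w).$$

We may simplify this result since the $v <_{par} u$ and $f^{exp}_e(u) = f^{exp}_e(v)$ constraints are already implied by $v = \sigma_e(u)$ (through (5.14)) and do not bring any information between $f^{exp}_e(v)$ and $f^{exp}_e(w)$:

$$\forall e \in E,\; \forall u \in R_e,\; \forall v, w \in W_e : \quad v = \sigma_e(u) \wedge u \not<_{par} w \wedge w \not<_{par} v \implies f^{exp}_e(v) \ne f^{exp}_e(w). \qquad (5.16)$$

12 The fact that $<_{par}$ is not a total order makes no difference for reaching definitions.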
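It means that we cannot reuse memory (i.e. we must expand) when both $v = \sigma_e(u)$ and $w \not<_{par} v \wedge u \not<_{par} w$ are true. Starting from this dynamic correctness condition, we would like to deduce a correctness criterion based on static knowledge only. This criterion must be valid for all executions; in other terms, it should be stronger than condition (5.16).

We can now expose the expansion correctness criterion. It requires the reaching definition $v$ of a read $u$ and another write $w$ to assign different memory locations when: $w$ executes between $v$ and $u$ in the parallel program, and either $w$ does not execute between $v$ and $u$ or $w$ assigns a different memory location from $v$ ($v \not\asymp w$) in the original program; see Figure 5.22. Here is the precise formulation of the correctness criterion:

Theorem 5.2 (correctness of storage mappings) If the following condition holds, then the expansion is correct, i.e. allows parallel execution to preserve the program semantics.

$$\forall e \in E,\; \forall v, w \in W : \quad \big(\exists u \in R : v \,\sigma\, u \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge (u <_{seq} w \vee w <_{seq} v \vee v \not\asymp w)\big) \implies f^{exp}_e(v) \ne f^{exp}_e(w). \qquad (5.17)$$

Proof: We first rewrite the definition of $v$ being the reaching definition of $u$:

$$\forall e \in E,\; \forall u \in R_e,\; \forall v \in W_e : \quad v = \sigma_e(u) \implies v <_{seq} u \wedge f_e(u) = f_e(v) \wedge \big(\forall w \in W_e : u <_{seq} w \vee w <_{seq} v \vee f_e(v) \ne f_e(w)\big).$$

As a consequence,

$$\forall e \in E,\; \forall u \in R_e,\; \forall v \in W_e : \quad v = \sigma_e(u) \implies \big(\forall w \in W_e : u <_{seq} w \vee w <_{seq} v \vee f_e(v) \ne f_e(w)\big). \qquad (5.18)$$

The right-hand side of (5.18) can be inserted into (5.16) as an additional constraint: (5.16) is equivalent to

$$\forall e \in E,\; \forall u \in R_e,\; \forall v, w \in W_e : \quad v = \sigma_e(u) \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge \big(u <_{seq} w \vee w <_{seq} v \vee f_e(v) \ne f_e(w)\big) \implies f^{exp}_e(v) \ne f^{exp}_e(w). \qquad (5.19)$$

Let us now replace $\sigma_e$ with its approximation $\sigma$ in (5.19), using $v = \sigma_e(u) \Rightarrow v \,\sigma\, u$:

$$\forall e \in E,\; \forall u \in R_e,\; \forall v, w \in W_e : \quad v \,\sigma\, u \wedge \big(u <_{seq} w \vee w <_{seq} v \vee f_e(v) \ne f_e(w)\big) \wedge w \not<_{par} v \wedge u \not<_{par} w \implies f^{exp}_e(v) \ne f^{exp}_e(w)$$

$$\Downarrow \quad \text{approximation: } v = \sigma_e(u) \Rightarrow v \,\sigma\, u$$

$$\forall e \in E,\; \forall u \in R_e,\; \forall v, w \in W_e : \quad v = \sigma_e(u) \wedge \big(u <_{seq} w \vee w <_{seq} v \vee f_e(v) \ne f_e(w)\big) \wedge w \not<_{par} v \wedge u \not<_{par} w \implies f^{exp}_e(v) \ne f^{exp}_e(w)$$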
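Finally, we approximate $f_e$ over all executions thanks to relation $\not\asymp$, using $f_e(v) \ne f_e(u) \Rightarrow v \not\asymp u$:

$$\forall v, w \in W : \quad \big(\exists u \in R : v \,\sigma\, u \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge (u <_{seq} w \vee w <_{seq} v \vee v \not\asymp w)\big) \implies f^{exp}_e(v) \ne f^{exp}_e(w)$$

$$\Downarrow \quad \text{approximation: } f_e(v) \ne f_e(u) \Rightarrow v \not\asymp u$$

$$\forall e \in E,\; \forall u \in R_e,\; \forall v, w \in W_e : \quad v \,\sigma\, u \wedge \big(u <_{seq} w \vee w <_{seq} v \vee f_e(v) \ne f_e(w)\big) \wedge w \not<_{par} v \wedge u \not<_{par} w \implies f^{exp}_e(v) \ne f^{exp}_e(w)$$

This proves that (5.17) is stronger than (5.19), itself equivalent to (5.16).

Notice we returned to the definition of $\sigma_e$ at the beginning of the proof. Indeed, some information on the storage mapping may be available, and we do not want to lose it:13 the right-hand side of (5.18) gathers information on $w$ which would have been lost in approximating $\sigma_e$ by $\sigma$ in (5.16). Without this information on $w$, we would have computed the following correctness criterion:

$$\forall e \in E,\; \forall v, w \in W : \quad \big(\exists u \in R : v \,\sigma\, u \wedge u \not<_{par} w \wedge w \not<_{par} v\big) \implies f^{exp}_e(v) \ne f^{exp}_e(w). \qquad (5.20)$$

Sadly, this choice is not satisfying here.14 Indeed, consider the motivating example: two instances $\langle S,i,w\rangle$ and $\langle S,i,w'\rangle$ would satisfy the left-hand side of (5.20) as long as $w \ne w'$. Therefore, they should assign different memory locations in any correct expanded program. This leads to the single-assignment version of the program... but we showed in Section 5.3.1 that a more memory-economical solution was available: see Figure 5.17.c.

A precise look at (5.16) explains why replacing $\sigma_e$ with $\sigma$ in (5.16) is too conservative: it "forgets" that $w$ is not executed after the reaching definition $\sigma_e(u)$. Indeed, $w \not<_{par} v$ in the left-hand side of (5.20) is much stronger: it states that $w$ is not executed after any possible reaching definition of $u$, which includes many instances executing before the reaching definition $\sigma_e(u)$.

13 Such information may be more precise than deriving it from the approximate reaching definition relation.
14 This criterion was enough for Lefebvre and Feautrier in [LF98] since they only considered affine loop nests and exact reaching definition relations.

In the following, we introduce a new notation for the expansion correctness criterion: the interference relation $\bowtie$ is defined as the symmetric closure of the left-hand side of (5.17):

$$\forall v, w \in W : \quad v \bowtie w \;\stackrel{def}{\iff}\; \big(\exists u \in R : v \,\sigma\, u \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge (u <_{seq} w \vee w <_{seq} v \vee v \not\asymp w)\big) \;\vee\; \big(\exists u \in R : w \,\sigma\, u \wedge v \not<_{par} w \wedge u \not<_{par} v \wedge (u <_{seq} v \vee v <_{seq} w \vee w \not\asymp v)\big). \qquad (5.21)$$

We take the symmetric closure because $v$ and $w$ play symmetric roles in (5.17). Using a tool like Omega [Pug92], it is much easier to handle set and relation operations than logic formulas with quantifiers. We thus rewrite the previous definition using algebraic operations:15

$$\bowtie \;=\; \Big(\big((\sigma(R) \times W) \cap \not>_{par} \cap\, (>_{seq} \cup \not\asymp)\big) \cup \big(\not>_{par} \cap\; \sigma(\not<_{par} \cap <_{seq})\big)\Big) \;\cup\; \Big(\big((\sigma(R) \times W) \cap \not>_{par} \cap\, (>_{seq} \cup \not\asymp)\big) \cup \big(\not>_{par} \cap\; \sigma(\not<_{par} \cap <_{seq})\big)\Big)^{-1}. \qquad (5.22)$$

Rewriting (5.17) with this new syntax, $v$ and $w$ must assign distinct memory locations when $v \bowtie w$ (one may say that "$v$ interferes with $w$"):

$$\forall e \in E,\; \forall v, w \in W : \quad v \bowtie w \implies f^{exp}_e(v) \ne f^{exp}_e(w). \qquad (5.23)$$

An algorithm to compute $f^{exp}_e$ from Theorem 5.2 is presented in Section 5.3.4. Notice that we compute an exact storage mapping $f^{exp}_e$ which depends on the execution.

15 Each line of (5.21) is rewritten independently, then predicates depending on $u$ are separated from the others. The existential quantification on $u$ is captured by composition with $\sigma$: here $\sigma(X)$ stands for the set of pairs $(v, w)$ such that $v \,\sigma\, u$ and $(u, w) \in X$ for some read $u$. Because $v$ is the possible reaching definition of some read access, intersection with $(\sigma(R) \times W)$ is necessary in the first disjunct of each line.

[Figure 5.22. Cases of $f^{exp}_e(v) \ne f^{exp}_e(w)$ in (5.17). Sequential order $<_{seq}$, with $v \in \sigma(u)$: either $w <_{seq} v$, or $u <_{seq} w$, or $v \not\asymp w$. Parallel order $<_{par}$: $w \not<_{par} v$ and $u \not<_{par} w$.]

5.3.3 Optimality of the Expansion Correctness Criterion

We start with three examples showing the usefulness of each constraint in the definition of $\bowtie$, see Figure 5.23.

We now present the following optimality result:16

Proposition 5.2 Let $<_{par}$ be a parallel execution order. Consider two writes $v$ and $w$ such that $v \bowtie w$ (defined in (5.22)), and a storage mapping $f^{exp}_e$ such that $f^{exp}_e(v) = f^{exp}_e(w)$, that is, $f^{exp}_e$ does not satisfy the expansion correctness criterion defined by Theorem 5.2. Then, executing program $(<_{par}, f^{exp}_e)$ violates the original program semantics, according to approximations $\sigma$ and $\not\asymp$.

16 See Section 2.4.4 for a general remark about optimality.

Proof: Suppose $v \,\sigma\, u \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge (u <_{seq} w \vee w <_{seq} v \vee v \not\asymp w)$ is satisfied for a read $u$, and two writes $v$ and $w$. One may distinguish three cases regarding execution of $w$ relative to $u$ and $v$, see Figure 5.22.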
    T  x = ...;
    S  x = ...;
    R  ... = x;
$S \parallel T <_{par} R$ is legal but requires renaming: this is enforced by $T <_{seq} S$, i.e. $w <_{seq} v$ (and $T \not<_{par} S$, i.e. $w \not<_{par} v$, and $R \not<_{par} T$, i.e. $u \not<_{par} w$).
Figure 5.23.a. Constraints $w <_{seq} v$ and $w \not<_{par} v$, $u \not<_{par} w$

    S  x = ...;
    R  ... = x;
    T  x = ...;
$S <_{par} T <_{par} R$ is legal but requires renaming: this is enforced by $R <_{seq} T$, i.e. $u <_{seq} w$.
Figure 5.23.b. Constraints $w \not<_{par} v$, $u \not<_{par} w$ and $u <_{seq} w$

    S  A[1] = ...;
    T  A[foo] = ...;
    R  ... = A[1];
$S \parallel T <_{par} R$ is legal but requires renaming: this is enforced by $S \not\asymp T$, i.e. $v \not\asymp w$, since S may assign a different memory location from T.
Figure 5.23.c. Constraints $w \not<_{par} v$, $u \not<_{par} w$ and $v \not\asymp w$

Figure 5.23. Motivating examples for each constraint in the definition of the interference relation

The first two cases are (1) $u$ executes before $w$ in the sequential program, i.e. $u <_{seq} w$, or (2) $w$ executes before $v$ in the sequential program, i.e. $w <_{seq} v$: then $w$ must assign a different memory location than $v$, otherwise the value produced by $v$ would never reach $u$ as in the sequential program.

When $w$ executes neither before $v$ nor after $u$ in the sequential program, one may keep $v$ and $w$ assigning the same memory location if it was the case in the sequential program. However, if it might not be the case, i.e. if $v \not\asymp w$, then $w$ must assign a different memory location than $v$, otherwise the value produced by $v$ would never reach $u$ as in the sequential program.

5.3.4 Algorithm

The formalism presented in the previous section is general enough to handle any imperative program. However, as a compromise between expressivity and computability, and because our preferred reaching definition analysis is FADA [BCF97], we choose affine relations as an abstraction. Tools like Omega [Pug92] and PIP [Fea91] can thus be used for symbolic computations, but our program model is now restricted to loop nests operating on arrays, with unrestricted conditionals, loop bounds and array subscripts.

Finding the minimal amount of memory to store the values produced by the program is a graph coloring problem where vertices are instances of writes and edges represent interferences between instances: there is an edge between $v$ and $w$ iff they cannot share the same memory location, i.e. when $v \bowtie w$. Since classic coloring algorithms only apply to finite graphs, Feautrier and Lefebvre designed a new algorithm [LF98], which we extend to general loop nests.
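On a finite set of instances, the interference relation can be evaluated by direct enumeration of its quantified definition (5.21). The sketch below is only an illustration on the three-statement examples of Figure 5.23, not the symbolic computation performed with Omega. Its encoding is an assumption: orders are finite sets of pairs, `sigma` maps each read to its possible reaching definitions, and `may_differ` plays the role of the no-conflict approximation. The guard `v != w` is added explicitly, since a write trivially shares its own location.

```python
def order(pairs):
    """Turn a finite set of (before, after) pairs into a strict-order predicate."""
    return lambda a, b: (a, b) in pairs


def interferes(v, w, reads, sigma, seq_lt, par_lt, may_differ):
    """Symmetric closure of the left-hand side of the correctness criterion."""
    def one_way(v, w):
        return any(
            v in sigma[u]                      # v is a possible source of u
            and not par_lt(w, v)               # w may execute after v ...
            and not par_lt(u, w)               # ... and before u, in parallel
            and (seq_lt(u, w) or seq_lt(w, v) or may_differ(v, w))
            for u in reads
        )
    return v != w and (one_way(v, w) or one_way(w, v))


same_scalar = lambda v, w: False               # both writes always hit x

# Figure 5.23.a: T; S; R sequentially, S and T in parallel before R.
seq_a = order({("T", "S"), ("S", "R"), ("T", "R")})
par_a = order({("S", "R"), ("T", "R")})
case_a = interferes("S", "T", ["R"], {"R": {"S"}}, seq_a, par_a, same_scalar)

# Same program, executed in its original sequential order: no renaming needed.
case_seq = interferes("S", "T", ["R"], {"R": {"S"}}, seq_a, seq_a, same_scalar)

# Figure 5.23.b: S; R; T sequentially, reordered as S, T, R.
seq_b = order({("S", "R"), ("R", "T"), ("S", "T")})
par_b = order({("S", "T"), ("T", "R"), ("S", "R")})
case_b = interferes("S", "T", ["R"], {"R": {"S"}}, seq_b, par_b, same_scalar)

# Figure 5.23.c: A[1] and A[foo] may be different locations.
seq_c = order({("S", "T"), ("T", "R"), ("S", "R")})
par_c = order({("S", "R"), ("T", "R")})
case_c = interferes("S", "T", ["R"], {"R": {"S", "T"}}, seq_c, par_c,
                    lambda v, w: True)
```

Each of the three reorderings makes S and T interfere, through a different disjunct of the criterion, while keeping the sequential order yields no interference.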
The most general application of our technique starts with instancewise reaching definition analysis, then applies a parallelization algorithm using $\sigma$ as dependence graph (thus avoiding constraints due to spurious memory-based dependences), describes the result as a partial order $<_{par}$, and finally applies the following partial expansion algorithm.

Partial Expansion Algorithm

Storage-Mapping-Optimization and SMO-Convert-Quast are simple extensions of the classical single-assignment algorithms for loop nests, see Section 5.1. Input is the sequential program, the results $\sigma$ and $\not\asymp$ of an instancewise analysis, and parallel execution order $<_{par}$ (not used for simple SA form conversion). The big difference with SA-form is the computation of an expansion vector $E_S$ of integers or symbolic constants: its purpose is to reduce memory usage of each expanded array $A_S$ with a "cyclic folding" of memory references, see Build-Expansion-Vector in Section 5.3.5. To reduce the number of expanded arrays, partial renaming is called at the end of the process to coalesce data structures using a classical graph coloring heuristic, see Partial-Renaming in Section 5.3.5.

    Storage-Mapping-Optimization (program, sigma, no-conflict, <par)
      program: an intermediate representation of the program
      sigma: the reaching definition relation, seen as a function
      no-conflict: the no-conflict relation
      <par: the parallel execution order
      returns an intermediate representation of the expanded program
      interference <- the relation defined by (5.22)
      for each array A in program
        do for each statement S assigning A in program
             do E_S <- Build-Expansion-Vector (S, interference)
                declare an array A_S
                left-hand side of S <- A_S[Iter(CurIns) % E_S]
           for each reference ref to A in program
             do sigma/ref <- sigma ∩ (I × ref)
                quast <- Make-Quast (sigma/ref)
                map <- SMO-Convert-Quast (quast, ref)
                ref <- map (CurIns)
      program <- Partial-Renaming (program, interference)
      return program

    SMO-Convert-Quast (quast, ref)
      quast: the quast representation of the reaching definition function
      ref: the original reference, used when ⊥ is encountered
      returns the implementation of quast as a value retrieval code for reference ref
      switch
        case quast = {⊥}:
          return ref
        case quast = {ι}:
          A <- Array(ι)
          S <- Stmt(ι)
          x <- Iter(ι)
          return A_S[x % E_S]
        case quast = {ι1, ι2, ...}:
          return phi({ι1, ι2, ...})
        case quast = if predicate then quast1 else quast2:
          return if predicate SMO-Convert-Quast (quast1, ref)
                 else SMO-Convert-Quast (quast2, ref)

This algorithm outputs an expanded program whose data layout is well suited for parallel execution order $<_{par}$: we are assured that the original program semantics will be preserved in the parallel version.

Two technical issues have been pointed out. How is the expansion vector $E_S$ built for each statement S? How is partial renaming performed? This is the purpose of Section 5.3.5.

5.3.5 Array Reshaping and Renaming

Building an Expansion Vector

For each statement S, the expansion vector must ensure that two instances $v$ and $w$ assign different memory locations when $v \bowtie w$. Moreover, it should introduce memory reuse between instances of S as often as possible.

Building an expanded program with memory reuse on S introduces output dependences between some instances of this statement (there is an output dependence between two instances $v$ and $w$ in the expanded code if $v \in W$, $w \in W$ and $f^{exp}_e(v) = f^{exp}_e(w)$). An output dependence between $v$ and $w$ is valid in the expanded program iff the left-hand side of the expansion correctness criterion is false for $v$ and $w$, i.e. iff $v$ and $w$ are not related by $\bowtie$. Such an output dependence is called a neutral output dependence [LF98]. The aim is to elaborate an expansion vector which gives $A_S$ an optimized but sufficient shape to only authorize neutral output dependences on S.

The dimension of $E_S$ is equal to the number of loops surrounding S, written $N_S$. Each element $E_S[p+1]$ is the expansion degree of S at depth $p$ (the depth of the loop considered), with $p \in \{0, \ldots, N_S - 1\}$, and gives the size of dimension $(p+1)$ of $A_S$. Each dimension of $A_S$ must have a sufficient size to forbid any non-neutral output dependence.

For a given access $v$, the set of instances which may not write in the same location as $v$ can be deduced from the expansion correctness criterion (5.17); call it $W^S_p(v)$. It holds all instances $w$ such that:

- $w$ is an instance of S: $\mathrm{Stmt}(w) = S$;
- $\mathrm{Iter}(v)[1..p] = \mathrm{Iter}(w)[1..p]$ and $\mathrm{Iter}(v)[p+1] < \mathrm{Iter}(w)[p+1]$;
- and $v \bowtie w$.

Let $w^S_p(v)$ be the lexicographic maximum of $W^S_p(v)$. For all $w$ in $W^S_p(v)$, we have the following relations:

$$\mathrm{Iter}(v)[1..p] = \mathrm{Iter}(w)[1..p] = \mathrm{Iter}(w^S_p(v))[1..p]$$
$$\mathrm{Iter}(v)[p+1] < \mathrm{Iter}(w)[p+1] \le \mathrm{Iter}(w^S_p(v))[p+1]$$

If $E_S[p+1]$ is equal to $(\mathrm{Iter}(w^S_p(v))[p+1] - \mathrm{Iter}(v)[p+1] + 1)$, and knowing that the index function will be $A_S[\mathrm{Iter}(v) \,\%\, E_S]$, we ensure that no non-neutral output dependence appears between $v$ and any instance of $W^S_p(v)$. But this property must be verified for each instance of S, and $E_S$ should be set to the maximum of $(\mathrm{Iter}(w^S_p(v))[p+1] - \mathrm{Iter}(v)[p+1] + 1)$ for all instances $v$ of S. This proves that the following definition of $E_S$ forbids any output dependence between instances of S in relation with $\bowtie$:

$$E_S[p+1] = \max\big\{\mathrm{Iter}(w^S_p(v))[p+1] - \mathrm{Iter}(v)[p+1] + 1 \;:\; v \in W \wedge \mathrm{Stmt}(v) = S\big\} \qquad (5.24)$$

Computing this for each dimension of $E_S$ ensures that $A_S$ has a sufficient size for the expansion to preserve the sequential program semantics. This is the purpose of Build-Expansion-Vector: working is relation $(v, W^S_p(v))$ and maxv is relation $(v, w^S_p(v))$.

For a detailed proof, an intuitive introduction and related works, see [LF98, Lef98]. For the Build-Expansion-Vector algorithm, the simplest optimality concept is defined by the number of integer-valued components in $E_S$, i.e. the number of "projected" dimensions, as proposed by Quilleré and Rajopadhye in [QR99]. But even with this simple definition, optimality is still an open problem. Since the algorithm proposed by [QR99] has been proven optimal, we should try to combine both techniques to yield better results, but this is left for future work.

    Build-Expansion-Vector (S, interference)
      S: the current statement
      interference: the interference relation
      returns expansion vector E_S (a vector of integers or symbolic constants)
      N_S <- number of loops surrounding S
      for p = 0 to N_S - 1
        do working <- {(v, w) : <S,v> ∈ W ∧ <S,w> ∈ W
                                ∧ v[1..p] = w[1..p] ∧ v[p+1] < w[p+1]
                                ∧ <S,v> interferes with <S,w>}
           maxv <- {(v, max_<lex {w : (v, w) ∈ working})}
           vector[p+1] <- max {w[p+1] - v[p+1] + 1 : (v, w) ∈ maxv}
      return vector

Now, a component of $E_S$ computed by Build-Expansion-Vector can be a symbolic constant. When this constant can be proven "much smaller" than the associated dimension of the iteration space of S, it is useful for reducing memory usage; but if such a result cannot be shown with the available compile-time information, the component is set to $+\infty$, meaning that no modulo computation should appear in the generated code (for this particular dimension). The interpretation of "much smaller" depends on the application: Lefebvre considered in [Lef98] that only integer constants were allowed in $E_S$, but we believe that this requirement is too strong, as shown in the knapsack example (a modulo K+1 is needed).
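When the instances of a statement form a small finite set, definition (5.24) can be evaluated by plain enumeration. The sketch below is an illustration under strong assumptions, not the symbolic algorithm: instances are concrete iteration vectors (0-based tuples), the interference relation is given as a Python predicate, and the example relation `interferes` is hypothetical (a value stays live for one step along each loop dimension). A component with no interference at its depth is set to 1.

```python
def build_expansion_vector(instances, interferes):
    """Enumerative counterpart of Build-Expansion-Vector for one statement.

    instances: iteration vectors (tuples of equal length) of the statement
    interferes: predicate telling whether two instances must not share a cell
    """
    depth = len(instances[0])
    vector = []
    for p in range(depth):
        degree = 1  # no interference at this depth: one cell is enough
        for v in instances:
            for w in instances:
                # w shares the first p components with v, comes later at
                # depth p, and interferes with v: keep the largest distance.
                if v[:p] == w[:p] and v[p] < w[p] and interferes(v, w):
                    degree = max(degree, w[p] - v[p] + 1)
        vector.append(degree)
    return vector


def folded_cell(iteration, vector):
    """Cyclic folding: cell of the expanded array assigned by an instance."""
    return tuple(i % e for i, e in zip(iteration, vector))


# Hypothetical 3 x 4 iteration space and interference relation.
instances = [(i, j) for i in range(3) for j in range(4)]


def interferes(v, w):
    same_row = v[0] == w[0] and abs(v[1] - w[1]) == 1
    same_column = v[1] == w[1] and abs(v[0] - w[0]) == 1
    return same_row or same_column


E = build_expansion_vector(instances, interferes)
collisions = [(v, w) for v in instances for w in instances
              if interferes(v, w) and folded_cell(v, E) == folded_cell(w, E)]
```

Taking the maximum over every interfering `w` rather than only the lexicographic maximum gives the same component, because all candidates share the first `p` coordinates with `v`. Here a 2 x 2 array replaces the 3 x 4 single-assignment array without mapping two interfering instances to the same cell.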
NoweveryarrayAShasbeenbuilt,onecanperformanadditionalstoragereductionto PartialRenaming thegeneratedcode.indeed,fortwostatementssandt,partialexpansionbuildstwo structuresasandatwhichcanhavedierentshapes.ifattheendoftherenaming processsandtareauthorizedtosharethesamearray,thisonewouldhavetobethe rectangularhullofasandat:ast.itisclearthatthesetwostatementscansharethe samedataithissharingisnotcontradictorywiththeexpansioncorrectnesscriterion
for instances of $S$ and $T$. One must verify, for every instance $u$ of $S$ and $v$ of $T$, that the value produced by $u$ (resp. $v$) cannot be killed by $v$ (resp. $u$) before it stops being useful.

Finding the minimal renaming is NP-complete. Our method consists in building a graph similar to an interference graph as used in the classic register allocation process. In this graph, each vertex represents a statement of the program. There is an edge between two vertices $S$ and $T$ iff it has been shown that they cannot share the same data structure in their left-hand sides. Then one applies a greedy coloring algorithm to this graph. Finally, it is clear that vertices that have the same color can have the same data structure. This partial renaming algorithm is sketched in Partial-Renaming (the Greedy-Coloring algorithm returns a function mapping each statement to a color).

    Partial-Renaming (program, ⋈)
      program: the program where partial renaming is required
      ⋈: the interference relation
      returns the program with coalesced data structures
    1  for each array A in program
    2  do interfere ← ∅
    3     for each pair of statements S and T assigning A in program
    4     do if ∃⟨S, v⟩, ⟨T, w⟩ ∈ W : ⟨S, v⟩ ⋈ ⟨T, w⟩
    5        then interfere ← interfere ∪ {(S, T)}
    6     coloring ← Greedy-Coloring (interfere)
    7     for each statement S assigning A in program
    8     do left-hand side A[subscript] of S ← A_coloring(S)[subscript]
    9  return program

5.3.6 Dealing with Tiled Parallel Programs

The partial expansion algorithm often yields poor results, especially on tiled programs. The reason is that subscripts of expanded arrays are of the form $A_S[\mathit{subscript} \,\%\, E_S]$, and the block regularity of tiled programs does not really fit in this cyclic pattern. Figure 5.24 shows an example of what we would like to achieve on some block-regular expansions. No cyclic folding would be possible on such an example, since the two outer loops are parallel.

The design of an improved graph coloring algorithm able to consider both block and cyclic patterns is still an open problem, because it requires non-affine constraints to be optimized. We only propose a work-around, which works when some a priori knowledge on the tile shape is available. The technique consists in dividing each dimension by the associated tile size. Sometimes the resulting storage mapping will be compatible with the required parallel execution, and sometimes not: the decision is made with Theorem 5.2. Expanded array subscripts are thus of the form $A_S[i_1/\mathit{shape}_1, \ldots, i_N/\mathit{shape}_N]$, where $(i_1, \ldots, i_N)$ is the iteration vector associated with CurIns (defined in Section 5.1), and where $\mathit{shape}_i$ is either 1 or the size of the $i$-th dimension of the tile.

It is possible to improve this technique by combining divisions and modulo operations, but the expansion scheme is somewhat different: see Section 5.4.6.
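The division-based subscript described above can be illustrated with a few lines of Python. This is a minimal sketch, assuming iteration counters start at 0 and that the tile shape is known; the function name `block_subscript` and the sizes used below are illustrative choices, not part of the original algorithm.

```python
def block_subscript(iteration, shape):
    """Divide each iteration counter by its tile size (a size of 1 keeps the counter)."""
    return tuple(i // s for i, s in zip(iteration, shape))

# Hypothetical 64x64 iteration space cut into 16x16 tiles.
N, TILE = 64, 16
cells = {block_subscript((i, j), (TILE, TILE)) for i in range(N) for j in range(N)}
print(len(cells))  # number of distinct array cells actually used
```

All iterations of one tile share a single cell, so the array needs one cell per tile instead of one per iteration; whether such sharing is legal for a given parallel order still has to be checked against the expansion correctness criterion.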
    int x;
    for (i=0; i<N; i++)
      for (j=0; j<N; j++) {
    S   x = ...;
    R   ... = x;
      }

Figure 5.24.a. Original program

    int xS[N, N];
    for (i=0; i<N; i++)
      for (j=0; j<N; j++) {
    S   xS[i, j] = ...;
    R   ... = xS[i, j];
      }

Figure 5.24.b. Single-assignment program

    int xS[N/16, N/16];
    parallel for (i=0; i<N; i+=16)
      parallel for (j=0; j<N; j+=16)
        for (ii=0; ii<16; ii++)
          for (jj=0; jj<16; jj++) {
    S       xS[i/16, j/16] = ...;
    R       ... = xS[i/16, j/16];
          }

Figure 5.24.c. Partially expanded tiled program

Figure 5.24. An example of block-regular storage mapping

5.3.7 Schedule-Independent Storage Mappings

The technique presented in Section 5.3.4 yields the best results, but involves an external parallelization technique, such as scheduling or tiling. It is well suited to parallelizing compilers.

A schedule-independent (a.k.a. universal) storage mapping [SCFS98] is useful whenever no parallel execution scheme is enforced. The aim is to preserve the "portability" of SA form, at a much lower cost in memory usage.

From the definition of $\bowtie$ (the interference relation) in (5.21), and considering two parallel execution orders $<^1_{par}$ and $<^2_{par}$ whose associated interference relations are $\bowtie_1$ and $\bowtie_2$, we have:

\[ <^1_{par} \,\subseteq\, <^2_{par} \;\Longrightarrow\; \bowtie_2 \,\subseteq\, \bowtie_1 . \]

Now, a schedule-independent storage mapping $f_e^{exp}$ must be compatible with any possible parallel execution $<_{par}$ of the program. The partial order $<_{par}$ used in the Storage-Mapping-Optimization algorithm should thus be included in any correct execution order. By definition of correct execution orders (Theorem 2.2 page 81), this condition is satisfied by the data-flow execution order, which is the transitive closure of the reaching definition relation: $\sigma^+$.

Section 3.1.2 describes a way to compute the transitive closure of $\sigma$ (useful remarks and an experimental study are also presented in Section 5.2.5). In general, no exact result can be hoped for the data-flow execution order $\sigma^+$, because Presburger arithmetic is not closed under transitive closure. Hence, we need to compute an approximate relation. Because the approximation must be included in all possible correct execution orders, we want it to be a sub-order of the exact data-flow order (i.e. the opposite of a conservative approximation). Such an approximation can be computed with Omega [Pug92].
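On a finite set of instances the data-flow order can be computed exactly, which helps to see what the transitive closure of the reaching definition relation contains. The sketch below is only an illustration: it assumes the relation is given as an explicit set of (access, reaching definition) pairs, whereas the symbolic case needs a tool such as Omega and, in general, an approximation. The sample chain of instances is hypothetical.

```python
def transitive_closure(relation):
    """Transitive closure of a finite binary relation given as a set of pairs."""
    closure = set(relation)
    while True:
        # join the relation with itself and stop when nothing new appears
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

# Hypothetical chain for one outer iteration: each access maps to its reaching definition.
T, S0, S1, S2, R = (1, 0, 1), (1, 0, 2), (1, 1, 2), (1, 2, 2), (1, 0, 3)
sigma = {(S0, T), (S1, S0), (S2, S1), (R, S2)}
sigma_plus = transitive_closure(sigma)
print((R, T) in sigma_plus)
```

Any execution order that contains every pair of this closure (read from definition to use) respects the data flow; a sub-relation of the closure is therefore a safe choice for a schedule-independent mapping.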
5.3.8 Dynamic Restoration of the Data-Flow

Implementing $\phi$ functions for a partially expanded program is not very different from what we have seen in Section 5.1.3. Indeed, algorithm Loop-Nests-Implement-Phi applies without modification. But doing this, no storage mapping optimization is performed on $\phi$-arrays. Now, remember that $\phi$-arrays are supposed to be in one-to-one mapping with expanded data structures. Single-assignment $\phi$-arrays are not necessary to preserve the semantics of the original program, since the same dependences will be shared by expanded arrays and $\phi$-arrays.

The resulting code generation algorithm is very similar to Loop-Nests-Implement-Phi. The first step consists in replacing every reference to $A_S[x]$ with its "folded" counterpart $A_S[x \,\%\, E_S]$. In a second step, one merges $\phi$-arrays together using the result of algorithm Partial-Renaming.

Eventually, for a given $\phi$ function, the set of possible reaching definitions should be reconsidered: values produced by a few instances may now be overwritten, according to the new storage mapping. As in the motivating example, the $\phi$ function can even disappear, see Figure 5.17. A good technique to achieve this automatically is not to perform a new reaching definition analysis. One should update the available sets of reaching definitions: a $\phi(\mathit{set})$ reference should be replaced by

\[ \phi(\{ v \in \mathit{set} : \nexists w \in \mathit{set} : v <_{seq} w \wedge f_e^{exp}(v) = f_e^{exp}(w) \}). \]

Moreover, if coloring is the result of the greedy graph coloring algorithm in Partial-Renaming, $f_e^{exp}(\langle s, x\rangle) = f_e^{exp}(\langle s', x'\rangle)$ is equivalent to

\[ \mathit{coloring}(s) = \mathit{coloring}(s') \wedge (x \bmod E_s = x' \bmod E_{s'}). \]

5.3.9 Back to the Examples

First Example

Using the Omega Calculator text-based interface, we describe a step-by-step execution of the expansion algorithm. We have to code instances as integer-valued vectors. An instance $\langle s, i\rangle$ is denoted by vector [i, ..., s], where [...] possibly pads the vector with zeroes. We number T, S, R with 1, 2, 3 in this order, so $\langle T, i\rangle$, $\langle S, i, j\rangle$ and $\langle R, i\rangle$ are written [i,0,1], [i,j,2] and [i,0,3], respectively.

Schedule-dependent storage mapping. We first apply the partial expansion algorithm according to the parallel execution order proposed in Figure 5.17.

The result of instancewise reaching definition analysis is written in Omega's syntax:

    S := {[i,0,2]->[i,0,1] : 1<=i<=N}
      union {[i,w,2]->[i,w-1,2] : 1<=i<=N && 1<=w}
      union {[i,0,3]->[i,0,1] : 1<=i<=N}
      union {[i,0,3]->[i,w,2] : 1<=i<=N && 0<=w};

The no-conflict relation is trivial here, since the only data structure is a scalar variable:

    NCon := {[i,w,s]->[i',w',s'] : 1=2};  # 1=2 means FALSE!
We consider that the outer loop is parallel. It gives the following execution order:

    Par := {[i,w,2]->[i,w',2] : 1<=i<=N && 0<=w<w'} union
           {[i,0,1]->[i,w',2] : 1<=i<=N && 0<=w'} union
           {[i,0,1]->[i,0,3] : 1<=i<=N} union
           {[i,w,2]->[i,0,3] : 1<=i<=N && 0<=w};

We have to compute relation $\bowtie$ in the left-hand side of the expansion correctness criterion; call it Int.

    # The "full" relation
    Full := {[i,w,s]->[i',w',s'] : 1<=s<=3 && (s=2 || w=w'=0)
             && 1<=i,i'<=N && 0<=w,w'};

    # The sequential execution order
    Lex := {[i,w,2]->[i',w',2] : 1<=i<=i'<=N && 0<=w,w' && (i<i' || w<w')}
      union {[i,w,2]->[i',0,1] : 1<=i,i'<=N && 0<=w && i<i'}
      union {[i,0,3]->[i',0,3] : 1<=i<i'<=N}
      union {[i,0,1]->[i',w',2] : 1<=i<=i'<=N && 0<=w'}
      union {[i,0,1]->[i',0,1] : 1<=i<i'<=N}
      union {[i,0,3]->[i',0,1] : 1<=i<i'<=N}
      union {[i,w,2]->[i',0,3] : 1<=i<=i'<=N && 0<=w}
      union {[i,0,1]->[i',0,3] : 1<=i<=i'<=N}
      union {[i,0,3]->[i',w',2] : 1<=i<i'<=N && 0<=w'};

    NPar := Full - Par;
    ILex := inverse Lex;
    INPar := inverse NPar;
    Int := (INPar intersection (ILex union NCon))
           union (INPar intersection S(NPar intersection Lex));
    Int := Int union (inverse Int);

The result is:

    Int;
    {[i,w,2]->[i',w',2] : 1<=i'<i<=N && 1<=w<=w'} union
    {[i,0,2]->[i',w',2] : 1<=i'<i<=N && 0<=w'} union
    {[i,w,2]->[i',w-1,2] : 1<=i'<i<=N && 1<=w} union
    {[i,w,2]->[i',w',2] : 1<=i'<i<=N && 0<=w'<=w-2} union
    {[i,0,1]->[i',0,1] : 1<=i'<i<=N} union
    {[i,0,2]->[i',0,1] : 1<=i'<i<=N} union
    {[i,0,1]->[i',w',2] : 1<=i'<i<=N && 0<=w'} union
    {[i,0,3]->[i',0,1] : 1<=i'<i<=N} union
    {[i,0,3]->[i',w',2] : 1<=i'<i<=N && 0<=w'} union
    {[i,w,2]->[i',0,3] : 1<=i<i'<=N && 0<=w} union
    {[i,0,1]->[i',0,3] : 1<=i<i'<=N} union
    {[i,w,2]->[i',0,1] : 1<=i<i'<=N && 0<=w} union
    {[i,0,1]->[i',0,2] : 1<=i<i'<=N} union
    {[i,0,1]->[i',0,1] : 1<=i<i'<=N} union
    {[i,w,2]->[i',w',2] : 1<=i<i'<=N && 0<=w<=w'-2} union
    {[i,w,2]->[i',0,2] : 1<=i<i'<=N && 0<=w} union
    {[i,w,2]->[i',w+1,2] : 1<=i<i'<=N && 0<=w} union
    {[i,w,2]->[i',w',2] : 1<=i<i'<=N && 1<=w'<=w}

A quick verification shows that

    Int intersection {[i,w,s]->[i,w',s']}

is empty, meaning that neither expansion nor renaming must be done inside an iteration of the outer loop. In particular, $E_S[2]$ should be set to 0. However, computing the set $W_S^0(v)$ (i.e. for the outer loop) yields all accesses $w$ executing after $v$ (for the same $i$). Then $E_S[1]$ should be set to $N$. We have automatically found the partially expanded program.

Schedule-independent storage mapping. We now apply the expansion algorithm according to the "data-flow" execution order. The parallel execution order is defined as follows:

    Par := S+;

Once again

    Int intersection {[i,w,s]->[i,w',s']}

is empty. The schedule-independent storage mapping is thus the same as the previous, parallelization-dependent, one.

The resulting program for both techniques is the same as the hand-crafted one in Figure 5.17.

Second Example

We now consider the knapsack program in Figure 5.18. It is easy to show that a schedule-independent storage mapping would give no better result than single-assignment form. More precisely, it is impossible to find any schedule such that a "cyclic folding" (a storage mapping with subscripts of the form $A_S[\mathrm{CurIns} \,\%\, E_S]$) would be more economical than single-assignment form.

We are thus looking for a schedule-dependent storage mapping. An efficient parallelization of program KP requires tiling of the iteration space. This can be done using classical techniques since the loop is perfectly nested. Section 5.3.10 has shown good performance for 16×32 tiles, but we consider 2×1 tiles for the sake of simplicity. The parallel execution order considered is the same as the one presented in Section 5.3.1: tiles are scheduled in fronts of constant $k + j$, and the inner-tile order is the original sequential execution one.

The result of instancewise reaching definition analysis is written in Omega's syntax:

    S := {[k,j]->[k-1,j] : 2<=k<=M && 1<=j<=C} union
         {[k,j]->[k,j'] : 1<=k<=M && 1<=j'<j<=C && j'-k<=j};

Instances which may not assign the same memory location are defined by the following relation:

    NCon := {[k,j]->[k',j'] : 1<=k,k'<=M && 1<=j,j'<=C && j!=j'};
Considering the 2×1 tiling, it is easy to compute $<_{par}$:

    InnerTile := {[k,j]->[k',j] : (exists kq,kr,kr' : k=2kq+kr
                  && k'=2kq+kr' && 0<=kr<kr'<2)};
    InterTile := {[k,j]->[k',j'] : (exists kq,kr,kq',kr' : k=2kq+kr
                  && k'=2kq'+kr' && 0<=kr,kr'<2 && kq+j<kq'+j')};
    Par := Lex intersection (InnerTile union InterTile);

We have to compute relation $\bowtie$ in the left-hand side of the expansion correctness criterion; call it Int.

    # The "full" relation
    Full := {[k,j]->[k',j'] : 1<=k,k'<=M && 1<=j,j'<=C};

    # The sequential execution order
    Lex := Full intersection {[k,j]->[k',j'] : k<k' || (k=k' && j<j')};

    NPar := Full - Par;
    ILex := inverse Lex;
    INPar := inverse NPar;
    Int := (INPar intersection (ILex union NCon))
           union (INPar intersection S(NPar intersection Lex));
    Int := Int union (inverse Int);

The result is:

    Int;
    {[k,j]->[k',j'] : 1<=k<=k'<=M && 1<=j<j'<=C} union
    {[k,j]->[k',j'] : 1<=k<k'<=M && 1<=j'<j<=C} union
    {[k,j]->[k',j'] : exists (alpha : 1,2alpha+2<=k<k'<M
      && j<=C && 1<=j' && k'+2j'<=2+2j+2alpha)} union
    {[k,j]->[k',j'] : exists (alpha : 1,2alpha+2<=k'<k<M
      && j'<=C && 1<=j && k+2j<=2+2j'+2alpha)} union
    {[k,j]->[k',j'] : 1<=j<j'<=C && 1<=k'<k<=M} union
    {[k,j]->[k',j'] : 1<=k'<=k<=M && 1<=j'<j<=C}

A quick verification shows that

    Int intersection {[k,j]->[k+K+1,j']}

is empty, meaning that $E_S[1]$ should be set to $K+1$.

5.3.10 Experiments

Partial expansion has been implemented for Cray-Fortran affine loop nests [LF98]. Semi-automatic storage mapping optimization has also been performed on general loop nests, using FADA, Omega, and PIP.

Figure 5.25 summarizes expansion and parallelization results for several programs. The three affine loop nest examples have already been studied by Lefebvre in [LF98,
Lef98]: matrix-vector product, Cholesky factorization and Gaussian elimination. A few experiments have been made on an SGI Origin 2000, using the mp library (but not PCA, the built-in automatic parallelizer). As one would expect, results for the convolution program are excellent even for small values of $N$. Execution times for program KP appear in Figure 5.26. The first graph compares execution time of the parallel program and of the original (not expanded) one; the second one shows the speed-up. We got very good results for medium array sizes,17 both in terms of speed-up and relative to the original knapsack program.

17 Here $C = 2048$, $M = 1024$ and $K = 16$, with 16×32 tiles (scheduled similarly to Figure 5.18).

Figure 5.25. Time and space optimization (table; columns: Program, Sequential Complexity, Sequential Size, Parallel Complexity, Parallel Size for SA and Optimized, Run-time Overhead for SA and Optimized; rows: MV Product, Cholesky, Gaussian, Knapsack, Convolution)

Figure 5.26. Performance results (two plots against the number of processors, 1 to 32: execution time in ms for Sequential and Parallel; speed-up for Optimal and Effective)

5.4 Constrained Storage Mapping Optimization

Sections 5.2 and 5.3 addressed two techniques to optimize parallelization via memory expansion. We show here that combining the two techniques in a more general expansion framework is possible and brings significant improvements. Optimization is achieved from two complementary directions:

• Adding constraints to limit memory expansion, like static expansion avoiding $\phi$ functions [BCC98], privatization [TP93, MAL93], or array static single assignment [KS98]. All these techniques allow partial removal of memory-based dependences, but may extract less parallelism than conversion to single assignment form.
• Applying storage mapping optimization techniques [CL99]. Some of these are either schedule-independent [SCFS98] or schedule-dependent [LF98] (yielding better optimizations), depending on whether they require prior computation of a parallel execution order (scheduling, tiling, etc.) or not.

We try here to get the best of both directions and show the benefit of combining them into a unified framework for memory expansion. The motivation for such a framework is the following: because of the increased complexity of dealing with irregular codes, and given the wide range of parameters which can be tuned when parallelizing such programs, a broad range of expansion techniques have been or will be designed for optimizing one or a few of these parameters. The two preceding sections are some of the best examples of this trend. We believe that our constrained expansion framework greatly reduces the complexity of the optimization problem, by reducing the number of parameters and helping the automation process.

With the help of a motivating example we introduce the general concepts, before we formally define correct constrained storage mappings. Then, we present an intra-procedural algorithm which handles any imperative program and most loop nest parallelization techniques.

5.4.1 Motivation

We study the pseudo-code in Figure 5.27.a. Such nested loops with conditionals appear in many kernels, but most parallelization techniques fail to generate efficient code for these programs. Instances of $T$ are denoted by $\langle T, i, j\rangle$, instances of $S$ by $\langle S, i, j, k\rangle$, and instances of $R$ by $\langle R, i\rangle$, for $1 \le i, j \le M$ and $1 \le k \le N$. ("P(i, j)" is a boolean function of $i$ and $j$.)

    double x;
    for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i, j)) {
    T     x = 0;
          for (k=1; k<=N; k++)
    S       x = ... x ...;
        }
    R ... = ... x ...;
    }

Figure 5.27.a. Original program

    double xT[M+1, M+1], xS[M+1, M+1, N+1];
    for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i, j)) {
    T     xT[i, j] = 0;
          for (k=1; k<=N; k++)
    S       xS[i, j, k] = if (k==1) ... xT[i, j] ...;
                          else ... xS[i, j, k-1] ...;
        }
    R ... = φ({⟨S, i, 1, N⟩, ..., ⟨S, i, M, N⟩});
    }

Figure 5.27.b. Single assignment form

Figure 5.27. Motivating example

On this example, assume $N$ is positive and predicate "P(i, j)" evaluates to true at least one time for each iteration of the outer loop. A precise instancewise reaching definition analysis tells us that the reaching definition of the read access $\langle S, i, j, k\rangle$ to $x$ is $\langle T, i, j\rangle$ when $k = 1$ and $\langle S, i, j, k-1\rangle$ when $k > 1$. We only get an approximate result for definitions that may reach $\langle R, i\rangle$: those are $\{\langle S, i, 1, N\rangle, \ldots, \langle S, i, M, N\rangle\}$. In fact, the
value of $x$ may only come from $S$ (since $N > 0$) for the same $i$ (since $T$ executes at least one time for each iteration of the outer loop), and for $k = N$.

Obviously, memory-based dependences on $x$ hamper parallelization. Our intent is to expand scalar $x$ so as to get rid of as many dependences as possible. Figure 5.27.b shows our program converted to SA form. The unique $\phi$ function implements a run-time choice between values produced by $\langle S, i, 1, N\rangle, \ldots, \langle S, i, M, N\rangle$.

SA removed enough dependences to make the two outer loops parallel, see Figure 5.28.a. Function $\phi$ is computed at run-time using array @x. It holds the last value of $j$ at statement $S$ when $x$ was assigned. This information allows value recovery in $R$, see the third method in Section 5.1.4 for details.

But this parallel program is not usable on any architecture. The main reason is memory usage: variable $x$ has been replaced by a huge three-dimensional array, plus two smaller arrays. This code is approximately five times slower than the original program on a single processor (when arrays can be accommodated in memory).

    double xT[M+1, M+1], xS[M+1, M+1, N+1];
    int @x[M+1];
    parallel for (i=1; i<=M; i++) {
      @x[i] = ⊥;
      parallel for (j=1; j<=M; j++)
        if (P(i, j)) {
    T     xT[i, j] = 0;
          for (k=1; k<=N; k++)
    S       xS[i, j, k] = if (k==1) ... xT[i, j] ...;
                          else ... xS[i, j, k-1] ...;
          @x[i] = max(@x[i], j);
        }
    R ... = ... xS[i, @x[i], N] ...;
    }

Figure 5.28.a. Parallel SA

    double x[M+1, M+1];
    int @x[M+1];
    parallel for (i=1; i<=M; i++) {
      @x[i] = ⊥;
      parallel for (j=1; j<=M; j++)
        if (P(i, j)) {
    T     x[i, j] = 0;
          for (k=1; k<=N; k++)
    S       x[i, j] = ... x[i, j] ...;
          @x[i] = max(@x[i], j);
        }
    R ... = ... x[i, @x[i]] ...;
    }

Figure 5.28.b. Parallel SMO

Figure 5.28. Parallelization of the motivating example
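The role of array @x can be checked on a small run. The Python sketch below simulates the original program of Figure 5.27.a and the single-assignment version of Figure 5.28.a sequentially and compares the values read by R. Everything concrete in it is an assumption made for illustration: the elided "..." expressions are replaced by `2*x + j + k`, the predicate is an arbitrary function that holds at least once per outer iteration, and `last_j` stands for @x with `None` playing the role of ⊥.

```python
def run_original(M, N, P):
    """Scalar version: out[i] is the value of x read by R at iteration i."""
    out, x = {}, None
    for i in range(1, M + 1):
        for j in range(1, M + 1):
            if P(i, j):
                x = 0                                  # statement T
                for k in range(1, N + 1):
                    x = 2 * x + j + k                  # statement S (assumed body)
        out[i] = x                                     # statement R
    return out

def run_single_assignment(M, N, P, reverse_j=False):
    """SA version with a last_j array standing for @x; j may run in any order."""
    xT, xS, last_j, out = {}, {}, {}, {}
    js = range(M, 0, -1) if reverse_j else range(1, M + 1)
    for i in range(1, M + 1):
        last_j[i] = None                               # plays the role of bottom
        for j in js:
            if P(i, j):
                xT[i, j] = 0
                for k in range(1, N + 1):
                    prev = xT[i, j] if k == 1 else xS[i, j, k - 1]
                    xS[i, j, k] = 2 * prev + j + k
                last_j[i] = j if last_j[i] is None else max(last_j[i], j)
        out[i] = xS[i, last_j[i], N]                   # value recovery in R
    return out

P = lambda i, j: (i + j) % 3 == 0                      # hypothetical predicate
print(run_original(4, 3, P) == run_single_assignment(4, 3, P, reverse_j=True))
```

Running the j loop in reverse still gives the same values, because the maximum kept in `last_j` does not depend on the order in which iterations of j execute; in a real parallel run this maximum is the place where synchronization is needed.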
This shows the need for a memory usage optimization technique. Storage mapping optimization (SMO) [CL99, LF98, SCFS98] consists in reducing memory usage as much as possible as soon as a parallel execution order has been crafted, see Section 5.3. A single two-dimensional array can be used, while keeping the two outer loops parallel, see Figure 5.28.b. Run-time computation of function $\phi$ with array @x seems very cheap at first glance, but execution of @x[i] = max(@x[i], j) hides synchronizations behind the computation of the maximum! As usual, it results in very bad scaling: good accelerations are obtained for a very small number of processors, then speed-up drops dramatically because of synchronizations. Figure 5.29 gives execution time and speed-up for the parallel program, compared to the original (not expanded) one. We used the mp library on an SGI Origin 2000, with $M = 64$ and $N = 2048$, and simple expressions for the "..." parts.

This bad result shows the need for a finer parallelization scheme. The question is to
find a good tradeoff between expansion overhead and parallelism extraction. If we target widely-used parallel computers, the processor number is likely to be less than 100, but SA form extracted two parallel loops involving $M^2$ processors! The intuition is that we wasted memory and run-time overhead.

Figure 5.29. Performance results for storage mapping optimization (two plots against the number of processors, 1 to 32: execution time in ms for Sequential and SMO; speed-up, parallel / original, for Optimal and SMO)

One would prefer a pragmatic expansion scheme, such as maximal static expansion (MSE) [BCC98], or privatization [TP93, MAL93]. Choosing static expansion has the benefit that no $\phi$ function is necessary anymore: $x$ can be safely expanded along outermost and innermost loops, but expansion along $j$ is forbidden (it requires a $\phi$ function and thus violates the static constraint, see Section 5.2). Now, only the outer loop is parallel, and we get much better scaling, see Figure 5.30. However, on a single processor the program still runs two times slower than the original one: scalar $x$ (probably promoted to a register in the original program) has been replaced by a two-dimensional array.

    double x[M+1, N+1];
    parallel for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i, j)) {
    T     x[i, 0] = 0;
          for (k=1; k<=N; k++)
    S       x[i, k] = ... x[i, k-1] ...;
        }
    R ... = ... x[i, N] ...;
    }

Figure 5.30. Maximal static expansion (with a plot of speed-up, parallel / original, against the number of processors, 1 to 32, for Optimal and MSE)

Maximal static expansion expanded $x$ along the innermost loop, but it was of no interest regarding parallelism extraction. Combining it with storage mapping optimization solves the problem, see Figure 5.31. Scaling is excellent and parallelization overhead is very low: the parallel program runs 31.5 times faster than the original one on 32 processors (for $M = 64$ and $N = 2048$).
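The memory cost of each expansion scheme follows directly from the array declarations in Figures 5.28.a, 5.28.b and 5.30. As a small worked computation, the Python sketch below counts the declared cells for the experimental sizes $M = 64$ and $N = 2048$; the helper name `footprint` is an illustrative choice, and the count ignores element sizes and any other program data.

```python
def footprint(*shapes):
    """Total number of cells of arrays declared with the given shapes."""
    total = 0
    for shape in shapes:
        cells = 1
        for extent in shape:
            cells *= extent
        total += cells
    return total

M, N = 64, 2048
sa  = footprint((M + 1, M + 1), (M + 1, M + 1, N + 1), (M + 1,))  # xT, xS, @x
smo = footprint((M + 1, M + 1), (M + 1,))                        # x, @x
mse = footprint((M + 1, N + 1))                                  # x
print(sa, smo, mse)
```

The single-assignment declarations need several million cells, against a few thousand for the storage-mapping-optimized version and about a hundred thousand for maximal static expansion, which matches the qualitative discussion above.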
This example shows the use of combining constrained expansions (such as privatization and static expansion) with storage mapping optimization techniques, to improve parallelization of general loop nests (with unrestricted conditionals and array subscripts). In the following, we present an algorithm useful for automatic parallelization of imperative programs. Although this algorithm cannot itself choose the "best" parallelization, it aims at simultaneous optimization of expansion and parallelization constraints.

    double x[M+1];
    parallel for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i, j)) {
    T     x[i] = 0;
          for (k=1; k<=N; k++)
    S       x[i] = ... x[i] ...;
        }
    R ... = ... x[i] ...;
    }

Figure 5.31. Maximal static expansion combined with storage mapping optimization (with a plot of speed-up, parallel / original, against the number of processors, 1 to 32, for Optimal and MSE + SMO)

5.4.2 Problem Statement

Because our framework is based on maximal static expansion and storage mapping optimization, we inherit their program model and mathematical abstraction: we only consider nests of loops operating on arrays and abstract these programs with affine relations.

Introducing Constrained Expansion

The motivating example shows the benefits of putting an a priori limit to expansion. Static expansion [BCC98] is a good example of constrained expansion. What about other expansion schemes? The goal of constrained expansion is to design pragmatic techniques that do not expand variables when the incurred overhead is "too high". To generalize static expansion, we suppose that some equivalence relation $\equiv$ on writes is available from previous compilation stages (possibly with user interaction). It is called the constraint relation. A storage mapping constrained by $\equiv$ is any mapping $f_e^{exp}$ such that

\[ \forall e \in E, \forall v, w \in W : \quad v \equiv w \wedge f_e(v) = f_e(w) \Longrightarrow f_e^{exp}(v) = f_e^{exp}(w). \tag{5.25} \]

It is difficult to decide whether to forbid expansion of some variable or not. A short survey of this problem is presented in Section 5.4.5, along with a discussion about building constraint relation $\equiv$ from a "syntactical" or "semantical" constraint. Moreover, we leave for Section 5.4.8 all discussions about picking the right parallel execution order.

Now, the two problems are part of the same two-criteria optimization problem: tuning expansion and parallelism for performance. We do not present here a solution to this complex problem. The algorithm described in the next sections should be seen as an integrated tool for parallelization, as soon as the "strategy" has been chosen: what expansion
constraints, what kind of schedule, tiling, etc. Most of these strategies have already been shown useful and practical for some programs; our main contribution is their integration in an automatic optimization process. The summary of our optimization framework is presented in Figure 5.32.

Figure 5.32. What we want to achieve (diagram; labels: Expansion, Parallelism, sequential program $<_{seq}$, original storage mapping $f_e$, single-assignment form, data-flow execution order, expansion constrained by the constraint relation, correct parallel execution order $<_{par}$ (scheduling, tiling, etc.), correct optimized expansion $f'_e$ (storage mapping optimization))

5.4.3 Formal Solution

We first define correct parallelizations, then state our optimization problem.

What is a Correct Parallel Execution Order?

Memory expansion partially removes dependences due to memory reuse. Recall from Section 2.5 that relation $\delta^{exp}$ approximates the dependence relation of $(<_{seq}, f_e^{exp})$, the expanded program with sequential execution order. ($\delta^{exp}$ equals $\sigma$ when the program is converted to SA form.) Thanks to Theorem 2.2 page 81, we want any parallel execution order $<_{par}$ to satisfy the following condition:

\[ \forall (\iota_1, r_1), (\iota_2, r_2) \in A : \quad (\iota_1, r_1) \;\delta^{exp}\; (\iota_2, r_2) \Longrightarrow \iota_1 <_{par} \iota_2 . \tag{5.26} \]

Computation of approximate dependence relation $\delta^{exp}$ from storage mapping $f_e^{exp}$ is presented in Section 5.4.8.

What is a Correct Expansion?

Given parallel order $<_{par}$, we are looking for correct expansions allowing parallel execution to preserve the original semantics. Our task is to formalize memory reuse constraints enforced by $<_{par}$. Using interference relation $\bowtie$ defined in Section 5.3.2, we have proven in Theorem 5.2 that the expansion is correct if the following condition holds.

\[ \forall e \in E, \forall v, w \in W : \quad v \bowtie w \Longrightarrow f_e^{exp}(v) \ne f_e^{exp}(w). \tag{5.27} \]
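Criterion (5.27) is easy to test once the interference relation and a candidate storage mapping are available as finite objects. The following Python sketch is a minimal check under that assumption: the interference relation is an explicit set of pairs of writes, the storage mapping is an ordinary function, and the sample writes and mappings are hypothetical.

```python
def expansion_violations(interference, storage):
    """Interfering pairs of writes that the storage mapping sends to the same cell."""
    return [(v, w) for v, w in interference if storage(v) == storage(w)]

# Hypothetical writes (i, k): each value must survive until the next k of the same i.
writes = [(i, k) for i in range(3) for k in range(5)]
interference = {(v, w) for v in writes for w in writes
                if v[0] == w[0] and w[1] == v[1] + 1}

fold_mod_2 = lambda v: (v[0], v[1] % 2)   # two cells per outer iteration
one_cell   = lambda v: (v[0],)            # a single cell per outer iteration

print(len(expansion_violations(interference, fold_mod_2)))
print(len(expansion_violations(interference, one_cell)))
```

A mapping is acceptable for the given parallel order exactly when the returned list is empty; here folding modulo 2 passes, while sharing a single cell per outer iteration breaks every interfering pair.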
ComputingParallelExecutionOrdersandExpansions 5.4.CONSTRAINEDSTORAGEMAPPINGOPTIMIZATION 211 correctnesscriteria(5.26)and(5.27).letusshowhowsolvingtheseequationssimultaneouslyyieldsasuitableparallelprogram(<par;fextion5.1insection5.2.3 thataconstrainedexpansionismaximal i.e.assignsthelargest FollowingthelinesofSection5.2.3,weareinterestedinremovingasmanydependences e). Weformalizedtheparallelizationcorrectnesswithanexpansionconstraint(5.25)andtwo aspossible,withoutviolatingtheexpansionconstraint.wecanprove likeproposi- numberofmemorylocationswhileverifying(5.25) i alenceclassesof.indeed,iffe(v)=fe(w),conditionfexp StillfollowingSection5.2.3,weassumethatfexp 8e2E;8v;w2We:vw^fe(v)=fe(w)()fexp e =(fe;),whereisconstantonequiv- e(v)=fexp e(v)=fexp e(w)becomes e(w): equivalentto(v)=(w).becauseweneedtoapproximateoverallpossibleexecutions, weuseconictrelation,andourmaximalconstrainedexpansioncriterionbecomes of(instancesthat\may"hitthesamememorylocation),(v)canbedenedviaa Computingisdonebyenumeratingequivalenceclassesof.Foranyaccessvinaclass 8v;w2W;vw:vw()(v)=(w) (5.28) minimumisasimplewaytondrepresentatives,seesection5.2.5. representativeoftheequivalenceclassofvforrelation.computingthelexicographical write.thefullcomputationisdoneinsection5.4.8anduses(5.28);theresultis onviftheyhitthesamememorylocation,vexecutesbeforew,andatleastoneisa Itistimetocomputedependencesexpofprogram(<seq;fexp e):anaccesswdepends 8v2W;w2R:vexpw, 9u2W:uw^vu^vu^v<seqw Werelyonclassicalalgorithmstocompute<parfromexp[Fea92,DV97,IT88,CFH95]. 8v2R;w2W:vexpw, 9u2W:uv^uw^uw^v<seqw Knowing(<par;fexp 8v;w2W:vexpw,vw^vw^v<seqw e),wecouldstopandsaywehavesuccessfullyparallelizedour (5.29) program;butnothingensuresthatfexp themotivatingexample).wemustbuildanewexpansionfrom<parthatminimizes memoryusagewhilesatisfying(5.27). 
Forconstrainedexpansionpurposes,fexp e isan\economical"storagemapping(remember hassomeconsequencesontheexpansioncorrectnesscriterion:whenfe(v)6=fe(w),itis notnecessarytoset(v)6=(w)toenforcefexp e hasbeenchosenoftheform(fe;).this expansioncorrectnesscriterionthankstoasimplieddenitionofinterferencerelation./. Letbetheinterferencerelationforconstrainedexpansion: v6wclausein(5.22)isnotnecessaryanymore(seepage194),andwemayrewritethe e(v)6=fexp e(w).asaconsequence,the vwdef () 9u2R:vu^wparv^uparw^(u<seqw_w<seqv) Wecanrewritethisdenitionusingalgebraicoperations: _ 9u2R:wu^vparw^uparv^(u<seqv_v<seqw):(5.30) = ((R)W)\par\>seq[ par\((par\<seq)) [ ((R)W)\par\<seq[ par\((par\<seq)):(5.31)
212 CHAPTER5.PARALLELIZATIONVIAMEMORYEXPANSION Theorem5.3(correctnessofconstrainedstoragemappings)Ifastoragemappingfexp eisoftheform(fe;)andthefollowingconditionholds,thenfexp eisacorrect expansionoffe i.e.fexp eallowsparallelexecutiontopreservetheprogramsemantics. 8v;w2W;vw:vw=)(v)6=(w): (5.32) ProvingTheorem5.3isastraightforwardrewritingoftheproofofTheorem5.2and theoptimalityresultofproposition5.2alsoholds:theonlydierenceisthatthev6w clausehasbeenreplacedbyvwinleft-handsideof(5.32). Buildingafunctionsatisfying(5.32)isalmostwhatthepartialexpansionalgorithm presentedinsection5.3.5hasbeencraftedfor.insteadofgeneratingcode,onecan redesignthisalgorithmtocomputeanequivalencerelationoverwrites:thecoloring relation.itsonlyrequirementistoassigndierentcolorstointerferingwrites, 8v;w2W:vw=):(vw); (5.33) butwearealsointerestedinminimizingthenumberofcolors.whenvw,itsaysthat itiscorrecttohavefexp e(v)=fexp e(w).thenewgraphcoloringalgorithmispresentedin Section5.4.6. Byconstructionofrelation,afunctiondenedby 8v;w2W;vw: vw()(v)=(w) satisesexpansioncorrectness(5.32),butannoyingly,nothingensuresthatexpansion constraint(5.25)isstillsatised:forallv;w2wsuchasvw,wehavevw)(v)6= (w)butnotnecessarilyvw)(v)6=(w).indeed,denesaminimalexpansion allowingtheparallelexecutionordertopreservetheoriginalsemantics,butitdoesnot enforcethatthisexpansionsatisestheconstraint. Therstproblemistocheckthecompatibilityofand.Thisisensuredbythe followingresult.18 Proposition5.3Forallwritesvandw,itisnotpossiblethatvwandvwatthe sametime.19 Proof:Supposevw,vw,vwandv<seqw.Thethirdlineof(5.29)showsthat vexpw,hencev<parwfrom(5.26).thisprovesthatthevparwconjunctinsecond lineof(5.30)doesnothold.now,sincevw,onemayconsiderareadinstanceu2r suchthattherstlineof(5.30)issatised:vu^wparv^uparw^u<seqw. Exchangingtheroleofuandvinthesecondlineof(5.29)showsthatuexpw,hence u<parwfrom(5.26);thisiscontradictorywithuparw. Likewise,thecasew<seqvyieldsacontradictionwithuparvinthesecondlineof (5.30).Thisterminatestheproof. 
Wenowhavetodenefromanewequivalencerelation,consideringbothand. Figure5.33showsthat[isnotsucient:considerthreewritesu,vandwsuchthat fe(u)=fe(v)=fe(w),uvandvw.(5.28)enforcesfexp e(u)=fexp e(v)sinceuv. Moreover,tosparememory,weshouldusecoloringrelationandsetfexp e(v)=fexp e(w). Then,noexpansionisdoneandparallelorder<parmaybeviolated. 18Theproofofthisstrongresultisrathertechnicalbuthelpsunderstandingtheroleofeachconjunct inequations(5.29),(5.26)and(5.30). 19Anon-optimaldenitionofrelationwouldnotyieldsuchacompatibilityresult.
5.4.CONSTRAINEDSTORAGEMAPPINGOPTIMIZATION... 213 wrw=x if()x= u ruv=x if()x= wrw=x v if()x= u ruv=x wrw=x v y= if()x= (ruv)=fu;vg. Originalprogram, (rw)=fwgand mayreadthevalue producedbyu. Wrongexpansionwhen ruv=y if()y= movingutothetop:rw Correctwhen assigningyinuandv...figure5.33.strangeinterplayofconstraintandcoloringrelations... andmovingutothe top. coloringrelation,andisdenedby Wethusbuildanewrelationoverwrites,builtfromand.Itiscalledtheconstraint e(u)=fexp Toavoidthispitfall,coloringrelationmustbeusedwithcare:onemaysafelyset e(v)whenforallu0u,v0v:u0v0(i.e.u0andv0sharethesamecolor). Wecanrewritethisdenitionusingalgebraicoperations: 8v;w2W:vwdef ()vw_ 8v0;w0:v0v^w0w=)v0w0: (5.34) andareequivalencerelations.moreover,choosing(v)=(w)whenvwand Thegoodthingisthatrelationisanequivalence:theproofissimplesinceboth =[ n((ww)n): (5.35) constraintandtheexpansioncorrectnesscriterion. (v)6=(w)whenitsnotthecaseensuresthatfexp Thefollowingresultsolvestheconstraintstoragemappingoptimizationproblem:20 e =(fe;)satisesboththeexpansion Theorem5.4Storagemappingfexp 8v;w2W;vw:vw()(v)=(w) e oftheform(fe;)suchthat constrainedbyandallowstheparallelexecutionorder<partopreservetheprogram istheminimalstoragemapping i.e.accessesthefewermemorylocations whichis (5.36) assignthesamememorylocation. Proof:FromProposition5.3,wealreadyknowthatandhaveanemptyintersection.Togetherwiththeinclusionofn((WW)n)into,thisproves semantics,andbeingtheonlyinformationaboutpermittingtwoinstancesto [. thecorrectnessoffexp Toprovetheoptimalityresult,onerstobservethatdenesanequivalencerelation ofwriteinstances,andsecondthatisthelargestequivalencerelationincludedin e=(fe;).theconstraintisalsoenforcedbyfexp esince. aparallelexecutionorderandapredenedexpansionconstraint.figure5.34givesan Theorem5.4givesusanautomaticmethodtominimizememoryusage,accordingto 20SeeSection2.4.4forageneralremarkaboutoptimality.
intuitivepresentationofthiscomplexresult:startingfromthe\maximalconstrained 214 CHAPTER5.PARALLELIZATIONVIAMEMORYEXPANSION correctexpansion",beforecombiningtheresultwiththeconstrainttogeta\minimal expansion",wecomputeaparallelexecutionorder,fromwhichwecomputea\minimal correctconstrainedexpansion".... Single-assignmentform Constrainedexpansion Correctoptimizedexpansion (scheduling,tiling,etc.) Data-owexecutionorder (storagemappingoptimization) Correctparallelexecutionorder Expansion Parallelism <seq <par Originalstoragemapping Sequentialprogram...Figure5.34.Howweachieveconstrainedstoragemappingoptimization... insection5.4.3intothesystem: 5.4.4 Asasummaryoftheoptimizationproblem,onemaygrouptheformalconstraintsexposed Algorithm 8><>: 8v;w2W:vw^vw=)(v)=(w) Constraintsonfexp vw^vw=)(v)6=(w) e =(fe;): Figure5.35showstheacyclicgraphallowingcomputationofrelationsandmappings 8({1;r1);({2;r2)2A:({1;r1)exp({2;r2)=){1<par{2 Constraintson<par: involvedinthissystem. rewrittentohandleconstrainedexpansion.beforeapplyingconstrained-storage- Mapping-Optimization,wesupposethatparallelexecutionorder<parhasbeencom- withanextensionofthepartialexpansionalgorithmpresentedinsection5.3.4, ThealgorithmtosolvethissystemisbasedonTheorem5.4.Itcomputesrelation Then,thisparallelexecutionorderisusedtocomputetheexpansioncorrectnesscriterion putedfrom<seq,,,and,byrstcomputingdependencerelationexpthenap- plyingsomeappropriateparallelordercomputationalgorithm(scheduling,tiling,etc.)..algorithmconstrained-storage-mapping-optimizationreusescompute- RepresentativesandEnumerate-RepresentativesfromSection5.2.5. intorenameddatastructurestoimproveperformanceandreducememoryusage. hasbeenproducedbyatilingtechnique,wehavealreadypointedinsection5.3.6that AsinthelastparagraphofSection5.2.4,onemayconsidersplittingexpandedarrays Eventually,whenthecompilerortheuserknowsthattheparallelexecutionorder<par
the cyclic graph coloring algorithm is not efficient enough. If the tile shape is known, one may build a vector of each dimension size, and use it as a "suggestion" for a block-cyclic storage mapping. This vector of block sizes is used when replacing the call to Cyclic-Coloring with a call to Near-Block-Cyclic-Coloring in Constrained-Storage-Mapping-Optimization.

[Figure 5.35. Solving the constrained storage mapping optimization problem. Recoverable node labels: program (<seq, fe); program analysis; expansion scheme (Section 5.4.5); <par (scheduling, etc.); coloration; enumeration of equivalence classes; code generation for (<par, f'e).]

5.4.5 Building Expansion Constraints

Our goal here is not to choose the right constraint suitable to expand a given program, but this does not mean leaving the user to compute the constraint relation!

As shown in Section 5.4.2, enforcing the expansion to be static corresponds to setting the constraint relation to R. The constraint is thus built from instancewise reaching definition results (see Section 5.2).

Another example is privatization, seen as expansion along some surrounding loops, without renaming. Consider two accesses u and v writing into the same memory location. After privatization, u and v assign the same location if their iteration vectors coincide on the components associated with privatized loops:

    u and v are related by the constraint  <=>  Iter(u)[privatized loops] = Iter(v)[privatized loops],

where Iter(u)[privatized loops] holds counters of privatized loops for instance u.
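The privatization constraint above amounts to projecting iteration vectors onto the privatized loops and grouping the writes whose projections coincide. The following is a minimal sketch of that grouping, not the thesis's implementation: the function names, the instance names and the convention that loops are identified by their position in the iteration vector are assumptions made for illustration, and all writes are assumed to target the same original memory location.

```python
from collections import defaultdict


def privatization_key(iteration_vector, privatized_loops):
    """Project an iteration vector onto the counters of the privatized loops."""
    return tuple(iteration_vector[p] for p in privatized_loops)


def constraint_classes(writes, privatized_loops):
    """Group write instances that must still share a location after privatization.

    `writes` maps a (hypothetical) instance name to its iteration vector;
    `privatized_loops` lists the positions of the privatized loop counters.
    """
    classes = defaultdict(list)
    for name, vector in writes.items():
        classes[privatization_key(vector, privatized_loops)].append(name)
    return dict(classes)


# Two nested loops (i, j); only the outer loop i (position 0) is privatized.
writes = {"S(1,1)": (1, 1), "S(1,2)": (1, 2), "S(2,1)": (2, 1)}
classes = constraint_classes(writes, privatized_loops=[0])
# Instances of the same outer iteration fall into the same class.
```

Each resulting class is a set of writes that the expansion is not allowed to separate; writes in different classes may receive different locations.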
Constrained-Storage-Mapping-Optimization(program;;;;<par) 216program:anintermediaterepresentationoftheprogram CHAPTER5.PARALLELIZATIONVIAMEMORYEXPANSION :theconictrelation :thereachingdenitionrelation,seenasafunction 1 returnsanintermediaterepresentationoftheexpandedprogram :theexpansionconstraint 2 <par:theparallelexecutionorder 3 Cyclic-Coloring(\) [ ((R)W)\par\<seq[ par\((par\<seq)) ((R)W)\par\>seq[ par\((par\<seq)) 6 4 5 7foreacharrayA2program Compute-Representatives(\) Enumerate-Representatives(;) [(n((ww)n)) 10 11 98doA declarationa[shape] foreachstatementsassigningainprogram doleft-handsidea[subscript]ofs component-wisemaximumof(u)forallwriteaccessesutoa Aexp[shape,A] 14 13 12 do=ref foreachreferencereftoainprogram quast \(Iref) Make-Quast(=ref) Aexp[subscript,(CurIns)] 15 16 17returnprogram map ref map(curins) CSMO-Convert-Quast(quast;ref) CSMO-Convert-Quast(quast;ref) quast:thequastrepresentationofthereachingdenitionfunction ref:theoriginalreference 31switch 2returnstheimplementationofquastasavalueretrievalcodeforreferenceref casequast=f?g: 654 casequast=f{g: A returnref 78 S x Iter({) Stmt({) Array({) 10 11 9 casequast=f{1;{2;:::g: returnaexp[subscript,x] return(f{1;{2;:::g) originalarraysubscriptinref 13 12 casequast=ifpredicatethenquast1elsequast2: returnifpredicatecsmo-convert-quast(quast1;ref) BuildingtheconstraintforarraySSAisevensimpler.Instancesofthesamestatement elsecsmo-convert-quast(quast2;ref) assigningthesamememorylocationmuststilldosointheexpandedprogram(only variablerenamingisperformed): uv()stmt(u)=stmt(v)
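The case analysis of CSMO-Convert-Quast given with the algorithm above can be illustrated on a small tree representation of quasts. This is a hedged sketch based on one reading of the pseudo-code: the tuple encoding of quasts, the generated C-like strings, the name `phi` and the parameter `nu` (a function returning the extra subscript assigned to an instance) are all assumptions introduced here, not the thesis's data structures.

```python
def convert_quast(quast, ref, subscript, nu):
    """Turn a quast into value-retrieval code (as a string) for reference `ref`.

    Assumed quast encoding (hypothetical):
      ("bottom",)                      no reaching definition: keep the original reference
      ("instance", array, instance)    a single reaching definition
      ("set", [instances])             several possible definitions: needs a phi function
      ("if", predicate, then, else)    conditional quast
    """
    kind = quast[0]
    if kind == "bottom":
        return ref
    if kind == "instance":
        _, array, instance = quast
        return f"{array}_exp[{subscript}, {nu(instance)}]"
    if kind == "set":
        _, instances = quast
        return "phi({" + ", ".join(instances) + "})"
    if kind == "if":
        _, predicate, then_branch, else_branch = quast
        then_code = convert_quast(then_branch, ref, subscript, nu)
        else_code = convert_quast(else_branch, ref, subscript, nu)
        return f"({predicate}) ? {then_code} : {else_code}"
    raise ValueError(f"unknown quast node: {kind!r}")


# A read whose definition is instance S(i-1) when i > 1, and the initial value otherwise.
example = ("if", "i > 1", ("instance", "A", "S(i-1)"), ("bottom",))
code = convert_quast(example, ref="A[j]", subscript="j", nu=lambda instance: "i-1")
```

The sketch only shows the shape of the recursion; a real code generator would emit intermediate-representation nodes rather than strings.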
5.4.CONSTRAINEDSTORAGEMAPPINGOPTIMIZATION Now,rememberwehavedenedanextensionofreachingdenitions,calledreaching 217 denitionsofmemorylocations.thisdenitioncanbeusedtoweakenthestaticexpansionconstraint:iftheaimofconstrainedexpansionistoreducerun-timeoverheaddue functionsgeneratedbytheclassicalalgorithmhavedisappeared,seethesecondmethod ifloop-nests-ml-saisusedtoconvertaprogramtosaform,wehaveseenthat insection5.1.4.itwouldthusbeinterestingtoreplace tofunctions,thenmlseemsmoreappropriatethantodenetheconstraint.indeed, inline14ofconstrained-storage-mapping-optimizationby Make-Quast(ml Make-Quast(=ref) andtoconsidertheconstraintdenedbythetransitiveclosureofrelationw 8v;w2W: vww()9c2f(u):v;w2ml(u;c); =ref(u;fe(u))) weakenedstaticexpansionwithstoragemappingoptimization. constraintwiscalledweakenedstaticexpansion.eventually,setting=wcombines wherefissomeconservativeapproximationoffe.maximalexpansionaccordingto peciallyarchitecturedependent(numberofprocessors,memoryhierarchy,communication expressedasconstraints statement-by-statement,user-dened,knowledge-based,andes- ofanexpansionstrategyisnotdicult.newexpansionstrategiesshouldbedesignedand Thesepracticalexamplesgivetheinsightthatbuildingfromtheformaldenition model)constraints. Lefebvrein[LF98],andthecoreoftheirsolutionhasbeenrecalledinSection5.3.5. 
OurgraphcoloringproblemisalmostthesameastheonestudiedbyFeautrierand 5.4.6 Graph-ColoringAlgorithm generation.aneasywork-aroundwouldbetoredesigntheoutputofalgorithmstorage- Mapping-Optimization,asproposedin[Coh99b]:letStmt(u)(resp.Iter(u))bethe statement(resp.iterationvector)associatedwithaccessu,andletnewarray(s)be However,theformulationisslightlydierentnow:itisnolongermixed-upwithcode thenameofthenewarrayassignedbys(afterpartialexpansion), 8v;w2W:vwdef ()NewArray(Stmt(v))=NewArray(Stmt(w)) forgraphdenedbyanerelations:cyclic-coloringisusedonstatementinstances Thissolutionissimplebutnotpractical.Wethuspresentafullalgorithmsuitable ^ Iter(v)modEStmt(v)=Iter(w)modEStmt(w): forourstoragemappingoptimizationpurposes.sincethealgorithmisgeneralpurpose, algorithmforstatementinstancesrequiresapreliminaryencodingofstatementname insidetheiterationvector,andapaddingofshortvectorswithzeroes.wealreadyuse thistechniquewhenformattinginstancestotheomegasyntax:seesection5.2.7fora weconsideraninterferencerelationbetweenvectors(ofthesamedimension).usingthis practicalexample. techniques:buildingofanexpansionvectorandpartialrenaming.thisdecompositioncamefromtheboundedstatementnumberwhichallowedecientgreedycoloring RememberthatStorage-Mapping-Optimizationwasbasedontwoindependent
techniques, and the infinity of iteration vectors which required a specific cyclic coloring.

Cyclic-Coloring proceeds in a very similar way, and the reasoning of Section 5.3.5 and [LF98, Lef98] is still applicable to prove its correctness. However, the decomposition into two coloring stages is extended here in considering all finite dimensions of the vectors considered: if the vectors related with an interference relation have some dimensions whose components may only take a finite number of values, it is interesting to apply a classical coloring algorithm to these finite dimensions. We then build an equivalence relation of vectors that share the same finite dimensions: it is called finite in the Cyclic-Coloring algorithm (the number of equivalence classes is obviously finite). When vectors encode statement instances, it is clear that the last dimension is finite, but some examples may present more finite dimensions, for example with small loops whose bounds are known at compile time. This extension may thus bring more efficient storage mappings than the Storage-Mapping-Optimization algorithm in Section 5.3.4.

Cyclic-Coloring(⋈)
    ⋈: the affine interference graph
    returns a valid and economical cyclic coloration
  N ← dimension of vectors related with interfere
  finite ← equivalence relation of vectors sharing the same finite components
  for each class set in finite
    do for p = 1 to N
         do working ← {(v, w) : v ∈ set ∧ w ∈ set
                                ∧ v[1..p] = w[1..p] ∧ v[1..p+1] < w[1..p+1]
                                ∧ ⟨s, v⟩ ⋈ ⟨s, w⟩}
            maxv ← {(v, max<lex {w : (v, w) ∈ working})}
            vector[p+1] ← max<lex {w − v[p+1] + 1 : (v, w) ∈ maxv}
       cyclic_set ← v mod vector
  interfere ← ∅
  for each set, set' in finite
    do if (∃v ∈ set, v' ∈ set' : v ⋈ v')
         then interfere ← interfere ∪ {(set, set')}
  coloring ← Greedy-Coloring(interfere)
  col ← ∅
  for each set in finite
    do col ← col ∪ (cyclic_set, coloring(set))
  return col

The Near-Block-Cyclic-Coloring algorithm is an optimization of Cyclic-Coloring: it includes an improvement of the technique to efficiently handle graphs associated with tiled programs, as hinted in Section 5.3.6. In this particular case, we consider (as in most tiling techniques) a perfectly nested loop nest. Notice the "/" symbol is used for symbolic integer division. The intuitive idea is that a block-cyclic coloring is preferred to the cyclic one of the classical algorithm.

The Near-Block-Cyclic-Coloring algorithm should be seen as a first attempt to compute optimized storage mappings for tiled programs. As shown in Section 5.3.6, the block-cyclic coloring problem is still open for affine interference relations.
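To make the difference between a cyclic and a block-cyclic coloring concrete, the sketch below colors iteration vectors in both ways. It is only an illustration under strong assumptions: the moduli (`modulus`) and the tile sizes (`shape`) are taken as given, whereas the algorithms of this section compute them from the interference relation, and the test deciding on which dimensions the integer division is applied is not modelled. The function names are hypothetical.

```python
def cyclic_color(vector, modulus):
    """Cyclic coloring: each component is reduced modulo its expansion size."""
    return tuple(x % m for x, m in zip(vector, modulus))


def block_cyclic_color(vector, shape, modulus):
    """Block-cyclic coloring: divide by the block size first, then color cyclically.

    `shape` holds the block (tile) sizes suggested by a tiling algorithm; the
    integer division plays the role of the symbolic "/" of the text.
    """
    quotient = tuple(x // s for x, s in zip(vector, shape))
    return cyclic_color(quotient, modulus)


# With 4x4 tiles, all iterations of one tile share a color,
# and colors are reused every two tiles along each dimension.
same_tile = {block_cyclic_color(v, (4, 4), (2, 2)) for v in [(4, 4), (5, 7), (7, 7)]}
```

Under these assumptions a whole tile maps to a single color, which is the behaviour a tiled program would usually want, while the purely cyclic coloring would scatter the iterations of a tile over several colors.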
Near-Block-Cyclic-Coloring(;shape) 5.4.CONSTRAINEDSTORAGEMAPPINGOPTIMIZATION :asymbolicinterferencegraph 219 1N returnsavalidandeconomicalblock-cycliccoloration shape:avectorofblocksizessuggestedbyatilingalgorithm 4doquotient0 3forp=1toN 2quotient numberofnestedloops 5 f(x;x):x2zng 76 if(@z:zquotient0quotient0 1z) thenquotient f(x;y):y[1]=x[1];:::;y[p]=x[p]=shapep;:::;y[n]=x[n]g quotient 9returncolquotient 8col Cyclic-Coloring(quotientquotient 1) 5.4.7 AsinSection5.3.8,-arraysshouldbechoseninone-to-onemappingwiththeexpanded datastructures,andargumentsoffunctions i.e.setsofpossiblereachingdenitions DynamicRestorationoftheData-Flow usedtorecomputethesetsofpossiblereachingdenitions:21a(set)referenceshouldbe replacedby fv2set:@w2set:v<seqw^:(v6w)^(v)=(w)g: shouldbeupdatedaccordingtothenewstoragemapping.thetechniqueisessentially thesame:functionfexp e isusedtoaccess-arrays,thenrelation6andfunctionare subscript,andthebooleantypeisnowpreferredfor-arrayselements.thisverysimple memorylocationwrittenbyapossiblereachingdenitioncanbededucedfromthearray optimizationreducesbothmemoryusageandrun-timeoverhead.algorithmcsmo- Anotheroptimizationisbasedontheshapeof-arrays:sincefexp e =(fe;),the Implement-Phisummarizestheseoptimizations.22 onlinecomputationoffunctionsisratherdierent. functionsinthessaframework[cfr+91,ks98].however,codegenerationforthe tionofthedataow.ourtechniqueextendsideasfromthealgorithmstoecientlyplace AshintedinSection5.1.4,thegoalisnowtoavoidredundancyintherun-timerestora- insection2.3.1,andaprogrampointisaninter-statementlocationintheprogramtext mergetogether.rememberthecontrol-owgraphisnotthecontrolautomatondened graph[cfr+91]:thereisajoinatsomeprogrampointwhenseveralcontrol-owpaths AsintheSSAframework,functionsshouldbeplacedatthejoinsofthecontrol-ow toausewhosesetofpossiblereachingdenitionsisnonemptyandholdsw.ifpoints [ASU86].Ofcourse,textualorder<txtisextendedtoprogrampoints. 
details.indeed,theonly\interesting"joinsarethoselocatedonapathfromawritew isthesetofprogrampoints,thesetof\interesting"joinsforanarray(orscalar)ais Joinsareecientlycomputedwiththedominancefrontiertechnique,see[CFR+91]for Section5.4.4.Tocorrectlyhandlethispartitioning,somesimple butrathertechnical modications location. 21Weuse:(v6w)toapproximatetherelationbetweenwritesthatmustassignthesamememory shouldbemadeonthealgorithm. 22Foreciencyreasons,anexpandedarrayAexpispartitionedintoseveralsub-arrays,asproposedin
CSMO-Implement-Phi(expanded) 220expanded:anintermediaterepresentationoftheexpandedprogram CHAPTER5.PARALLELIZATIONVIAMEMORYEXPANSION 2doiftherearefunctionsaccessingAexp 31foreacharrayAexp[shape]inexpanded returnsanintermediaterepresentationwithrun-timerestorationcode 456 thendeclareanarrayaexp[shape]initializedtofalse foreachreadreferencereftoaexpwhoseexpandedformis(set) 87 dosub short dorefs foreachstatementsinvolvedinset arraysubscriptinref fv2set:@w2set:v<seqw^:(v6w)^(v)=(w)g 109 subs writereferenceins 12 11 ifnotalreadydonefors thenfollowingsinsert arraysubscriptinrefs 13 14returnexpanded (set) Aexp[max<seqf{2short:Aexp[sub,({;ref)]=trueg] Aexp[subs,(CurIns;refs)]=true denotedbyjoinsa,andisformallydenedby 8p2Points:p2JoinsA()9v;u2I: jinjoinsaapseudo-assignmentstatement Foreacharray(orscalar)Aintheoriginalprogram,theideaistoinsertateachjoin vu^stmt(v)<txtp<txtstmt(u)^array(stmt(u))=a:(5.37) extendedtothesepseudo-assignmentstatementsandtheconstraintstorage-mappingoptimizationprocessisperformedonthemodiedprograminsteadoftheoriginalone.23 PjA[]=A[]; whichcopiestheentirestructureintoitself.then,thereachingdenitionrelationis ApplicationofConstrained-Storage-Mapping-OptimizationandthenCSMO- Implement-Phi(oranoptimizedversion,seeSection5.1.4)generatesanexpandedprogramwhoseinterestingpropertyistheabsenceofanyredundancyinfunctions.Indeed, thelexicographicmaximumoftwoinstancesisnevercomputedtwice,sinceitisdoneas OptimizationandCSMO-Implement-Phi.KnobeandSarkarencounterasimilar whichwasnotthecaseforadirectapplicationofconstrained-storage-mapping- earlyaspossibleinthefunctionofsomepseudo-assignmentstatement. problemwithssaforarrays[ks98]andproposeseveraloptimizations(mostlybased However,theexpandedprogramsuersfromtheoverheadinducedbyarraycopying, Nevertheless,thereissuchageneralmethod,basedontheobservationthateachpseudoassignmentstatementintheexpandedprogramisfollowedbyan-arrayassignation,by toremovearraycopies{itistheverynatureofssatogeneratetemporaryvariables. 
oncopypropagationandinvariantcodemotion),buttheyprovidenogeneralmethod codegenerationforapseudo-assignmentstatementp: constructionofpseudo-assignmentstatementsandthesetjoinsa.considerthefollowing reachingdenitionsforpseudo-assignmentaccessescanbededucedfromtheoriginalreachingdenition relation. 23Extendingthereachingdenitionrelationdoesnotrequireanyotheranalysis:thesetsofpossible for(){//iteratethroughthewholearray
P5.4.CONSTRAINEDSTORAGEMAPPINGOPTIMIZATION Aexp[subscript]=Aexp[max(set)]; 221 StatementPdoesnotcomputeanything,itonlygatherspossiblevaluescomingfrom } Aexp[subscript]=true; dierentcontrolpaths.theideaisthustostoreinstancesinsteadofbooleansandtouse thearraycopyisbypassedinupdating@aexp[subscript]withthemaximuminright-hand @-arrays(seesection5.1.4)insteadof-arrays.anarray@aexpisinitializedto?,and sideofp.thepreviouscodefragmentcanthussafelybereplacedby: for(){//iteratethroughthewholearray ThistechniquetoremovespuriousarraycopiesisimplementedinCSMO-Efficiently- }@Aexp[subscript]=max(set); Implement-Phi:theoptimizedgenerationcodealgorithmforfunctions.Remember CSMO-Efficiently-Implement-Phi(expanded) shouldbeappliedontheoriginalprogramextendedwithpseudo-assignmentstatements.24 thatbeforecallingthisalgorithm,constrained-storage-mapping-optimization 2doiftherearefunctionsaccessingAexp 1foreacharrayAexp[shape]inexpanded expanded:anintermediaterepresentationoftheexpandedprogram 3returnsanintermediaterepresentationwithrun-timerestorationcode 456 thendeclareanarray@aexp[shape]initializedto? foreachreadreferencereftoaexpwhoseexpandedformis(set) 87 dosub short dorefs foreachstatementsinvolvedinset arraysubscriptinref fv2set:@w2set:v<seqw^:(v6w)^(v)=(w)g 10 9 subs writereferenceins 12 11 ifnotalreadydonefors thenfollowingsinsert arraysubscriptinrefs 14 15 13 foreachpseudo-assignmentptoaexpwithreference(set) dogenmax Aexp[max<seqf{2short:@Aexp[sub,({;ref)]g] code-generationforthelexicographicgenmaxinset @Aexp[subs,(CurIns;refs)]=CurIns 18returnexpanded 17 16 removestatementp right-handsideof-arrayassignmentfollowingp genmax bynextjointhenextinstanceofthenearestpseudo-assignmentstatementfollowing arithmetics isawellknownproblemwithveryecientparallelimplementations[rf94]. 
butitiseasierandsometimesfastertoperformanonlinecomputation.letusdenote Eventually,computingthelexicographicmaximumofaset denedinpresburger inreplacingeachassignmentoftheform CurIns.Computationofthelexicographicmaximumin(set)canbeperformedonline 24SameremarkregardingpartitioningofexpandedarraysasforCSMO-Implement-Phi. @Aexp[subscript,(CurIns)]=CurIns;
by@aexp[subscript,(nextjoin)]=max(@aexp[subscript,(nextjoin)],curins); 222 CHAPTER5.PARALLELIZATIONVIAMEMORYEXPANSION (isdenedforinstancesofnextjoin:itisapseudo-assignmenttoa). tivatingexampleyieldsthesameresultasthesaforminfigure5.28. ApplyingCSMO-Efficiently-Implement-Phiandthistransformationtothemo- 5.4.8 Thissectionaimstocharacterizecorrectparallelexecutionordersforaprogramafter maximalconstrainedexpansion.thebenetmemoryexpansionistoremovespurious ParallelizationafterConstrainedExpansion dependencerelationoftheexpandedprogramwithsequentialexecutionorder(<seq;fexp dependencesduetomemoryreuse,butsomememory-baseddependencesmayremainafter constrainedexpansion.westilldenotebyexp AsannouncedinSection5.4.3,wenowgivethefullcomputationdetailsfor(5.29). Dependencesleftbyconstrainedexpansionare,asusual,ofthreekinds. e(resp.exp)theexact(resp.approximate) e). 1.Outputdependencesduetowritesconnectedtoeachotherbytheconstraint(e.g. 2.Truedependences,fromadenitiontoaread,wherethedenitioneithermayreach thereadorisrelated(by)toadenitionthatreachestheread. byrinthecaseofmse). Formally,wethusdeneexp 3.Antidependencesfromareadtoadenitionwherethedenition,evenifitexecutes 8e2E;8v;w2Ae:vexp aftertheread,isrelated(by)toadenitionthatreachestheread. eforanexecutione2easfollows: ew() _fe(v)=fe(e(w))^ve(w)^v<seqw _fe(v)=fe(w)^vw^v<seqw vw Then,thefollowingdenitionofexpisthebestpessimisticapproximationofexp posingrelationisthebestavailableapproximationoffunctionfeandisthebest _fe(w)=fe(e(v))^e(v)w^v<seqw availableapproximationoffunctione: e,sup- 8v;w2A:vexpwdef ()_ 9u2W:uw^vu^vu^v<seqw(5.40) _vw^vw^v<seqw vw (5.39) (5.38) Now,sinceandarereexiverelations,weobservethat(5.38)isalreadyincludedin (5.40).Wemaysimplifythedenitionofexp: _ 9u2W:uv^uw^uw^v<seqw(5.41) 8v2W;w2R:vexpw, 9u2W:uw^vu^vu^v<seqw 8v2R;w2W:vexpw, 9u2W:uv^uw^uw^v<seqw 8v;w2W:vexpw,vw^vw^v<seqw (5.42)
Eventually,wegetanalgebraicdenitionofthedependencerelationaftermaximalconstrainedexpansion: 5.4.CONSTRAINEDSTORAGEMAPPINGOPTIMIZATION 223 (includingreachingdenitions),andthethirdonedescribesanti-dependences. Thersttermdescribesoutputdependences,thesecondonedescribesowdependences exp =(\)[(\)[ 1(\): (5.43) <paraftermaximalconstrainedexpansion.practicalcomputationof<parisdonewith schedulingortilingtechniques,seesection2.5.2. Usingthisdenition,Theorem2.2page81describescorrectparallelexecutionorder constraintistheoneofthemaximalstaticexpansion.first,wedenethesequential executionorder<seqwithinomega(withconventionsdenedinsection5.2.7): Asanexample,weparallelizetheconvolutionprograminFigure5.6(page169).The Lex:={[i,w,2]->[i',w',2]:1<=i<=i'<=N&&1<=w,w'&&(i<i' w<w')} union{[i,0,1]->[i',w',2]:1<=i<=i'<=n&&1<=w'} union{[i,w,2]->[i',0,1]:1<=i,i'<=n&&1<=w&&i<i'} union{[i,0,3]->[i',0,3]:1<=i<i'<=n} union{[i,0,1]->[i',0,3]:1<=i<=i'<=n} union{[i,0,3]->[i',0,1]:1<=i<i'<=n} union{[i,0,1]->[i',0,1]:1<=i<i'<=n} structureisascalarvariable),andthatrelationrisdenedby(5.12).wecomputeexp Second,recallfromSection5.2.7thatallwritesareinrelationfor(sincethedata union{[i,w,2]->[i',0,3]:1<=i<=i'<=n&&1<=w} union{[i,0,3]->[i',w',2]:1<=i<i'<=n&&1<=w'}; D; from(5.43): {[i,w,2]->[i,w',2]:1<=i<=n&&1<=w<w'}union D:=(RunionR(S)unionS'(R))intersectionLex; {[i,0,1]->[i,w',2]:1<=i<=n&&1<=w'}union {[i,0,1]->[i,0,3]:1<=i<=n}union {[i,w,2]->[i,0,3]:1<=i<=n&&1<=w} i.itmakestheouterloopparallel(itwasnotthecasewithoutexpansionofscalarx). TheparallelprograminmaximalstaticexpansionisgiveninFigure5.14.b. AfterMSE,itonlyremainsdependencesbetweeninstancessharingthesamevalueof UsingtheOmegaCalculatortext-basedinterface,wedescribeastep-by-stepexecution 5.4.9 oftheexpansionalgorithm.wehavetocodeinstancesasinteger-valuedvectors.an BacktotheMotivatingExample instancehs;iiisdenotedbyvector[i,..,s],where[..]possiblypadsthevectorwith written[i,j,0,1],[i,j,k,2]and[i,0,0,3],respectively. 
zeroes.wenumbert,s,rwith1,2,3inthisorder,soht;i;ji,hs;i;j;kiandhr;iiare S:={[i,0,0,3]->[i,j,k,2]:1<=i,j<=M&&1<=k<=N} TheresultofinstancewisereachingdenitionanalysisiswritteninOmega'ssyntax: union{[i,j,k,2]->[i,j,k-1,2]:1<=i,j<=m&&2<=k<=n}; union{[i,j,1,2]->[i,j,0,1]:1<=i,j<=m}
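The encoding of statement instances as integer vectors used for the Omega Calculator in this section (iteration counters, zero padding, then the statement number) can be sketched as follows. The helper name `encode` and the constants are illustrative assumptions; only the numbering of T, S, R as 1, 2, 3 and the resulting vectors come from the text.

```python
# Statement numbering taken from the text: T, S, R are numbered 1, 2, 3.
STATEMENT_NUMBER = {"T": 1, "S": 2, "R": 3}
MAX_DEPTH = 3  # deepest nest in the example: statement S is surrounded by i, j, k


def encode(statement, iteration):
    """Encode instance <statement, iteration> as a fixed-length integer vector."""
    padding = [0] * (MAX_DEPTH - len(iteration))
    return list(iteration) + padding + [STATEMENT_NUMBER[statement]]


t_instance = encode("T", (4, 5))      # [i, j, 0, 1]
s_instance = encode("S", (4, 5, 6))   # [i, j, k, 2]
r_instance = encode("R", (4,))        # [i, 0, 0, 3]
```

Padding every vector to the same length is what allows a single relation, such as S above, to mix instances of statements nested at different depths.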
The conflict and no-conflict relations are trivial here, since the only data structure is a scalar variable: the conflict relation is the full relation and the no-conflict relation is the empty one.

    Con := {[i,j,k,s] -> [i',j',k',s'] : 1<=i,i',j,j'<=M && 1<=k,k'<=N
            && ((s=1 && k=0) || s=2 || (s=3 && j=k=0))
            && ((s'=1 && k'=0) || s'=2 || (s'=3 && j'=k'=0))};
    NCon := {[i,j,k,s] -> [i',j',k',s'] : 1=2};   # 1=2 means FALSE!

As in Section 5.4.1, we choose static expansion as constraint. The constraint relation is thus defined as R in Section 5.2.2:

    S' := inverse S;
    R := S(S');

No transitive closure computation is necessary since R is already transitive. Computing dependences is done according to (5.43), and relation Con is removed since it always holds:

    D := R union R(S) union S'(R);

In this case, a simple solution to computing a parallel execution order is the transitive closure computation:

    Par := D+;

We can now compute relation ⋈ in the left-hand side of the expansion correctness criterion, call it Int.

    # The "full" relation
    Full := {[i,j,k,s] -> [i',j',k',s'] : 1<=i,i',j,j'<=M && 1<=k,k'<=N
             && ((s=1 && k=0) || s=2 || (s=3 && j=k=0))
             && ((s'=1 && k'=0) || s'=2 || (s'=3 && j'=k'=0))};

    # The sequential execution order
    Lex := {[i,j,0,1] -> [i',j',0,1] : 1<=i<i'<=M && 1<=j,j'<=M}
     union {[i,j,0,1] -> [i',j',k',2] : 1<=i<=i'<=M && 1<=j,j'<=M
            && 1<=k'<=N}
     union {[i,j,k,2] -> [i',j',0,1] : 1<=i<i'<=M && 1<=j,j'<=M
            && 1<=k<=N}
     union {[i,j,k,2] -> [i',j',k',2] : 1<=i<=i'<=M && 1<=j,j'<=M
            && 1<=k,k'<=N && (i<i' || (j<=j' && (j<j' || k<k')))}
     union {[i,j,0,1] -> [i',0,0,3] : 1<=i<=i'<=M}
     union {[i,0,0,3] -> [i',j',0,1] : 1<=i<i'<=M}
     union {[i,j,k,2] -> [i',0,0,3] : 1<=i<=i'<=M && 1<=j<=M
            && 1<=k<=N}
     union {[i,0,0,3] -> [i',j',k',2] : 1<=i<i'<=M && 1<=j'<=M
            && 1<=k'<=N}
     union {[i,0,0,3] -> [i',0,0,3] : 1<=i<i'<=M};

    ILex := inverse Lex;
    NPar := Full - Par;
    INPar := inverse NPar;
Int:=(INParintersectionILex) 5.4.CONSTRAINEDSTORAGEMAPPINGOPTIMIZATION 225 Int:=Intunion(inverseInt); Theresultis: union(inparintersections(nparintersectionlex)); Int; {[i,j,k,2]->[i',j',k',2]:1<=j<=j'<=m {[i,j,k,2]->[i',j',k',2]:1<=j<j'<=m {[i,j,k,2]->[i',j,k',2]:1<=k'<k<=n &&1<=k<=k'<=N&&1<=i'<i<=M}union {[i,j,1,2]->[i',j',1,2]:n=1 &&1<=k'<k<=N&&1<=i'<i<=M}union {[i,j,k,2]->[i',j',k',2]:1<=k<=k'<=n &&1<=i'<i<=M&&1<=j<=M}union {[i,j,k,2]->[i',j',k',2]:1<=k'<k<=n &&1<=i'<i<=M&&1<=j'<j<=M}union {[i,j,k,2]->[i',j,k',2]:k'-1,1<=k<=k' &&1<=i'<i<=M&&1<=j'<j<=M&&2<=N}union {[i,j,k,2]->[i',j',k',2]:1,k'-1<=k<=k' &&1<=i'<i<=M&&1<=j'<j<=M}union {[i,j,k,2]->[i',j',k',2]:1<=i<i'<=m &&1<=i<i'<=M&&1<=j<=M&&k<N}union {[i,j,k,2]->[i',j',k',2]:k'-1,1<=k<=k' &&1<=i<i'<=M&&1<=j<j'<=M&&k<N}union {[i,j,k,2]->[i',j',k',2]:k-1,1<=k'<=k &&1<=j<j'<=M&&1<=k'<k<N}union {[i,j,k,2]->[i',j',k',2]:1<=k<k'<n &&1<=i<i'<=M&&1<=j'<j<=M&&k<N}union {[i,j,k,2]->[i',j',k',2]:1,k-1<=k'<=k &&1<=j<j'<=M&&1<=i'<i<=M&&k'<N}union {[i,j,k,2]->[i',j,k',2]:k-1,1<=k'<=k &&1<=i'<i<=M&&1<=j'<j<=M}union &&1<=i'<i<=M&&1<=j<=M&&k'<N}union &&1<=i'<i<=M&&1<=j'<j<=M&&k'<N}union {[i,j,k,2]->[i',j',k',2]:1<=i<i'<=m {[i,j,1,2]->[i',j',1,2]:n=1&&1<=i<i'<=m &&1<=j<j'<=M&&1<=k<k'<=N}union {[i,j,k,2]->[i',j,k',2]:1<=i<i'<=m &&1<=j<j'<=M&&1<=k'<=k<=N&&2<=N}union &&1<=j<j'<=M}union {[i,j,k,2]->[i',j',k',2]:1<=i<i'<=m &&1<=k<k'<=N&&1<=j<=M}union &&1<=k<k'<=N&&1<=j'<j<=M}union &&1<=k'<=k<=N&&1<=j'<=j<=M}
A quick verification shows that

    Int intersection {[i,j,0,1] -> [i,j,k',2] : k'!=0};

and

    Int intersection {[i,j,k,2] -> [i,j,k',2]};

are both empty. It means that ⟨T,i,j⟩ and ⟨S,i,j,k⟩ should share the same color for all 1 <= k <= N (R does not perform any write). However, the sets W^T_0(v), W^S_0(v) (for the i loop), W^T_1(v), W^S_1(v) (for the j loop) hold all accesses w executing after v. Then, different i or j enforces different colors for ⟨T,i,j⟩ and ⟨S,i,j,k⟩. Application of the graph coloring algorithm thus yields the following definition of the coloring relation:

    Col := {[i,j,0,1] -> [i,j,k,2] : 1<=i,j<=M && 1<=k<=N}
     union {[i,j,k,2] -> [i,j,k',2] : 1<=i,j<=M && 1<=k,k'<=N};

We now compute the constraint coloring relation, thanks to (5.35):

    Eco := R union (Col - R(Full - Col(R)));

We choose the representative of each equivalence class as the lexicographic minimum (the conflict relation always holds and has been removed):

    Rho := Eco - Lex(Eco);
    Rho;

The result is:

    {[i,j,0,1] -> [i,j,0,1] : 1<=i<=M && 1<=j<=M} union
    {[i,j,k,2] -> [i,j,0,1] : 1<=i<=M && 1<=j<=M && 1<=k<=N}

The labeling scheme is obvious: the last two dimensions are stripped off from Rho. The resulting expansion function thus maps ⟨T,i,j⟩ to (i,j) and ⟨S,i,j,k⟩ to (i,j).

Following the lines of Constrained-Storage-Mapping-Optimization, we have computed the same storage mapping as in Figure 5.31.

5.5 Parallelization of Recursive Programs

The last contribution of this work is about automatic parallelization of recursive programs. This topic has received little interest from the compilation community, but the situation is evolving thanks to new powerful multi-threaded environments for efficient execution of programs with control parallelism. When dealing with shared-memory architectures and software-emulated shared memory machines, tools like Cilk [MF98] provide a very suitable programming model for automatic or semi-automatic code generation [RR99].

Now, what programming model should we consider for parallel code generation? First, it is still an open problem to compute a schedule from a dependence relation described by a transducer. This is of course a strong argument against data parallelism as a model of choice for parallelization of recursive programs. Moreover, we have seen in Section 1.2
that the control parallel paradigm was well suited to express parallel execution in recursive programs. In fact, this assertion is true when most iterative computations are implemented with recursive calls, but not when parallelism is located within iterations of a loop. Since loops can be rewritten as recursive procedure calls, we will stick to control parallelism in the following.

Notice we have studied powerful expansion techniques for loop nests, but no practical algorithm for recursive structures has been proposed yet. We thus start with an investigation of specific aspects of expanding recursive programs and recursive data structures in Section 5.5.1. Then we present in Section 5.5.2 a simple algorithm for single-assignment form conversion of any code that fits into our program model: the algorithm can be seen as a practical realization of Abstract-SA, the abstract algorithm for SA-form conversion (page 157). Then, a privatization technique for recursive programs is proposed in Section 5.5.4; and some practical examples are studied in Section 5.5.5. We also give some perspectives about extending maximal static expansion or storage mapping optimization to this larger class of programs.

The rest of this section addresses generation of parallel recursive programs. Section 5.5.6 starts with a short state of the art on parallelization techniques for recursive programs, then motivates the design of a new algorithm based on instancewise data-flow information. In Section 5.5.7, we present an improvement of the statementwise algorithm which allows instancewise parallelization of recursive programs: whether some statements execute in parallel or in sequence can be dependent on the instance of these statements, but it is still decided at compile-time. This technique is also completely novel in parallelization of recursive programs.

5.5.1 Problems Specific to Recursive Structures

Before proposing a general solution for SA-form conversion of recursive programs, we investigate several issues which make the problem more difficult for recursive control and data structures.

Recall that elements in data structures in single-assignment form are in one-to-one mapping with control words. Thus, the preferred layout of an expanded data structure is a tree. Expanded data structures can sometimes be implemented with arrays: it is the case when only loops and simple recursive procedures are involved, and when loops and recursive calls are not "interleaved"; program Queens is such an example. But automatic recognition of such programs and effective design of a specific expansion technique are left for future work. We will thus always consider that expanded data structures are trees whose edges are labeled by statement names.

Management of Recursive Data-Structures

Compared to arrays, lists and trees seem much less easy to access and traverse: they are indeed not random access data structures. For example, the abstract algorithm Abstract-SA (page 157) for SA-form conversion uses the notation Dexp[CurIns] to refer to the access of an element indexed by word ι in a data structure Dexp. But when Dexp is a tree, what does it mean? How is it implemented? Is it efficient?

There is a quick answer to all these questions: the tree is traversed from its root using pointer dereferences along letters in CurIns; the result is of course very costly at run-time. A more clever analysis shows that CurIns is not a random word: it is the current control word. Its "evolution" during program execution is fully predictable: it can be seen
as a different local variable in each program statement, a new letter being added at each block entry.

The other problem with recursive data structures is memory allocation. Because they cannot be allocated at compile-time in general, a very efficient memory management technique should be used to reduce the run-time overhead. We thus suppose that an automatic scheme for grouping mallocs or news is implemented, possibly at the C-compiler or operating system level.

Eventually, both problems can be solved with a simple and efficient code generation algorithm. The idea is the following: suppose a recursive data structure indexed by CurIns must be generated by algorithm Abstract-SA; each time a block is entered, a new element of the data structure is allocated and the pointer to the last element (stored in a local variable) is dereferenced accordingly. This technique is implemented in Recursive-Programs-SA.

About Accuracy and Versatility

When trying to extend maximal static expansion and storage mapping optimization to recursive programs, two kinds of problems immediately arise:

  • transductions are not as versatile as affine relations, because some critical algebraic operations are not decidable and require conservative approximations;

  • the results of dependence and reaching definition analyses are not always as precise as one would expect, because of the lack of expressiveness of rational and one-counter transductions.

These two points are of course limiting the applicability of "evolved" expansion techniques which intensively rely on algebraic operations on sets and relations.

In addition, a few critical operations useful to "evolved" expansion techniques are lacking, e.g., the class of left-synchronous relations is not closed under transitive closure. Conversely, the problem of enumerating equivalence classes seems rather easy because the lexicographical selection of a left-synchronous transduction is left-synchronous, see Section 3.4.3; a remaining problem would be to label the class representatives...

We are not aware of any result about coloring graphs of rational relations, but optimality should probably not be hoped for, even for recognizable relations. Graph-coloring algorithms for rational relations would of course be useful for storage mapping optimization; but recall from Section 5.3.2 that many algebraic operations are involved in the expansion correctness criterion, and most of these operations are undecidable for rational relations.

The last point is that we have not found enough codes that both fit into our program model and require expansion techniques more "evolved" than single-assignment form or privatization. But this problem is more with the program model restrictions than with the applicability of static expansion and storage mapping optimization.

5.5.2

Algorithm Recursive-Programs-SA is a first attempt to give a counterpart of algorithm Loop-Nests-SA for recursive programs. It works together with Recursive-Programs-Implement-Phi to generate the code for φ functions. Expanded data structures all have the same type, ControlType, which is basically a tree type associated with
the language Lctrl of control words. It can be implemented using recursive types and sub-types, or simply with as many pointer fields as statement labels in Σctrl. An additional field in ControlType stores the element value; it has the same type as original data structure elements, and it is called value.

Recursive-Programs-SA(program, σ)
    program: an intermediate representation of the program
    σ: a reaching definition relation, seen as a function
    returns an intermediate representation of the expanded program
  define a tree type called ControlType whose elements are indexed in Lctrl
  for each data structure D in program
    do define a data structure Dexp of type ControlType
       define a global pointer variable Dlocal = &Dexp
       for each procedure in program
         do insert a new argument Dlocal in the first place
       for each call to a procedure p in program
         do insert Dlocal->p = new ControlType() before the call
            insert a new argument Dlocal->p in the first place
       for each non-procedure block b in program
         do insert Dlocal->b = new ControlType() at the top of b
            define a local pointer variable Dlocal = Dlocal->b
       for each statement s assigning D in program
         do left-hand side of s ← Dlocal->value
       for each reference ref to D in program
         do ref ← φ(σ(CurIns, ref))
  return program

A simple optimization to spare memory consists in removing all "useless" fields from ControlType, and every pointer update code in the associated program blocks and statements. By useless, we mean statement labels which are not useful to distinguish between different memory locations, i.e. which cannot be replaced by another label and yield another instance of an assignment statement to the considered data structure. Applied to program Queens, only three labels can be considered to define the fields of ControlType: Q, a, and b; all other labels are unnecessary to enforce the single-assignment property. This optimization should of course be applied on a data structure per data structure basis, to take benefit of the locality of data structure usage in programs.

One should notice that every read reference requires a φ function! This is clearly a big problem for efficient code generation, but detecting exact results and computing reaching definitions at run-time is not as easy as in the case of loop nests. In fact, a part of the algorithm is even "abstract": we have not discussed yet how the argument of the φ can be computed. To simplify the exposition, all these issues are addressed in the next section.

Of course, algorithm Recursive-Programs-Implement-Phi generates the code for φ-structures φDexp using the same techniques as the SA-form algorithm. These φ-structures store addresses of memory locations, computed from the original write references in assignment statements. Each φ function requires a traversal of φ-structures to compute the exact reaching definition at run-time: the maximum is computed recursively from the root of φDexp, and the appropriate element value in Dexp is returned. This computation of the maximum can be done in parallel, as usual for reduction operations on trees.
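The combination of a control-word-indexed tree and a φ function evaluated by traversal can be sketched as follows. This is a simplified illustration, not the generated code of the thesis: the class `ControlNode`, the merging of the expanded structure and its φ-structure into one node (fields `value` and `writer`), and the assumption that the lexicographic order on control words is obtained by a pre-order traversal with children visited in textual label order are all choices made here for brevity.

```python
class ControlNode:
    """One element of an expanded tree, indexed by a control word."""

    def __init__(self):
        self.children = {}   # one "pointer field" per statement label
        self.value = None    # element value (role of Dexp)
        self.writer = None   # write reference that stored the value (role of the phi-structure)

    def child(self, label):
        """Follow, and allocate on first use, the pointer field for `label`."""
        return self.children.setdefault(label, ControlNode())


def phi(root, wanted_writer, label_order):
    """Value of the lexicographically last node written by `wanted_writer`, or None."""
    last = None

    def visit(node):
        nonlocal last
        if node.writer == wanted_writer:
            last = node                      # later nodes in the traversal override earlier ones
        for label in label_order:            # children in textual order
            if label in node.children:
                visit(node.children[label])

    visit(root)
    return None if last is None else last.value


# Hypothetical execution: blocks "a" then "b", each running statement "s"; "b" also runs "t".
root = ControlNode()
for block, statement, value in [("a", "s", 10), ("b", "s", 20), ("b", "t", 99)]:
    node = root.child(block).child(statement)   # a new element is allocated at each block entry
    node.value, node.writer = value, statement

last_s = phi(root, "s", label_order=["a", "b", "s", "t"])
```

As noted in the text, such a traversal visits the whole structure on every read, which is exactly the redundancy that motivates looking for cheaper ways of generating code for read references.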
Recursive-Programs-Implement-Phi (expanded)
  expanded: an intermediate representation of the expanded program
  returns an intermediate representation with run-time restoration code
    for each expanded data structure Dexp in expanded
      do if there are $\phi$ functions accessing Dexp
           then define a data structure @Dexp of type ControlType
                define a global pointer variable @Dlocal = &@Dexp
                for each procedure in program
                  do insert a new argument @Dlocal in the first place
                for each call to a procedure p in program
                  do insert @Dlocal->p = new ControlType () before the call
                     insert a new argument @Dlocal->p in the first place
                for each non-procedure block b in program
                  do insert @Dlocal->b = new ControlType () at the top of b
                     define a local pointer variable @Dlocal = @Dlocal->b
                     insert @Dlocal->value = NULL
                for each read reference ref to Dexp whose expanded form is $\phi$(set)
                  do for each statement s involved in set
                       do refs $\leftarrow$ write reference in s
                          if not already done for s
                            then following s insert @Dlocal->value = &refs
                     $\phi$(set) $\leftarrow$ { traverse Dexp and @Dexp in lexicographic order
                                     using pointers Dlocal and @Dlocal respectively
                                     if (@Dlocal->value == &ref) maxloc = Dlocal;
                                     maxloc->value; }
    return expanded

Two problems remain with $\phi$ function implementation.

The tree traversal does not use the set argument of $\phi$ functions at all! Indeed, testing for membership in a rational language is not a constant-time problem, and it is even not linear in general for algebraic languages. This point is also related with run-time computation of sets of reaching definitions: it will be discussed in the next section.

Several $\phi$ functions may induce many redundant computations, since the maximum must every time be computed on the whole structure, not taking benefit of the previous results. This problem was solved for loop nests using a complex technique integrated with constrained storage mapping optimization (see Section 5.4.7), but no similar technique for recursive programs is available.

5.5.3 Generating Code for Read References

In the last section, all read accesses were implemented with $\phi$ functions. This solution ensures correctness of the expanded program, but it is obviously not the most efficient.

If we know that the reaching definition relation $\sigma$ is a partial function (i.e. the result is exact), we can hope for an efficient run-time computation of its value, as it is the case for loop nests (with the quast representation). Sadly, this is not as easy in general: some rational functions cannot be computed for a given input in linear time, and it is even worse for algebraic functions.
The class of sequential functions is interesting for this purpose, since it is decidable and allows efficient online computation, see Section 3.3.3. Because for every state and input letter, the output letter and next state are known unambiguously, we can compute sequential functions together with pointer updates for expanded data structures. This technique can be easily extended to a sub-sequential function $(T, \rho)$, by adding the pointer updates associated with function $\rho$ (from states to words, see Definition 3.10 page 100).

The class of sub-sequential transductions is decidable in polynomial time among rational transductions and functions [BC99b]. This online computation technique is detailed in algorithm Recursive-Programs-Online-SA, for sub-sequential reaching definition transductions. An extension to online rational transductions would also be possible, without significantly increasing the run-time computation cost, but decidability is not known for this class.

Dealing with algebraic functions is less encouraging, because deciding whether an algebraic relation is a function is rather unlikely, and it is the same for the class of online algebraic transductions. But supposing we are lucky enough to know that an algebraic transduction is online (hence a partial function), we can implement the run-time computation efficiently, with the same technique as before: the next state, output label, and stack operation are never ambiguous.
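The online computation of a sequential function can be illustrated by a small sketch. It is an assumption-laden toy, not the code produced by Recursive-Programs-Online-SA: `SequentialTransducer` and `OnlineCursor` are hypothetical names, the output is accumulated as a string instead of being followed as pointers in the expanded structure, and the terminal function of a sub-sequential transducer is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical encoding of a sequential transducer: for every state and
// input label there is at most one next state and one output label.
struct SequentialTransducer {
    int initial;
    std::map<std::pair<int, char>, std::pair<int, char>> step;
};

// Cursor advanced by one label at each block entry or procedure call.
struct OnlineCursor {
    const SequentialTransducer* t;
    int state;
    std::string output;   // image of the control word read so far
    bool defined;         // false once a transition is missing (partial function)

    explicit OnlineCursor(const SequentialTransducer& tr)
        : t(&tr), state(tr.initial), defined(true) {}

    void push(char label) {
        assert(t != nullptr);
        if (!defined) return;
        auto it = t->step.find(std::make_pair(state, label));
        if (it == t->step.end()) { defined = false; return; }
        state = it->second.first;                // unambiguous next state
        output.push_back(it->second.second);     // unambiguous output label
    }
};
```

Each `push` performs a single table lookup, which is why the state and the output position can be passed along as extra procedure arguments and updated together with the pointers of the expanded data structure.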
A similar technique can be used to optimize the tree traversal in the implementation of $\phi$(set) by algorithm Recursive-Programs-Implement-Phi. Computing a left-synchronous approximation of the reaching definition transduction (even in the case of an algebraic transduction), one may use the closure under prefix-selection (see Section 3.4.3 and especially Proposition 3.11) to select the topmost node in Dexp[set] and @Dexp[set]. These topmost nodes can be used instead of the root of the trees to initiate the traversal. To be computed at run-time, however, the rational function implementing the prefix-selection of $\sigma$ (approximate in general) must be sub-sequential. Another approach consists in computing an approximation of the union of all possible sets of reaching definitions involved in a given $\phi$ function. The result is rational (resp. algebraic) if the reaching definition transduction is rational (resp. algebraic), thanks to Nivat's Theorem 3.6 (resp. Evey's Theorem 3.24), and it can be used to restrict the tree traversal to a smaller domain. Both approaches can be combined to optimize the $\phi$ function implementation.

To conclude this discussion on run-time computation of reaching definitions, only the case of sub-sequential functions is very clear: it allows efficient online computation with algorithm Recursive-Programs-Online-SA. In all other cases (which include all cases of algebraic transductions) we think that no real alternative to $\phi$ functions is available. In practice, Recursive-Programs-Online-SA should be applied to the largest subset of data structures and read references on which $\sigma$ is sub-sequential, and Recursive-Programs-SA is used for the rest of the program. It is perhaps one of the greatest failures of our framework, since we computed an interesting information (reaching definitions) which we are unable to use in practice. This is also a discouraging argument for extending static expansion to recursive programs: what is the use of removing $\phi$ functions if the reaching definition information fails to give the value we are looking for at a lower cost? Finally, $\phi$ functions may be so expensive to compute that conversion to single-assignment form should be reconsidered, in favor of other expansion schemes. In this context, a very interesting alternative is proposed in the next section.
Eventually, looking at our motivating examples in Chapter 4, or thinking about most practical examples of recursive programs using trees and other pointer-based data structures, one common observation can be made: there is "not so much" memory reuse (if not zero memory reuse) in these programs! This late but simple discovery is a strong argument against memory expansion techniques for recursive tree programs: they may simply be useless. In fact, many tree programs already have a high level of parallelism and do not need to be expanded. It is very disappointing that the best results of our single-assignment technique are likely to be very rarely useful in practice. In the case of recursive array programs, expansion is still a critical issue for parallelization, like for the Queens program in Chapter 4.

Recursive-Programs-Online-SA (program, $\sigma$)
  program: an intermediate representation of the program
  $\sigma$: a sub-sequential reaching definition transduction
  returns an intermediate representation of the expanded program
    define a tree type called ControlType whose elements are indexed in $L_{ctrl}$
    build $(T, \rho)$ from $\sigma$ where $T = (Q, \{q_0\}, F, E)$ is sequential and $\rho : Q \to \Sigma_{ctrl}^*$
    build a "next state" function NextState: $Q \times \Sigma_{ctrl} \to Q$ from T
    build a "next output" function NextOutput: $Q \times \Sigma_{ctrl} \to \Sigma_{ctrl}^*$ from T
    for each data structure D in program
      do declare a data structure Dexp of type ControlType
         define a global pointer variable Dlocal = &Dexp
         define a global pointer variable D$\sigma$local = &Dexp
         define a global "state" variable DQlocal = $q_0$
         for each procedure in program
           do insert a new argument Dlocal in the first place
              insert a new argument D$\sigma$local in the second place
              insert a new argument DQlocal in the third place
         for each call to a procedure p in program
           do insert Dlocal->p = new ControlType () before the call
              insert a new argument Dlocal->p in the first place
              insert a new argument D$\sigma$local->NextOutput(DQlocal, p) in the second place
              insert a new argument NextState(DQlocal, p) in the third place
         for each non-procedure block b in program
           do insert Dlocal->b = new ControlType () at the top of b
              define a local pointer variable Dlocal = Dlocal->b
              define a local pointer variable D$\sigma$local = D$\sigma$local->NextOutput(DQlocal, b)
              define a local variable DQlocal = NextState(DQlocal, b)
         for each statement s assigning D in program
           do left-hand side of s $\leftarrow$ Dlocal->value
         for each reference ref to D in program
           do ref $\leftarrow$ D$\sigma$local->$\rho$(DQlocal)->value
    return program

5.5.4 Privatization of Recursive Programs

We have seen that SA-form conversion is not practical for all recursive programs. It was already the case for loop nests, but the problem is more obvious here. However, SA-form is probably not the most suitable method to extract parallelism from recursive programs. Because of the heavy use of procedures and functions, looking at expansion as a transformation of global data structures into local ones is much more profitable. This idea happens to be very similar to the principles of array privatization for loop nests, and we use the same word here. A general privatization technique can be defined for unrestricted recursive programs, but copy-out code is necessary to update the data structures of the calling procedure. In a parallel execution, this often requires additional synchronizations, and the overhead of such an expansion is likely to be very high. Further study is left for future work.

We will restrict ourselves to the case of reaching definition relations which satisfy the vpa property defined in Section 4.3.4 (for reaching definition analysis purposes): for all $u, v \in L_{ctrl}$, if $v \mathrel{\sigma} u$ then $v$ is an ancestor of $u$, i.e. $\exists w_1, w_2 \in \Sigma_{ctrl}^*, s \in \Sigma_{ctrl} : v = w_1 s \wedge u = w_1 w_2$ (and $v <_{lex} u$, which is trivial since $v \mathrel{\sigma} u$). This property is enforced in many important classes of recursive programs: all divide-and-conquer execution schemes, most dynamic-programming implementations, many sorting algorithms...

Now, the privatization technique for vpa programs is very simple: every global data structure (probably an array) to be expanded is made local to each procedure in the program, and the appropriate copy-in code of the whole structure is inserted at the beginning point of each procedure. Notice no copy-out is needed since it would involve reaching definitions from non-ancestor instances. A program privatized in that sense is generally less expanded than SA-form (see footnote 25), and the parallelism extracted by privatization can be found at function calls only: instead of waiting for the function's return, one may run each function call in parallel and insert synchronizations only when the result of a function is needed.

Footnote 25: As a technical remark, this is not always true because we copy the whole data structures and not each element. In some tricky cases, privatization can require more memory than SA-form!

This technique may appear somewhat expensive because of the data structure copying, but the same optimization that worked for loop nests can be applied here [TP93, MAL93, Li92]: privatization can be done on a processor basis instead, and copy-in is only performed when a procedure call is made across processors. We implemented this optimization for program Queens, using Cilk's "fast" and "slow" implementations of parallel procedures, the "slow" one being called only when a processor "catches" new work [MF98]. Further discussion about parallelization of expanded programs is delayed to Section 5.5.6.

5.5.5 Expansion of Recursive Programs: Practical Examples

We applied single-assignment algorithm Recursive-Programs-SA to program Queens. The result is shown in Figure 5.36. The ControlType structure has been optimized in keeping only fields which enforce the single-assignment form property. It is implemented with a C++ template-like syntax to handle both Dexp and @-structure @Dexp:

    struct ControlType<T> {
      T value;
      ControlType<T> *Q;
      ControlType<T> *a;
      ControlType<T> *b;
    };

         int A[n];
         ControlType<int> Aexp = new ControlType<int> ();
         ControlType<int> *Alocal = &Aexp;
         ControlType<int*> @Aexp = new ControlType<int*> ();
         ControlType<int*> *@Alocal = &@Aexp;

    P    void Queens (ControlType<int> *Alocal, ControlType<int*> *@Alocal,
                      int n, int k) {
    I      if (k < n) {
    A=a      for (int i=0; i<n; i++) {
               Alocal->a = new ControlType<int> ();
               @Alocal->a = new ControlType<int*> ();
               ControlType<int> *Alocal = Alocal->a;
               ControlType<int*> *@Alocal = @Alocal->a;
    B=b        for (int j=0; j<k; j++) {
                 Alocal->b = new ControlType<int> ();
                 @Alocal->b = new ControlType<int*> ();
                 ControlType<int> *Alocal = Alocal->b;
                 ControlType<int*> *@Alocal = @Alocal->b;
    r            ... = ... phi(sigma(CurIns, A[j])) ...;
               }
    J          if (...) {
    s            Alocal->value = ...;
                 @Alocal->value = &(A[k]);
                 Alocal->Q = new ControlType<int> ();
                 @Alocal->Q = new ControlType<int*> ();
    Q            Queens (Alocal->Q, @Alocal->Q, n, k+1);
               }
             }
           }
         }
    F    int main () {
           Queens (Alocal, @Alocal, n, 0);
         }

    Figure 5.36. Single-assignment form conversion of program Queens

Notice that the input automaton for the reaching definition transducer of procedure Queens is not deterministic. This ruins any hope to efficiently compute reaching definitions at run-time and to remove the $\phi$ function, despite the fact that our analysis technique computed an exact result! The tree traversal associated with the $\phi$ function has not been implemented in Figure 5.36, but it does not require a full traversal of Dexp: because only ancestors are possible reaching definitions (property vpa), the computation of the maximum can be made on the path from the root (i.e. &Dexp) to the current element (i.e. &Dlocal). This is implemented most efficiently with pointers to the parent node in ControlType, stopping at the first ancestor in dependence (i.e. the deepest ancestor in dependence). An effective implementation of statement r is given in Figure 5.37. The maxatloc != NULL test is necessary in general, when $\bot$ can be a possible reaching definition, but it could indeed be removed in our case since execution of ancestors is guaranteed. The appropriate construction of the parent field in ControlType is assumed in the rest of the code.

    r  { ControlType<int> *maxloc = Dlocal;
         ControlType<int*> *maxatloc = @Dlocal;
         while (maxatloc != NULL && maxatloc->value != &(A[j])) {
           maxloc = maxloc->parent;
           maxatloc = maxatloc->parent;
         }
         ... = maxloc->value;
       }

    Figure 5.37. Implementation of the read reference in statement r

We also experimented with the privatization technique, since property vpa is satisfied for program Queens, see Figure 5.38. An additional optimization has been performed: only the k first elements of array A are copied, because the others are not used. This result can be obtained thanks to static analyses of variables [CH78]. Parallelization of the privatized form is studied in Section 5.5.6.

         int A[n];
    P    void Queens (int A[n], int n, int k) {
           int B[n];
           memcpy (B, A, k * sizeof (int));
    I      if (k < n) {
    A=a      for (int i=0; i<n; i++) {
    B=b        for (int j=0; j<k; j++) {
    r            ... = ... B[j] ...;
               }
    J          if (...) {
    s            B[k] = ...;
    Q            Queens (B, n, k+1);
               }
             }
           }
         }
    F    int main () {
           Queens (A, n, 0);
         }

    Figure 5.38. Privatization of program Queens

5.5.6 Statementwise Parallelization

We start with two motivating examples to show what we want to achieve, then discuss the results of classical static analyses on such examples, before we present our statementwise parallelization algorithm.

Motivating Example

Our first example is the BST program introduced in Section 2.3. Instancewise dependence analysis has been performed in Section 4.4 and the result is the rational transducer in Figure 4.9. Because the two recursive calls involve dereferences of pointer p along two distinct edges, and because the underlying data structure is a tree, we know that all accesses performed after the first call are independent from accesses performed after the second one. Both conditional statements I1 and J1 can thus be executed asynchronously (recall that an implicit synchronization is supposed at the return point of procedure BST, see Section 1.2). The parallel version is given by Figure 5.39.

    P    void BST (tree *p) {
    I1     spawn if (p->l != NULL) {
    L        BST (p->l);
    I2       if (p->value < p->l->value) {
    a          t = p->value;
    b          p->value = p->l->value;
    c          p->l->value = t;
             }
           }
    J1     spawn if (p->r != NULL) {
    R        BST (p->r);
    J2       if (p->value > p->r->value) {
    d          t = p->value;
    e          p->value = p->r->value;
    f          p->r->value = t;
             }
           }
         }
    F    int main () {
           if (root != NULL) BST (root);
         }

    Figure 5.39. Parallelization of program BST

Our second example maps two functions on a list, one on even elements and the other on the odd ones, see program Map in Figure 5.40. The result of our analysis for this program is that there are no dependences between instances of s and t. This allows parallel execution of s and t, and their respective function calls to even and odd.

         void Map (list *p, list *q) {
    s      p->value = even (p->value);
    t      q->value = odd (q->value);
           if (...) {
             Map (p->next->next, q->next->next);
           }
         }
         int main () {
           Map (list, list->next);
         }

    Figure 5.40. Second motivating example: program Map

Let us compare the effectiveness of related parallelization techniques with the expected results on these two motivating examples. Hendren et al. propose in [HHN94] a dependence test for recursive programs with pointer-based data structures. Their technique does not handle arrays (seen as pointer arithmetics in that case). But since it handles a wide range of recursive data structures, including directed acyclic graphs and doubly-linked lists, it is more general than our technique in that domain. Because their pointer aliasing abstraction is based on path expressions which are pairs of regular expressions on the edge names, the BST program is actually parallelized with their technique. But the Map procedure is not, since their path expressions cannot capture the evenness of dereference numbers. The very precise alias analysis by Deutsch [Deu94] would allow parallelization of the two examples because Kleene stars are there replaced by named counters constrained with systems of affine equations. More usual flow-sensitive and context-sensitive alias analyses [LRZ93, EGH94, Ste96] would generally succeed for BST and fail for Map.

Algorithm

We now present an algorithm for statementwise parallelization of recursive programs, based on the results of our dependence analysis. Let $(\Sigma_{ctrl}, E)$ be the dual control flow graph [ASU86] of the program (i.e. the dual graph of the control flow graph), whose nodes are statements instead of program points, and whose edges are program points instead of statements. We define a synchronization graph $(\Sigma_{ctrl}, E')$ as a sub-graph of $(\Sigma_{ctrl}, E)$ such that every edge in $E'$ is associated with a synchronization barrier. Supposing that all sequential compositions of statements are replaced by asynchronous executions, a synchronization graph must ensure that there are enough synchronization points to preserve the original program semantics. Thanks to Bernstein's conditions, this is ensured by the following condition: let $S, T \in \Sigma_{ctrl}$ be two program statements, $ST \in E$, and $B$ be the innermost block surrounding both $S$ and $T$,

$$ST \in E' \iff \exists v, w \in L_{ctrl},\ u, x', y' \in \Sigma_{ctrl}^*,\ x, y \in (\Sigma_{ctrl} \setminus \{B\})^* :\ v = uBxSx' \wedge w = uByTy' \wedge (v \mathrel{\delta} w \vee w \mathrel{\delta} v). \qquad (5.44)$$

Indeed, executing $uBxS$ and $uByT$ in parallel induces parallel execution of all their descendants (coarse grain parallelization), and prefix $u$ should be chosen as long as possible, hence the restriction of $x$ and $y$ to non-$B$ labels. Algorithm Statementwise-Parallelization is based on this equation to generate a parallel program with the required synchronizations. It is interesting to notice that

$$v \mathrel{\delta} w \vee w \mathrel{\delta} v \iff v \mathrel{\kappa} w \wedge (v \in W \vee w \in W),$$

which means that intersection with the lexicographic order is not necessary: conflict relation $\kappa$ can be used instead of the dependence one to describe statements that may execute in parallel. Because $(\Sigma_{ctrl}^* B (\Sigma_{ctrl} \setminus \{B\})^* S \Sigma_{ctrl}^* \times \Sigma_{ctrl}^* B (\Sigma_{ctrl} \setminus \{B\})^* T \Sigma_{ctrl}^*)$ in Statementwise-Parallelization is a recognizable language, its intersection with depend can be computed exactly. These two remarks show that computing the synchronization graph for a recursive program can be done without any approximation in most cases: the conflict relation is approximate only for multi-dimensional arrays. Notice that this algorithm does not perform any statement reordering inside a program block; this issue is left for future work.

Statementwise-Parallelization (program, $\kappa$)
  program: an intermediate representation of the program
  $\kappa$: the conflict relation to be satisfied by all parallel execution orders
  returns a parallel implementation of program
    depend $\leftarrow$ $\kappa \cap ((W \times R) \cup (R \times W) \cup (W \times W))$
    $(\Sigma_{ctrl}, edges)$ $\leftarrow$ dual control flow graph of program
    for each ST in edges
      do B $\leftarrow$ innermost block surrounding both S and T
         synchro $\leftarrow$ depend $\cap\ (\Sigma_{ctrl}^* B (\Sigma_{ctrl} \setminus \{B\})^* S \Sigma_{ctrl}^* \times \Sigma_{ctrl}^* B (\Sigma_{ctrl} \setminus \{B\})^* T \Sigma_{ctrl}^*)$
         if synchro $\neq \emptyset$
           then insert a sync statement at program point associated with ST
    insert a spawn keyword before every statement
    return program

Of course, several spawn keywords may be useless or misplaced regarding the parallel programming environment: Cilk only allows asynchronous procedure calls, not asynchronous execution at the statement level, and several environments do not support nested parallelism. When a spawned statement is immediately followed by a sync, both keywords can be removed since such a construct is equivalent to sequential execution. In addition, powerful methods have been crafted to optimize the number of synchronization points and shrink the critical path, see for example [Rin97]. Application of Statementwise-Parallelization on the two motivating examples yields the expected results.

Eventually, the parallelization technique proposed by Feautrier in [Fea98] would find a similar result on both motivating examples, since they are based on an instancewise dependence test (but automatic computation of storage mappings is not handled in [Fea98]).

Statementwise Parallelization via Memory Expansion

Our running example is now program Queens, already studied in the previous chapters. This program does not hold any parallel loop (the inner loop looks parallel but memory dependences on the "..." parts actually hamper parallelization). We will consider the reaching definition information computed in Section 4.5, i.e. the one-counter transducer in Figure 4.15, and the privatized Queens program proposed in Section 5.5.5, see Figure 5.38.
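As a concrete reading of the privatized program recalled above, here is a minimal sequential sketch. It is an illustration only: the figure elides the conflict test and the loop bodies, so the diagonal and column test below, the function name `queens`, and the choice of counting solutions are assumptions, not the thesis code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Count the solutions that extend the partial placement A[0..k-1],
// where A[j] is the column of the queen placed on row j.
long queens(const int* A, int n, int k) {
    assert(0 <= k && k <= n);
    std::vector<int> B(static_cast<std::size_t>(n));   // private array, local to this call
    if (k > 0) {
        // Copy-in of the k first elements only; the others are never read.
        std::memcpy(B.data(), A, static_cast<std::size_t>(k) * sizeof(int));
    }
    if (k == n) return 1;
    long count = 0;
    for (int i = 0; i < n; ++i) {
        bool safe = true;
        for (int j = 0; j < k; ++j) {                   // reads B[j] with j < k only
            int d = B[j] - i;
            if (d == 0 || d == k - j || d == j - k) { safe = false; break; }
        }
        if (safe) {
            B[k] = i;                                   // write to the private copy
            count += queens(B.data(), n, k + 1);        // callee makes its own copy on entry
        }
    }
    return count;
}
```

No copy-out appears anywhere: every value read by a call was written by one of its ancestors, which is the vpa property in action. The recursive call is the only place where an asynchronous execution would be introduced.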
Recall that the reaching definition relation of program Queens satisfied the vpa property: this guarantees that the reaching definition relation can be used as dependence information to decide whether a procedure call can be executed asynchronously or not. The result is that the recursive call can be made asynchronous, see Figure 5.41. Starting from the single-assignment form version of program Queens (see Figure 5.36), no more parallelism would have been extracted but the overhead due to $\phi$ function computation would make the parallel program unpractical.

         int A[n];
    P    void Queens (int A[n], int n, int k) {
           int B[n];
           memcpy (B, A, k * sizeof (int));
    I      if (k < n) {
    A=a      for (int i=0; i<n; i++) {
    B=b        for (int j=0; j<k; j++) {
    r            ... = ... B[j] ...;
               }
    J          if (...) {
    s            B[k] = ...;
    Q            spawn Queens (B, n, k+1);
               }
             }
           }
         }
    F    int main () {
           Queens (A, n, 0);
         }

    Figure 5.41. Parallelization of program Queens via privatization

The algorithm to achieve this result automatically is simple. First choose between single-assignment form and privatization; second, apply algorithm Statementwise-Parallelization using the reaching definition relation as dependence relation for the expanded program. However, if privatization is chosen, only asynchronous calls to privatized procedures are provably correct (they preserve the original program semantics), all other asynchronous and parallel constructs should be removed from the generated code; this is because some memory-based dependences between instances of non-procedure statements may remain.

Some experiments have been performed with the Cilk environment [MF98] on a 32 processors SGI Origin 2000. The results in Figure 5.42 correspond to the execution time and to the speed-up of the parallel version compared to the sequential non-privatized one (without Cilk overhead and without array copying). The program was run with 13 queens only, to demonstrate both the efficiency of the Cilk run-time and the low overhead induced by the expansion of program Queens. Performance is very good up to 16 processors, then it degrades for 32 processors.
    Figure 5.42. Parallel resolution of the n-Queens problem (left panel: execution time in seconds against the number of processors, 13-Queens compared with the sequential time; right panel: speed-up, parallel / original, against the number of processors, 13-Queens compared with the optimal speed-up)

Notice that the privatized Queens program can itself be the matter of a comparison with other parallelization techniques. It happens that analyses for pointer arithmetics (seen as a particular implementation of arrays) used by Rugina and Rinard in [RR99] are unable to parallelize the program. Indeed, the ordering analysis shows that j < k, which means that for a given iteration of the outer loop, the procedure call can be executed asynchronously with the next iterations. However, the inter-procedural region expression analysis computes a fix-point over recursive calls to procedure Queens which cannot capture the fact that only the k first elements of array A are useful: subsequent recursive calls are thus supposed to read the whole array A, which is not the case in practice.

5.5.7 Instancewise Parallelization

This last section investigates parallelization of recursive programs at the statement instance level. This common technique for loop nest parallelization is completely new for recursive programs. Notice we do not propose a run-time parallelization technique for recursive programs: we describe at compile-time the sets of run-time instances which can be executed asynchronously.

Motivating Example

We study the procedure P example in Figure 5.43.a. Pointer arguments p and q are identical in the first call: they are set to the root of a binary tree structure.

Because p and q may be aliased during the whole execution, any dependence test (instancewise or not) would return the same result: no parallelism can be found in this program. However, a more precise observation shows that when the current control word w contains both a and b or both c and d, p and q may never be aliased again in all descendants of w (words of which w is a strict prefix). This proves the correctness of the abstract parallelization of procedure P in Figure 5.43.b (recall that CurIns stands for the run-time value of the control word). As soon as both branches of the same conditional have been taken, all recursive calls can be executed asynchronously. This yields in practice a huge amount of parallelism (an average logarithmic parallel complexity).

Eventually, this motivating example shows the need for an instancewise parallelization technique for recursive programs. Of course, such a technique requires more information than a simple dependence test: a precise description of the instances in dependence is the key for instancewise parallelism detection.

    P    void P (int *p, int *q) {
    s      p->v = ...;
    t      q->v = ...;
    a      if (...) P (p->l, q);
    b      else P (p, q->r);
    c      if (...) P (p->r, q);
    d      else P (p, q->l);
         }
    F    int main () {
           P (tree, tree);
         }

    Figure 5.43.a. Procedure P

    P    void P (int *p, int *q) {
    s      p->v = ...;
    t      q->v = ...;
    a      if (...) spawn P (p->l, q);
    b      else spawn P (p, q->r);
           if (CurIns in (a+d)* + (b+c)*) sync
    c      if (...) spawn P (p->r, q);
    d      else spawn P (p, q->l);
         }
    F    int main () {
           P (tree, tree);
         }

    Figure 5.43.b. Abstract parallelization of P

    Figure 5.43. Instancewise parallelization example

Algorithm

We now present an algorithm to automatically detect instancewise parallelism in recursive programs, and to generate the parallel code. This technique naturally extends the previous statementwise algorithm, but synchronization statements are now guarded by membership of the current run-time instance to rational subsets of $L_{ctrl}$ (the whole language of control words). The idea consists in guarding every sync statement with the domain of relation synchro in Statementwise-Parallelization. In the case of algebraic relations, this domain is an algebraic language and membership may not be decided efficiently; we then compute a rational approximation of the domain before generating the code.

Instancewise parallelization algorithm Instancewise-Parallelization is based on the statementwise version, and it generates a "next state" function alpha: $Q \times \Sigma_{ctrl} \to Q$ for online computation of the CurIns $\in$ set condition. This function is usually implemented with a two-dimensional array, see the example below (see footnote 26).

Footnote 26: An extension to deterministic algebraic languages would be rather easy to design, and would sometimes give better results for recursive programs with arrays. Nevertheless, it requires computation of a deterministic approximation of an algebraic language, which is much more difficult than a rational approximation.

Instancewise-Parallelization (program, $\kappa$)
  program: an intermediate representation of the program
  $\kappa$: the conflict relation to be satisfied by all parallel execution orders
  returns a parallel implementation of program
    depend $\leftarrow$ $\kappa \cap ((W \times R) \cup (R \times W) \cup (W \times W))$
    $(\Sigma_{ctrl}, edges)$ $\leftarrow$ dual control flow graph of program
    for each ST in edges
      do B $\leftarrow$ innermost block surrounding both S and T
         synchro $\leftarrow$ depend $\cap\ (\Sigma_{ctrl}^* B (\Sigma_{ctrl} \setminus \{B\})^* S \Sigma_{ctrl}^* \times \Sigma_{ctrl}^* B (\Sigma_{ctrl} \setminus \{B\})^* T \Sigma_{ctrl}^*)$
         set $\leftarrow$ domain of relation synchro
         if set $\neq \emptyset$
           then if set is algebraic
                  then set $\leftarrow$ rational approximation of set
                $(Q, \{q_0\}, F, E)$ $\leftarrow$ determinization of set
                compute a "next state" function alpha from $(Q, \{q_0\}, F, E)$
                define a global variable state = $q_0$
                for each procedure in program
                  do insert a new argument state in the first place
                for each call to a procedure p in program
                  do insert a new argument alpha(state, p) in the first place
                for each non-procedure block b in program
                  do define a local variable state = alpha(state, b)
                insert "if (state $\in$ F) sync" at program point associated with ST
    insert a spawn keyword before every statement
    return program

The result of Instancewise-Parallelization applied to procedure P is shown in Figure 5.44. It is basically the same parallelization as the abstract code in Figure 5.43.b, but the synchronization condition is now fully implemented: the deterministic automaton used for online recognition of (a+d)* + (b+c)* is given in Figure 5.44.b. Transitions are stored in array next, the first dimension is indexed by state numbers and the second by statement labels.

         int state = 0;
         int next[4,4] = {{1,2,2,1}, {1,3,3,1}, {3,2,2,3}, {3,3,3,3}};
    P    void P (int state, int *p, int *q) {
    s      p->v = ...;
    t      q->v = ...;
    a      if (...) spawn P (next[state,0], p->l, q);
    b      else spawn P (next[state,1], p, q->r);
           if (state != 3) sync
    c      if (...) spawn P (next[state,2], p->r, q);
    d      else spawn P (next[state,3], p, q->l);
         }
    F    int main () {
           P (state, tree, tree);
         }

    Figure 5.44.a. Parallel code

    Transitions: state 0 goes to 1 on a, d and to 2 on b, c; state 1 loops on a, d and goes to 3 on b, c; state 2 loops on b, c and goes to 3 on a, d; state 3 loops on a, b, c, d.

    Figure 5.44.b. Automaton to decide synchronization at run-time

    Figure 5.44. Automatic instancewise parallelization of procedure P

Notice the parallelization technique proposed by Feautrier in [Fea98] would also fail on this example, because it is a dependence test only: it cannot be used to compute at compile-time which instances of procedure P allow asynchronous execution of the recursive calls.

5.6 Conclusion

In this chapter, we studied automatic parallelization techniques based on memory expansion. Expanding data structures is a classical optimization to cut memory-based dependences. The first problem is to ensure that all reads refer to the correct memory location, in the generated code. When control and data flow cannot be known at compile-time, run-time computations have to be done to find the identity of the correct memory location. The second problem is that converting programs to single-assignment form is too costly, in terms of memory usage.

When dealing with unrestricted nests of loops and arrays, we have tackled both problems. We proposed a general method for static expansion based on instancewise reaching definition information, a robust run-time data-flow restoration scheme, and a versatile storage mapping optimization technique. Our techniques are either novel or generalize previous work to unrestricted nests of loops. Eventually, all these techniques were combined in a simultaneous expansion and parallelization framework, based on expansion constraints. Many algorithms were designed, from single-assignment conversion to constrained storage mapping optimization and efficient data-flow restoration. This work advocates for the use of constrained expansion in parallelizing compilers. The goal is now to design pragmatic constraints and to propose a real bi-criteria optimization algorithm for expansion overhead and parallelism extraction.

The second part of this chapter discussed parallelization of recursive programs. We investigated memory expansion of recursive programs, which is a new issue in automatic parallelization. Single-assignment and privatization were extended to recursive programs, based on the rational and algebraic transduction results of our analysis for recursive programs. Difficult problems related with online computation of reaching definitions and run-time data-flow restoration were investigated. Extending constrained expansion and storage mapping optimization to recursive programs is left for future work, but several unresolved issues for simpler expansion schemes must be investigated first. Eventually, we showed that the rational or algebraic transductions returned by dependence analysis could be used to extract control parallelism. A simple algorithm to decide whether two statements can be executed in parallel has been designed and applied to an example in combination with the privatization technique. This algorithm achieves better results than most existing techniques, because it is based on a very precise (and instancewise) dependence information. These good results motivate further research in dependence analysis of recursive programs. Another contribution is the algorithm for instancewise parallelization: it decides at compile-time whether two instances of a statement can be executed in parallel or not. Common in the case of nested loops, this technique is completely new for recursive programs. However, the algorithms proposed are still rather primitive: they neither perform statement reordering nor integrate architecture parameters such as the minimal grain of parallel tasks. Fortunately, these issues have been widely studied in more classical parallelization frameworks and we hope that the same solutions would apply to our own framework.

Future work is threefold. First, improve optimization of the generated code and study (both theoretically and experimentally) the effect of $\phi$ functions on parallel code performance. Second, study how comprehensive parallelization techniques can be plugged into the constrained storage mapping optimization framework: reducing memory usage is a good thing, but choosing the right parallel execution order is another. Third, proceed in an extensive study of the applicability of memory expansion techniques for parallelization of recursive programs.
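As a reminder of how cheap the run-time test of instancewise parallelization (Section 5.5.7) is, the online recognition of (a+d)* + (b+c)* can be sketched with a transition table. This is a hedged illustration: `kNext`, `label_index`, `run` and `needs_sync` are hypothetical names, the transitions are the ones drawn in the automaton of Figure 5.44.b (state 2 loops on b, c and moves to the sink on a, d), and control words are reduced to the four call labels.

```cpp
#include <cassert>
#include <string>

// Transition table: rows are states 0..3, columns are labels a, b, c, d.
static const int kNext[4][4] = {
    {1, 2, 2, 1},   // state 0: nothing read yet
    {1, 3, 3, 1},   // state 1: only a or d so far
    {3, 2, 2, 3},   // state 2: only b or c so far
    {3, 3, 3, 3},   // state 3: both groups seen (sink)
};

int label_index(char label) {
    assert(label >= 'a' && label <= 'd');
    return label - 'a';
}

// State reached after reading a control word made of labels a..d.
int run(const std::string& control_word) {
    int state = 0;
    for (char label : control_word) state = kNext[state][label_index(label)];
    return state;
}

// The word is still in (a+d)* + (b+c)* exactly when the sink is not reached.
bool needs_sync(int state) { return state != 3; }
```

In generated code the state would be threaded through the procedure arguments, one table lookup per call, instead of being recomputed from the whole word as `run` does here.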
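Chapter 6

Conclusion

We now conclude this thesis by a summary of the main results and contributions, followed by a discussion of perspectives and future works.

6.1 Contributions

Our main contributions can be divided into four closely related parts. The first three parts address automatic parallelization and are summarized in the next table, and the fourth one is about rational and algebraic transductions. Not all contributions in this table are well matured and ready to use results: most of the work about recursive programs should be seen as a first attempt to extend instancewise analysis and transformation techniques to a larger class of programs.

Instancewise dependence analysis
  Affine loop nests with arrays: [Bra88, Ban88], [Fea88a, Fea91, Pug92]
  Unrestricted loop nests with arrays: [BCF97, Bar98], [WP95, Won95]
  Recursive programs with arrays and trees: [Fea98] (1), Chapter 4, published in [CC98] (2)

Instancewise reaching definition analysis
  Affine loop nests with arrays: [Fea88a, Fea91, Pug92], [MAL93]
  Unrestricted loop nests with arrays: [CBF95, BCF97, Bar98], [WP95, Won95]
  Recursive programs with arrays and trees: Chapter 4, published in [CC98] (2)

Single-assignment form
  Affine loop nests with arrays: [Fea88a, Fea91]
  Unrestricted loop nests with arrays: [Col98], Sections 5.1 and 5.4
  Recursive programs with arrays and trees: Section 5.5

Maximal static expansion
  Unrestricted loop nests with arrays: Sections 5.2 and 5.4, published in [BCC98, Coh99b, BCC00]
  Recursive programs with arrays and trees: open problem

Storage mapping optimization
  Affine loop nests with arrays: [LF98, Lef98], [SCFS98, CDRV97]
  Unrestricted loop nests with arrays: Sections 5.3 and 5.4, published in [CL99, Coh99b]
  Recursive programs with arrays and trees: open problem

Instancewise parallelization
  Affine loop nests with arrays: [Fea92, CFH95], [DV97]
  Unrestricted loop nests with arrays: [GC95, CBF95], [Col95b]
  Recursive programs with arrays and trees: Section 5.5

(1) Dependence test for trees only.
(2) For arrays only.

Let us now review every contribution in more detail.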
Control and Data Structures: Beyond the Polyhedral Model. In Chapter 2, we defined a program model and mathematical abstractions for statement instances and elements of data structures. This framework was used throughout this work to give a formal presentation of our techniques, especially when dealing with recursive control and data structures.

Novel instancewise dependence and reaching definition analyses for recursive programs were proposed in Chapter 4, based on formal language theory, and more precisely on rational and algebraic transductions. Using a new definition of induction variables in recursive programs, we could capture the effect of every run-time instance of a statement in a rational or algebraic transduction. Because conditionals and loop bounds are unrestricted, we could achieve only approximate results in general. A summary of program model restrictions and a comparison with other dependence and reaching definition analyses concludes this work. However, when designing algorithms for nested loops and arrays (a special case of the program model) we stuck to the classical iteration vector framework, and we benefited from the wealth of algorithms to work with affine relations in Presburger arithmetic.

Memory Expansion: New Techniques to Solve New Problems. Parallelization via memory expansion is an old technique, but the recent extension of instancewise reaching definition analyses to programs with conditionals, complex data structure references (e.g. non-affine array subscripts) or recursive calls raises new questions. The first one is to ensure that read accesses in the expanded program refer to the correct memory location; the second is that existing techniques for memory expansion have to be extended to fit the new program models.

We addressed both questions in the first four sections of Chapter 5, when dealing with unrestricted nested loops and arrays. A new technique to reduce the run-time overhead of memory expansion has been proposed, and another technique to reduce memory usage has been extended to unrestricted loop nests. Combination of the two techniques has also been studied. Finally, we designed several algorithms to optimize run-time restoration of the flow of data (when it is mandatory). We also discussed experimental results on a shared-memory architecture.
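To make the first question concrete, here is a minimal sketch of memory expansion on a toy loop. It is not one of the algorithms of Chapter 5: the loop itself, the function names (`run_original`, `run_expanded`) and the `last_writer` bookkeeping are illustrative assumptions. A scalar `x` is expanded into one cell per iteration so that writes no longer conflict; because the write is guarded by a data-dependent conditional, each read has to recover at run time which cell holds its reaching definition.

```python
def run_original(a):
    """Sequential loop: a single cell x is overwritten by some iterations."""
    x = 0
    out = []
    for i, v in enumerate(a):
        if v > 0:              # data-dependent guard: the writers are unknown statically
            x = v * 2          # write to the single cell x
        out.append(x + i)      # read x: sees the last executed write
    return out


def run_expanded(a):
    """Same computation after expanding x into one cell per iteration (hypothetical scheme)."""
    n = len(a)
    x_exp = [0] * (n + 1)      # cell 0 holds the initial value, cell i + 1 belongs to iteration i
    last_writer = [0] * n      # run-time record of the reaching definition of each read

    # Phase 1: run-time restoration of the data flow (which cell does each read use?).
    current = 0
    for i, v in enumerate(a):
        if v > 0:
            current = i + 1
        last_writer[i] = current

    # Phase 2: writes target distinct cells, so any execution order is legal.
    for i in reversed(range(n)):
        if a[i] > 0:
            x_exp[i + 1] = a[i] * 2

    # Phase 3: reads are independent as well; each one picks its own cell.
    return [x_exp[last_writer[i]] + i for i in range(n)]
```

The reversed loop in phase 2 is only there to show that the expanded writes no longer depend on the sequential order. The price is the extra storage (`x_exp`) and the restoration pass (`last_writer`), which are the two costs the techniques summarized above aim to reduce.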
Memory expansion for recursive programs is a completely new topic, and we discovered that the mathematical abstraction for reaching definitions (rational and algebraic transductions) may incur a severe run-time overhead. Nevertheless, in a few particular cases we could design algorithms to generate low-overhead expanded recursive programs.

Parallelism: Extending Classical Techniques. Our new dependence analysis technique has proved useful for parallelizing recursive programs. It demonstrates the applicability of rational and algebraic transductions, thanks to their decidable properties. The first algorithm we presented is similar to existing parallelization methods for recursive programs, but it benefits from the additional information captured by our analysis to achieve better results in general. Another algorithm addresses instancewise parallelization of recursive programs: this new technique is made possible by the instancewise information captured in rational and algebraic transductions. A few experimental results were discussed, combining expansion and parallelization on a well known recursive program.

Formal Language Theory: Several Contributions and Applications. The last results of this work do not belong to compilation. They are mostly found in the third
section of Chapter 3 (presenting useful mathematical abstractions) and some in the following sections. We designed a sub-class of rational transductions with Boolean algebra structure and many other interesting properties. We showed that membership in this class is not decidable among rational transductions, but conservative approximation techniques make it possible to benefit from these properties in the whole class of rational transductions. We also presented some new results about composition of rational transductions over non-free monoids and investigated approximation of algebraic transductions.

6.2 Perspectives

Many questions arose along this thesis, and our results raise more interesting questions than they solve problems. We start with questions related to recursive programs, then discuss future work in the polyhedral model.

First of all, looking for the right mathematical abstraction to capture instancewise properties appeared once more as a critical issue. Rational and algebraic transductions have been successful in many cases, but the lack of expressiveness has often limited their applications. Reaching definition analysis has suffered most from these limitations, as well as integration of conditional expressions and loop bounds in dependence analysis. In this context, we would like to consider more than one counter in a transducer, and still be able to decide emptiness and other useful properties. We are thus very interested in the work by Comon and Jurski [CJ98] on deciding the emptiness for a sub-class of multi-counter languages, and more generally in studies about system verification based on restricted classes of Minsky machines, such as timed automata. In addition, using several counters would allow us to extend one of the major ideas underlying fuzzy array dataflow analysis [CBF95]: inserting new parameters to capture properties of non-affine expressions and improve precision.

Moreover, we believe that decidability of the mathematical abstraction is not the most important thing for program analysis: a few good approximate results are often sufficient. In particular, we discovered when studying deterministic and left-synchronous relations that a nice sub-class with good decidability properties cannot be used in our framework without an efficient approximation method. Improving our techniques to resynchronize rational transducers and approximate them by left-synchronous ones is thus an important issue. We also hope that this demonstrates the high mutual interest of cooperations between theoretical computer scientists and compilation researchers.

Besides these formal aspects, another research issue is to alleviate as many restrictions as possible in the program model. As hinted before, the best way consists in looking for a graceful degradation of our results using approximation techniques. This idea has been investigated in a similar context [CBF95], and studying its applicability to recursive programs is an interesting future work. Another idea would be to perform induction variable computation on execution traces instead of control words (allowing induction variable update in every program statement), then to deduce approximate information on control words; relying on abstract interpretation techniques [CC77] would perhaps be helpful in proving the correctness of our approximations.

The interest of memory expansion for recursive programs is still unclear, because of the high overhead to compute reaching definitions at run-time (either exactly or with φ functions). Pragmatic techniques similar to privatization (i.e. making a global variable local to each procedure) seem more promising, but require further study. Working on an extension of maximal static expansion and storage mapping optimization to recursive
programs is perhaps too early in this context, but transitive closure, class enumeration and graph coloring techniques for rational and algebraic transductions are interesting open problems.

We have not addressed the problem of scheduling recursive programs, because the way to assign sets of run-time instances to logical execution dates is unknown. Building a rational transducer from dates to instances is perhaps a good idea, but the problem of generating the code to enumerate the precise sets of instances becomes rather difficult. Besides these technical reasons, most parallelism in recursive programs can already be exploited by control parallel techniques, and the need for a data parallel execution model is not obvious.

In addition to motivating a large part of our work on recursive programs, techniques from the polyhedral model cover an important part of this thesis. A major goal throughout this work was to keep some distance from the mathematical representation of affine relations. One drawback of this point of view is the increased difficulty of building optimized algorithms ready to be used in a compiler, but the big advantage is the generality of the approach. Among the technical problems that should be improved in both maximal static expansion and storage mapping optimization, the most important are the following.

Many algorithms for run-time restoration of the data flow have been designed, but practical experience with parallelization of loop nests with unpredictable control flow and non-affine array subscripts is still very limited. Because the SSA framework [CFR+91] is mainly used as an intermediate representation, φ functions are rarely implemented in practice. Generating efficient data-flow restoration code is thus a rather new problem.

No parallelizing compiler for unrestricted nested loops has been designed. As a result, a large scale experiment has never been performed. To apply precise analysis and transformation techniques to real programs, significant work on optimizing the techniques must be done. The main ideas would be code partitioning [Ber93] and extending our techniques to hierarchical dependence graphs, array regions [Cre96] or hierarchical schedules [CW99].

A parallelizing compiler must be able to tune automatically a large number of parameters: run-time overhead, parallelism extraction, parallelization grain, copy-in and copy-out, schedule latency, memory hierarchy, memory usage, placement of computations and communications... and we have seen that the optimization problem is even more complex for non-affine loop nests. Our constrained expansion framework allows simultaneous optimization of some parameters related to memory expansion, but this is only a first step.
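The privatization idea mentioned in the perspectives above (making a global variable local to each procedure) can be illustrated by a small sketch. This is only a hand-written toy under stated assumptions, not a technique from this thesis: the tree encoding `(value, [children])` and the function names `depth_sum_global` and `depth_sum_private` are hypothetical.

```python
tmp = 0  # global scratch cell shared by every call of depth_sum_global


def depth_sum_global(tree, depth=0):
    """Sum of value * depth over a tree; every call overwrites the global tmp."""
    global tmp
    value, children = tree
    tmp = value * depth        # write to the shared cell
    total = tmp                # must be read before a recursive call clobbers tmp
    for child in children:
        total += depth_sum_global(child, depth + 1)
    return total


def depth_sum_private(tree, depth=0):
    """Same computation with tmp privatized: one cell per call instance."""
    value, children = tree
    tmp_local = value * depth  # local to this call, no conflict with other calls
    partial = [depth_sum_private(child, depth + 1) for child in children]
    return tmp_local + sum(partial)
```

In the first version, all calls write the same memory location, so the recursive calls conflict with each other. In the privatized version the calls on distinct children share no memory cell, which is the property a control parallel execution of the recursion would rely on.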
Bibliography

[AB88] J.-M. Autebert and L. Boasson. Transductions rationnelles. Masson, Paris, France, 1988.
[AFL95] A. Aiken, M. Fähndrich, and R. Levien. Better static memory management: Improving region-based analysis of higher-order languages. In ACM Symp. on Programming Language Design and Implementation (PLDI'95), pages 174-185, La Jolla, California, USA, June 1995.
[AI91] C. Ancourt and F. Irigoin. Scanning polyhedra with DO loop. In 3rd ACM Symp. on Principles and Practice of Parallel Programming (PPoPP'91), pages 39-50, June 1991.
[AK87] J. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Trans. on Programming Languages and Systems, 9(4):491-542, October 1987.
[Ala94] M. Alabau. Une expression des algorithmes massivement parallèles à structures de données irrégulières. PhD thesis, Université Bordeaux I, September 1994.
[Amm92] Z. Ammarguellat. A control-flow normalization algorithm and its complexity. IEEE Trans. on Software Engineering, 18(3):237-251, March 1992.
[AR94] R. Andonov and S. Rajopadhye. A sparse knapsack algo-tech-cuit and its synthesis. In Int. Conf. on Application-Specific Array Processors (ASAP'94), pages 302-313, San-Francisco, California, USA, August 1994. IEEE Computer Society Press.
[ASU86] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.
[Bak77] B. S. Baker. An algorithm for structuring programs. Journal of the ACM, 24:98-120, 1977.
[Ban88] U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, USA, 1988.
[Ban92] U. Banerjee. Loop Transformations for Restructuring Compilers: The Foundations. Kluwer Academic Publishers, Boston, USA, 1992.
[Bar98] D. Barthou. Array Dataflow Analysis in Presence of Non-affine Constraints. PhD thesis, Université de Versailles, France, February 1998. http://www.prism.uvsq.fr/~bad/these.html.
[BBA98] H. Bourzoufi, B. Sidi Boulenouar, and R. Andonov. A tiling approach for solving dynamic programming knapsack problem recurrences. In Rencontres Francophones du Parallélisme (RenPar'10), Strasbourg, France, June 1998.
[BC99a] M.-P. Béal and O. Carton. Asynchronous sliding block maps. Technical Report IGM 99-06, Institut Gaspard Monge, Université de Marne-la-Vallée, France, 1999.
[BC99b] M.-P. Béal and O. Carton. Determinization of transducers over finite and infinite words. Technical Report (to appear), Institut Gaspard Monge, Université de Marne-la-Vallée, France, 1999.
[BCC98] D. Barthou, A. Cohen, and J.-F. Collard. Maximal static expansion. In 25th ACM Symp. on Principles of Programming Languages, pages 98-106, San Diego, California, USA, January 1998.
[BCC00] D. Barthou, A. Cohen, and J.-F. Collard. Maximal static expansion. Int. Journal of Parallel Programming, June 2000. To appear.
[BCF97] D. Barthou, J.-F. Collard, and P. Feautrier. Fuzzy array dataflow analysis. Journal of Parallel and Distributed Computing, 40:210-226, 1997.
[BDRR94] P. Boulet, A. Darte, T. Risset, and Y. Robert. (Pen)-ultimate tiling? In Scalable High-Performance Computing Conf., pages 568-576, Knoxville, Tennessee, USA, May 1994. IEEE Computer Society Press.
[BE95] W. Blume and R. Eigenmann. Symbolic range propagation. In Proc. of the 9th Int. Parallel Processing Symp. (IPPS'95), pages 357-363, Santa Barbara, California, USA, April 1995. IEEE Computer Society Press.
[BEF+96] W. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen, W. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Parallel programming with Polaris. IEEE Computer, 29(12):78-82, December 1996.
[Ber79] J. Berstel. Transductions and Context-Free Languages. Teubner, Stuttgart, Germany, 1979.
[Ber93] J.-Y. Berthou. Construction d'un paralléliseur de logiciels scientifiques de grande taille guidée par des mesures de performances. PhD thesis, Université Pierre et Marie Curie (Paris VI), France, October 1993.
[BH77] M. Blattner and T. Head. Single valued a-transducers. Journal of Comput. and System Sci., 15:310-327, 1977.
[Bra88] T. Brandes. The importance of direct dependences for automatic parallelization. In ACM Int. Conf. on Supercomputing, pages 407-417, St. Malo, France, July 1988.
[CBC93] J.-D. Choi, M. Burke, and P. Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In 20th ACM Symp. on Principles of Programming Languages (PoPL'93), pages 232-245, Charleston, South Carolina, USA, January 1993.
[CBF95] J.-F. Collard, D. Barthou, and P. Feautrier. Fuzzy array dataflow analysis. In ACM Symp. on Principles and Practice of Parallel Programming, pages 92-102, Santa Barbara, California, USA, July 1995.
[CC77] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction of approximation of fixpoints. In 4th ACM Symp. on Principles of Programming Languages, pages 238-252, Los Angeles, California, USA, January 1977.
[CC98] A. Cohen and J.-F. Collard. Instance-wise reaching definition analysis for recursive programs using context-free transductions. In Parallel Architectures and Compilation Techniques, pages 332-340, Paris, France, October 1998. IEEE Computer Society Press. (IEEE award for the best student paper).
[CCG96] A. Cohen, J.-F. Collard, and M. Griebl. Data-flow analysis of recursive structures. In Proc. of the 6th Workshop on Compilers for Parallel Computers, pages 181-192, Aachen, Germany, December 1996.
[CDRV97] P.-Y. Calland, A. Darte, Y. Robert, and Frédéric Vivien. Plugging anti and output dependence removal techniques into loop parallelization algorithms. Parallel Computing, 23(1-2):251-266, 1997.
[CFH95] L. Carter, J. Ferrante, and S. Flynn Hummel. Efficient multiprocessor parallelism via hierarchical tiling. In SIAM Conference on Parallel Processing for Scientific Computing, February 1995.
[CFR+91] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991.
[CFR95] J.-F. Collard, P. Feautrier, and T. Risset. Construction of DO loops from systems of affine constraints. Parallel Processing Letters, 5(3), 1995.
[CH78] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In 5th ACM Symp. on Principles of Programming Languages, pages 84-96, January 1978.
[Cho77] C. Choffrut. Une caractérisation des fonctions séquentielles et des fonctions sous-séquentielles en tant que relations rationnelles. Theoretical Computer Science, 5:325-338, 1977.
[CI96] B. Creusillet and F. Irigoin. Interprocedural array region analyses. Int. Journal of Parallel Programming, 24(6):513-546, December 1996.
[CJ98] H. Comon and Y. Jurski. Multiple counters automata, safety analysis and Presburger arithmetic. In A. Hu and M. Vardi, editors, Proc. Computer Aided Verification, volume 1427 of LNCS, pages 268-279, Vancouver, British Columbia, Canada, 1998. Springer-Verlag.
[CK98] J.-F. Collard and J. Knoop. A comparative study of reaching definitions analyses. Technical Report 1998/22, Laboratoire PRiSM, Université de Versailles, France, 1998.
[CL99] A. Cohen and V. Lefebvre. Optimization of storage mappings for parallel programs. In EuroPar'99, number 1685 in LNCS, pages 375-382, Toulouse, France, September 1999. Springer-Verlag.
[Cla96] P. Clauss. Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs. In ACM Int. Conf. on Supercomputing, pages 278-295. ACM Press, 1996.
[Coh97] A. Cohen. Analyse de flot de données de programmes récursifs à l'aide de grammaires hors-contexte. In Rencontres Francophones du Parallélisme (RenPar'9), Lausanne, Suisse, May 1997. (IEEE award for the best french-speaking student paper).
[Coh99a] A. Cohen. Analyse de flot de données pour programmes récursifs à l'aide de langages algébriques. Technique et Science Informatiques, 18(3):323-343, 1999.
[Coh99b] A. Cohen. Parallelization via constrained storage mapping optimization. In Int. Symp. on High Performance Computing (ISHPC'99), number 1615 in LNCS, pages 83-94, Kyoto, Japan, May 1999. Springer-Verlag.
[Col94a] J.-F. Collard. Code generation in automatic parallelizers. In C. Girault, editor, Proc. of the Int. Conf. on Applications in Parallel and Distributed Computing, IFIP W.G. 10.3, pages 185-194, Caracas, Venezuela, April 1994. North Holland.
[Col94b] J.-F. Collard. Space-time transformation of while-loops using speculative execution. In Scalable High Performance Computing Conf., pages 429-436, Knoxville, Tennessee, USA, May 1994. IEEE Computer Society Press.
[Col95a] J.-F. Collard. Automatic parallelization of while-loops using speculative execution. Int. Journal of Parallel Programming, 23(2):191-219, April 1995.
[Col95b] J.-F. Collard. Parallélisation automatique des programmes à contrôle dynamique. PhD thesis, Université Pierre et Marie Curie (Paris VI), France, January 1995. http://www.prism.uvsq.fr/~jfc/memoire.ps.
[Col98] J.-F. Collard. The advantages of reaching definition analyses in Array (S)SA. In 11th Workshop on Languages and Compilers for Parallel Computing, number 1656 in LNCS, pages 338-352, Chapel Hill, North Carolina, USA, August 1998. Springer-Verlag.
[Cou81] P. Cousot. Semantic foundations of programs analysis. Prentice-Hall, 1981.
[Cre96] B. Creusillet. Array Region Analyses and Applications. PhD thesis, École Nationale Supérieure des Mines de Paris (ENSMP), Paris, France, December 1996.
[CW99] J. B. Crop and D. K. Wilde. Scheduling structured systems. In EuroPar'99, LNCS, pages 409-412, Toulouse, France, September 1999. Springer-Verlag.
[Deu90] A. Deutsch. On determining lifetime and aliasing of dynamically allocated data in higher-order functional specifications. In 17th ACM Symp. on Principles of Programming Languages (PoPL'90), pages 157-168, San Francisco, California, USA, January 1990.
[Deu92] A. Deutsch. Operational Models of Programming Languages and Representations of Relations on Regular Languages with Application to the Static Determination of Dynamic Aliasing Properties of Data. PhD thesis, École Polytechnique, France, April 1992.
[Deu94] A. Deutsch. Interprocedural may-alias analysis for pointers: beyond k-limiting. In ACM Symp. on Programming Language Design and Implementation (PLDI'94), pages 230-241, Orlando, Florida, USA, June 1994.
[DGS93] E. Duesterwald, R. Gupta, and M.-L. Soffa. A practical data flow framework for array reference analysis and its use in optimization. In ACM Symp. on Programming Language Design and Implementation (PLDI'93), pages 68-77, Albuquerque, New Mexico, USA, June 1993.
[DV97] A. Darte and F. Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. Int. Journal of Parallel Programming, 25(6):447-496, December 1997.
[EGH94] M. Emami, R. Ghiya, and L. J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In ACM Symp. on Programming Language Design and Implementation (PLDI'94), pages 242-256, June 1994.
[Eil74] S. Eilenberg. Automata, Languages and Machines, volume A. Academic Press, 1974.
[EM65] C. C. Elgot and J. E. Mezei. On relations defined by generalized finite automata. IBM Journal of Research and Development, pages 45-68, 1965.
[FB98] P. Feautrier and P. Boulet. Scanning polyhedra without do-loops. In Parallel Architectures and Compilation Techniques (PACT'98), Paris, France, October 1998. IEEE Computer Society Press.
[Fea88a] P. Feautrier. Array expansion. In ACM Int. Conf. on Supercomputing, pages 429-441, St. Malo, France, July 1988.
[Fea88b] P. Feautrier. Parametric integer programming. RAIRO Recherche Opérationnelle, 22:243-268, September 1988.
[Fea91] P. Feautrier. Dataflow analysis of scalar and array references. Int. Journal of Parallel Programming, 20(1):23-53, February 1991.
[Fea92] P. Feautrier. Some efficient solution to the affine scheduling problem, part II, multidimensional time. Int. Journal of Parallel Programming, 21(6):389-420, December 1992. See also Part I, One Dimensional Time, 21(5):315-348.
[Fea98] P. Feautrier. A parallelization framework for recursive tree programs. In EuroPar'98, LNCS, Southampton, UK, September 1998. Springer-Verlag.
[FM97] P. Fradet and D. Le Métayer. Shape types. In 24th ACM Symp. on Principles of Programming Languages (PoPL'97), pages 27-39, Paris, France, January 1997.
[FS93] C. Frougny and J. Sakarovitch. Synchronized relations of finite words. Theoretical Computer Science, 108:45-82, 1993.
[GC95] M. Griebl and J.-F. Collard. Generation of synchronous code for automatic parallelization of while loops. In S. Haridi, K. Ali, and P. Magnusson, editors, EuroPar'95, volume 966 of LNCS, pages 315-326. Springer-Verlag, 1995.
[GH95] R. Ghiya and L. J. Hendren. Connection analysis: A practical interprocedural heap analysis for C. In 8th Workshop on Languages and Compilers for Parallel Computing, number 1033 in LNCS, Columbus, Ohio, USA, August 1995. Springer-Verlag.
[GH96] R. Ghiya and L. J. Hendren. Is it a tree, a dag, or a cyclic graph? A shape analysis for heap-directed pointers in C. In 23rd ACM Symp. on Principles of Programming Languages (PoPL'96), pages 1-15, St. Petersburg Beach, Florida, USA, January 1996.
[GL97] M. Griebl and C. Lengauer. The loop parallelizer LooPo - announcement. LNCS, 1239:603-607, 1997.
[Gup98] R. Gupta. A code motion framework for global instruction scheduling. In Int. Conf on Compiler Construction (CC'98), pages 219-233, 1998.
[H+96] M. Hall et al. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, 29(12):84-89, December 1996.
[Har89] W. L. Harrison. The interprocedural analysis and automatic parallelisation of Scheme programs. Lisp and Symbolic Computation, 2(3):176-396, October 1989.
[HBCM94] M. Hind, M. Burke, P. Carini, and S. Midkiff. An empirical study of precise interprocedural array analysis. Scientific Programming, 3(3):255-271, 1994.
[HHN92] L. J. Hendren, J. Hummel, and A. Nicolau. Abstractions for recursive pointer data structures: improving the analysis and transformation of imperative programs. In ACM Symp. on Programming Language Design and Implementation (PLDI'92), pages 249-260, San Francisco, California, USA, June 1992.
[HHN94] J. Hummel, L. J. Hendren, and A. Nicolau. A general data dependence test for dynamic, pointer-based data structures. In ACM Symp. on Programming Language Design and Implementation (PLDI'94), pages 218-229, Orlando, Florida, USA, June 1994.
[HP96] M. Haghighat and C. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Trans. on Programming Languages and Systems, 18(4):477-518, July 1996.
[HTZ+97] L. J. Hendren, X. Tang, Y. Zhu, S. Ghobrial, G. R. Gao, X. Xue, H. Cai, and P. Ouellet. Compiling C for the EARTH multithreaded architecture. Int. Journal of Parallel Programming, 25(4):305-338, August 1997.
[HU79] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
[IJT90] F. Irigoin, P. Jouvelot, and R. Triolet. Overview of the PIPS project. In P. Feautrier and F. Irigoin, editors, 2nd Int. Workshop on Compilers for Parallel Computers, pages 199-212, Paris, December 1990.
[IT88] F. Irigoin and R. Triolet. Supernode partitioning. In 15th ACM Symp. on Principles of Programming Languages (PoPL'88), pages 319-328, San Diego, California, USA, January 1988.
[JM82] N. D. Jones and S. S. Muchnick. A flexible approach to interprocedural data flow analysis and programs with recursive data structures. ACM Press, 1982.
[Kar92] G. Karner. Nivat's theorem for pushdown transducers. Theoretical Computer Science, 97:245-262, 1992.
[KPRS96] W. Kelly, W. Pugh, E. Rosser, and T. Shpeisman. Transitive closure of infinite graphs and its applications. Int. Journal of Parallel Programming, 24(6):579-598, 1996.
[KRS94] J. Knoop, O. Rüthing, and B. Steffen. Optimal code motion: Theory and practice. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(4):1117-1155, 1994.
[KS92] J. Knoop and B. Steffen. The interprocedural coincidence theorem. In Proc. of the 4th Int. Conference on Compiler Construction (CC'92), number 641 in LNCS, Paderborn, Germany, 1992.
[KS93] N. Klarlund and M. I. Schwartzbach. Graph types. In 20th ACM Symp. on Principles of Programming Languages (PoPL'93), pages 196-205, Charleston, South Carolina, USA, January 1993.
[KS98] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. In 25th ACM Symp. on Principles of Programming Languages, pages 107-120, San Diego, California, USA, January 1998.
[KSV96] J. Knoop, B. Steffen, and J. Vollmer. Parallelism for free: Efficient and optimal bitvector analyses for parallel programs. ACM Transactions on Programming Languages and Systems (TOPLAS), 18(3):268-299, May 1996.
[KU77] J. B. Kam and J. D. Ullman. Monotone data flow analysis frameworks. Acta Informatica, 7:309-317, 1977.
[Lef98] V. Lefebvre. Restructuration automatique des variables d'un programme en vue de sa parallélisation. PhD thesis, Université de Versailles, France, February 1998. http://www.prism.uvsq.fr/~vil/these.ps.gz.
[LF98] V. Lefebvre and P. Feautrier. Automatic storage management for parallel programs. Parallel Computing, 24(3):649-671, 1998.
[LH88] J. R. Larus and P. N. Hilfinger. Detecting conflicts between structure accesses. In ACM Symp. on Programming Language Design and Implementation (PLDI'88), pages 21-34, 1988.
[Li92] Z. Li. Array privatization for parallel execution of loops. In ACM Int. Conf. on Supercomputing, pages 313-322, Washington, District of Columbia, USA, July 1992. ACM Press.
[LL97] A. W. Lim and M. S. Lam. Communication-free parallelization via affine transformations. In 24th ACM Symp. on Principles of Programming Languages, pages 201-214, Paris, France, January 1997.
[LRZ93] W. A. Landi, B. G. Ryder, and S. Zhang. Interprocedural modification side effect analysis with pointer aliasing. In ACM Symp. on Programming Language Design and Implementation (PLDI'93), pages 56-67, Albuquerque, New Mexico, USA, June 1993.
[MAL93] D. E. Maydan, S. P. Amarasinghe, and M. S. Lam. Array dataflow analysis and its use in array privatization. In 20th ACM Symp. on Principles of Programming Languages, pages 2-15, Charleston, South Carolina, USA, January 1993.
[Mas93] F. Masdupuy. Semantic analysis of interval congruences. In D. Bjørner, M. Broy, and I. V. Pottosin, editors, Int. Conf. on Formal Methods in Programming and their Applications, volume 735 of LNCS, pages 142-155, Academgorodok, Novosibirsk, Russia, June 1993. Springer-Verlag.
[MF98] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM Symp. on Programming Language Design and Implementation (PLDI'98), pages 212-223, Montreal, Canada, June 1998.
[Mic95] O. Michel. Design and implementation of 8 1/2, a declarative data-parallel language. Technical Report 1012, Laboratoire de Recherche en Informatique, Université Paris Sud (Paris XI), France, 1995. Contains paper Group-based Fields with J.-L. Giavitto and Jean-Paul Sansonnet, Proc. of the Parallel Symbolic Languages and Systems, October 1995.
[Min67] M. Minsky. Computation, Finite and Infinite Machines. Prentice-Hall, 1967.
[MP94] V. Maslov and W. Pugh. Simplifying polynomial constraints over integers to make dependence analysis more precise. Technical Report CS-TR-3109.1, U. of Maryland, February 1994.
[MT90] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementation. John Wiley and Sons, 1990.
[Muc97] S. S. Muchnick. Advanced Compiler Design & Implementation. Morgan Kaufmann, 1997.
[Par66] R. J. Parikh. On context-free languages. Journal of the ACM, 13(4):570-581, 1966.
[PD96] G. R. Perrin and A. Darte, editors. The Data Parallel Programming Model. Number 1132 in LNCS. Springer-Verlag, 1996. For scheduling issues, see "Automatic Parallelization in the Polytope Model", pages 79-103.
[PS98] M. Pelletier and J. Sakarovitch. On the representation of finite deterministic 2-tape automata. Technical Report 98C002, École Nationale Supérieure des Télécommunications (ENST), Paris, France, May 1998. To appear in Theoretical Computer Science.
[Pug92] W. Pugh. A practical algorithm for exact array dependence analysis. Communications of the ACM, 35(8):27-47, August 1992.
[QR99] F. Quilleré and S. Rajopadhye. Optimizing memory usage in the polyhedral model. Technical Report 1228, Institut de Recherche en Informatique et Systèmes Aléatoires, Université de Rennes, France, January 1999.
[RF94] X. Redon and P. Feautrier. Scheduling reductions. In ACM Int. Conf. on Supercomputing, pages 117-125, Manchester, UK, July 1994.
[Rin97] M. Rinard. Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives. In 6th ACM Symp. on Principles and Practice of Parallel Programming (PPoPP'97), pages 112-123, Las Vegas, Nevada, USA, June 1997.
[RR99] R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In 7th ACM Symp. on Principles and Practice of Parallel Programming (PPoPP'99), Atlanta, Georgia, USA, May 1999.
[RS97a] G. Rozenberg and A. Salomaa, editors. Handbook of Formal Languages, volume 1: Word Language Grammar. Springer-Verlag, 1997.
[RS97b] G. Rozenberg and A. Salomaa, editors. Handbook of Formal Languages, volume 3: Beyond Words. Springer-Verlag, 1997.
[SCFS98] M. M. Strout, L. Carter, J. Ferrante, and B. Simon. Schedule-independent storage mapping for loops. In ACM Symp. on Architecture Support for Programming Languages and Operating Systems, 8, 1998.
[Sch86] A. Schrijver. Theory of Linear and Integer Programming. John Wiley and Sons, Chichester, UK, 1986.
[SKR90] B. Steffen, J. Knoop, and O. Rüthing. The value flow graph: A program representation for optimal program transformations. In Proc. of the 3rd European Symp. on Programming (ESOP'90), volume 432 of LNCS, pages 389-405, Copenhagen, Denmark, May 1990.
[SRH96] M. Sagiv, T. Reps, and S. Horwitz. Precise interprocedural dataflow analysis with applications to constant propagation. IEEE Trans. on Computers, 167(1-2):131-170, October 1996.
[SRW96] S. Sagiv, T. W. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. In 23rd ACM Symp. on Principles of Programming Languages (PoPL'96), pages 16-31, St. Petersburg Beach, Florida, USA, January 1996.
[SSP99] H. Saito, N. Stavrakos, and C. Polychronopoulos. Multithreading runtime support for loop and functional parallelism. In Int. Symp. on High Performance Computing (ISHPC'99), number 1615 in LNCS, pages 133-144, Kyoto, Japan, May 1999. Springer-Verlag.
[Ste96] B. Steensgaard. Points-to analysis in almost linear time. In 23rd ACM Symp. on Principles of Programming Languages (PoPL'96), pages 32-41, St. Petersburg Beach, Florida, USA, January 1996.
[TD95] O. Temam and N. Drach. Software assistance for data caches. Future Generation Computer Systems, 1995. Special issue on high performance computer architectures.
[TFJ86] R. Triolet, P. Feautrier, and P. Jouvelot. Automatic parallelization of Fortran programs in the presence of procedure calls. In Proc. of the 1st European Symp. on Programming (ESOP'86), number 213 in LNCS, pages 210-222. Springer-Verlag, March 1986.
[TP93] P. Tu and D. Padua. Automatic array privatization. In 6th Workshop on Languages and Compilers for Parallel Computing, number 768 in LNCS, pages 500-521, Portland, Oregon, USA, August 1993.
[TP95] P. Tu and D. Padua. Gated SSA-Based demand-driven symbolic analysis for parallelizing compilers. In ACM Int. Conf. on Supercomputing, pages 414-423, Barcelona, Spain, July 1995.
[Tzo97] S. Tzolovski. Data dependences as abstract interpretations. In International Static Analysis Symposium SAS'97, Paris, France, 1997.
[Wol92] M. Wolfe. Beyond induction variables. In ACM Symp. on Programming Language Design and Implementation (PLDI'92), pages 162-174, San Francisco, California, USA, June 1992.
[Won95] D. G. Wonnacott. Constraint-Based Array Dependence Analysis. PhD thesis, University of Maryland, 1995.
[WP95] D. Wonnacott and W. Pugh. Nonlinear array dependence analysis. In Proc. Third Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers, 1995. Troy, New York, USA.
[WR93] D. K. Wilde and S. Rajopadhye. Allocating memory arrays for polyhedra. Technical Report 749, Institut de Recherche en Informatique et Systèmes Aléatoires, Université de Rennes, France, July 1993.
Résumé

Today's microprocessors and parallel architectures pose new challenges for compilation techniques. In the presence of parallelism, optimizations become too specific and too complex to be left to the programmer. Automatic parallelization techniques now reach beyond their traditional domain of numerical applications and address new program models, such as non-affine loop nests, recursive calls and dynamic data structures. Precise analyses are at the heart of parallelism detection: they gather compile-time information about the run-time properties of programs. This information validates transformations that are useful for parallelism extraction and parallel code generation.

This thesis is mainly concerned with analyses and transformations from an instancewise point of view, that is, considering the individual properties of each run-time instance of a statement. A new formalization based on formal languages first allows us to study an instancewise dependence and reaching definition analysis for recursive programs. Applying this analysis to the expansion and parallelization of recursive programs yields encouraging results. Arbitrary loop nests are the subject of the second part of this work. A new study of parallelization techniques based on expansion allows us to propose solutions to crucial optimization problems.

Keywords: automatic parallelization, recursive programs, non-affine loop nests, dependence analysis, reaching definition analysis, memory expansion.
Abstract

Compilation for today's microprocessor and multi-processor architectures is facing new challenges. When dealing with parallel execution, optimizations become too specific and complex to be left to the programmer. Traditionally devoted to numerical applications, automatic parallelization now addresses new program models, including non-affine nests of loops, recursive calls and pointer-based data structures. Parallelism detection is based on precise analyses, which gather compile-time information about run-time program properties. This information enables transformations useful for parallelism extraction and parallel code generation.

This thesis focuses on aggressive analysis and transformation techniques from an instancewise point of view, that is, from the individual properties of each run-time instance of a program statement. Thanks to a novel formal language framework, we first investigate instancewise dependence and reaching definition analysis for recursive programs. This analysis is applied to memory expansion and parallelization of recursive programs, and promising results are presented. The second part of this work addresses nests of loops with unrestricted conditionals, bounds and array subscripts. Parallelization via memory expansion is revisited in this context, and solutions to challenging optimization problems are proposed.

Keywords: automatic parallelization, recursive programs, non-affine loop nests, dependence analysis, reaching definition analysis, memory expansion.