Program Analysis and Transformation: From the Polytope Model to Formal Languages
Albert Cohen

To cite this version: Albert Cohen. Program Analysis and Transformation: From the Polytope Model to Formal Languages. Networking and Internet Architecture [cs.NI]. Université de Versailles-Saint Quentin en Yvelines. English.

HAL Id: tel
Submitted on 31 Dec 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
DOCTORAL THESIS OF THE UNIVERSITÉ DE VERSAILLES
Speciality: Computer Science

Presented by Albert COHEN to obtain the title of Doctor of the Université de Versailles

Thesis subject: Analyse et transformation de programmes : du modèle polyédrique aux langages formels (Program Analysis and Transformation: From the Polytope Model to Formal Languages)

Defended on 21 December 1999 before a jury composed of:
    Jean Berstel, Rapporteur
    Luc Bougé, Examiner
    Jean-François Collard, Advisor
    Paul Feautrier, Advisor
    William Jalby, President
    Patrice Quinton, Rapporteur
    Bernard Vauquelin, Rapporteur

Thesis prepared at the Université de Versailles Saint-Quentin-en-Yvelines, within the PRiSM laboratory (Parallélisme, Réseaux, Systèmes et Modélisation).
Acknowledgments

This thesis was prepared within the PRiSM laboratory (Parallélisme, Réseaux, Systèmes et Modélisation) of the Université de Versailles Saint-Quentin-en-Yvelines, between September 1996 and December 1999, under the supervision of Jean-François Collard and Paul Feautrier.

I would first like to thank Jean-François Collard (CNRS research scientist), who supervised this thesis and with whom I was lucky enough to take my first steps in scientific research. His advice, his extraordinary availability, his energy in all circumstances and his insightful ideas did much more than sustain my motivation. I warmly thank Paul Feautrier (professor at PRiSM) for his trust and for his interest in following my results. Through his experience, he showed me how exciting research is, beyond its difficulties and occasional successes.

I am very grateful to all the members of my jury, especially to Jean Berstel (professor at the Université de Marne-la-Vallée), Patrice Quinton (professor at IRISA, Université de Rennes) and Bernard Vauquelin (professor at LaBRI, Université de Bordeaux), for the interest and curiosity they showed towards my work and for the care with which they read this thesis, including where its subject lay outside their own research areas. Many thanks to Luc Bougé (professor at LIP, École Normale Supérieure de Lyon) for taking part in this jury and for his insightful suggestions and comments. Thanks also to William Jalby (professor at PRiSM) for agreeing to chair this jury and for often advising me with good humor.

I also express my gratitude to Guy-René Perrin for his encouragement and for access to “his” parallel machine, to Olivier Carton for his precious help in a very demanding field, and to Denis Barthou, Ivan Djelic and Vincent Lefebvre for their essential collaboration on the results of this thesis. I also remember fascinating discussions with Pierre Boulet, Philippe Clauss, Christine Eisenbeis and Sanjay Rajopadhye; nor do I forget the efficient help of the laboratory's engineers and secretaries. I think back to the good times spent with all the members of the “monastery” and with the PRiSM fellow travelers who have become my friends.

Lastly, thanks to my family for its constant and unconditional support, with a special thought for my parents and for my wife Isabelle.
Dedicated to a Brave GNU World

Copyright © Albert Cohen 1999.
Verbatim copying and distribution of this document is permitted in any medium, provided this notice is preserved; no modification is permitted.

This document was typeset using LaTeX and the french package. Graphics were designed using xfig, gnuplot and the GasTeX package.
Table of Contents

List of Algorithms
List of Figures
Presentation in French (Dissertation summary, in French)
1 Introduction
    Program Analysis; Program Transformations for Parallelization; Thesis Overview
2 Framework
    Going Instancewise
    Program Model: Control Structures; Data Structures
    Abstract Model: Naming Statement Instances; Sequential Execution Order; Addressing Memory Locations; Loop Nests and Arrays
    Instancewise Analysis: Conflicting Accesses and Dependences; Reaching Definition Analysis; An Example of Instancewise Reaching Definition Analysis; More About Approximations
    Parallelization: Memory Expansion and Parallelism Extraction; Computation of a Parallel Execution Order; General Efficiency Remarks
3 Formal Tools
    Presburger Arithmetics: Sets, Relations and Functions; Transitive Closure
    Monoids and Formal Languages: Monoids and Morphisms; Rational Languages; Algebraic Languages; One-Counter Languages
    Rational Relations: Recognizable and Rational Relations; Rational Transductions and Transducers; Rational Functions and Sequential Transducers
    Left-Synchronous Relations: Definitions; Algebraic Properties; Functional Properties; An Undecidability Result; Studying Synchronizability of Transducers; Decidability Results; Further Extensions
    Beyond Rational Relations: Algebraic Relations; One-Counter Relations
    More about Intersection: Intersection with Lexicographic Order; The Case of Algebraic Relations
    Approximating Relations on Words: Approximation of Rational Relations by Recognizable Relations; Approximation of Rational Relations by Left-Synchronous Relations; Approximation of Algebraic and Multi-Counter Relations
4 Instancewise Analysis for Recursive Programs
    Motivating Examples: First Example: Procedure Queens; Second Example: Procedure BST; Third Example: Function Count; What Next?
    Mapping Instances to Memory Locations: Induction Variables; Building Recurrence Equations on Induction Variables; Solving Recurrence Equations on Induction Variables; Computing Storage Mappings; Application to Motivating Examples
    Dependence and Reaching Definition Analysis: Building the Conflict Transducer; Building the Dependence Transducer; From Dependences to Reaching Definitions; Practical Approximation of Reaching Definitions
    The Case of Trees; The Case of Arrays; The Case of Composite Data Structures
    Comparison with Other Analyses
    Conclusion
5 Parallelization via Memory Expansion
    Motivations and Tradeoffs: Conversion to Single-Assignment Form; Run-Time Overhead; Single-Assignment for Loop Nests; Optimization of the Run-Time Overhead; Tradeoff between Parallelism and Overhead
    Maximal Static Expansion: Motivation; Problem Statement; Formal Solution; Algorithm; Detailed Review of the Algorithm; Application to Real Codes; Back to the Examples; Experiments; Implementation
    Storage Mapping Optimization: Motivation; Problem Statement and Formal Solution; Optimality of the Expansion Correctness Criterion; Algorithm; Array Reshaping and Renaming; Dealing with Tiled Parallel Programs; Schedule-Independent Storage Mappings; Dynamic Restoration of the Data-Flow; Back to the Examples; Experiments
    Constrained Storage Mapping Optimization: Motivation; Problem Statement; Formal Solution; Algorithm; Building Expansion Constraints; Graph-Coloring Algorithm; Dynamic Restoration of the Data-Flow; Parallelization after Constrained Expansion; Back to the Motivating Example
    Parallelization of Recursive Programs: Problems Specific to Recursive Structures; Algorithm; Generating Code for Read References; Privatization of Recursive Programs; Expansion of Recursive Programs: Practical Examples; Statementwise Parallelization; Instancewise Parallelization
    Conclusion
6 Conclusion
    Contributions; Perspectives
Bibliography
Index
List of Figures

Run-time restoration of the flow of data
Exposing parallelism
Simple examples of memory expansion
Control automata for program Queens
Hash-table declaration
Procedure Queens and control tree
About run-time instances and accesses
An inode declaration
Studying the Lukasiewicz language
Execution-dependent storage mappings
Computation of Parikh vectors
Left-synchronous realization of several order relations
Synchronous and δ-synchronous transducers
One-counter automaton for the Lukasiewicz language
A left and right synchronizable example
Sequential and sub-sequential transducers
Procedure BST and compressed control automaton
Procedure Count and compressed control automaton
Procedure Queens and control tree
Rational transducer for storage mapping f of program BST
More examples of induction variables
Procedure Count and control automaton
First example of induction variables
Rational transducer for conflict relation of program BST
One-counter transducer for conflict relation of program Queens
Rational transducer for storage mapping f of program Queens
Rational transducer for dependence relation of program BST
One-counter transducer for the restriction of dependence relation to flow dependences
Pseudo-left-synchronous transducer for the restriction of … to W × R
One-counter transducer for reaching definition relation of program Queens
Simplified one-counter transducer for …
Basic optimizations of the generated code for φ-functions
Interaction of reaching definition analysis and run-time overhead
Repeated assignments to the same memory location
First example, continued
Parallelism extraction versus run-time overhead
First example
Improving the SA algorithm
Expanded version of the first example
Second example
Inserting copy-out code
Third example
Maximal static expansion for the second example
Partition of the iteration domain (N = 4)
Convolution example
Experimental results for the first example
Computation times, in milliseconds
Parallelization of the first example
Knapsack program
KP in single-assignment form
Partial expansion for KP
Motivating examples for each constraint in the definition of the interference relation
Instancewise reaching definitions, schedule, and tiling for KP
Cases of $f_{\mathrm{exp}}^e(v) \neq f_{\mathrm{exp}}^e(w)$ in (5.17)
An example of block-regular storage mapping
Time and space optimization
Performance results for storage mapping optimization
Performance results
Maximal static expansion
Parallelization of the motivating example
Motivating example
What we want to achieve
Maximal static expansion combined with storage mapping optimization
Single-assignment form conversion of program Queens
How we achieve constrained storage mapping optimization
Solving the constrained storage mapping optimization problem
Strange interplay of constraint and coloring relations
Privatization of program Queens
Second motivating example: program Map
Parallelization of program BST
Implementation of the read reference in statement r
Parallelization of program Queens via privatization
Automatic instancewise parallelization of procedure P
Parallel resolution of the n-Queens problem
Instancewise parallelization example
List of Algorithms

Recurrence-Solve(system)
Recurrence-Rewrite(program, system)
Compute-Storage-Mappings(program)
Recurrence-Build(program)
Abstract-Implement-Phi(expanded)
Abstract-SA(program, W, …)
Reaching-Definition-Analysis(program)
Dependence-Analysis(program)
Convert-Quast(quast, ref)
Loop-Nests-SA(program, …)
Abstract-Implement-Phi-Not-SA(expanded)
Loop-Nests-ML-SA(program, ml)
Abstract-ML-SA(program, W, ml)
Loop-Nests-Implement-Phi(expanded)
MSE-Convert-Quast(quast, ref)
Compute-Representatives(equivalence)
Maximal-Static-Expansion(program, …, …)
Enumerate-Representatives(rel, fun)
Storage-Mapping-Optimization(program, …, …, <par)
SMO-Convert-Quast(quast, ref)
CSMO-Convert-Quast(quast, ref)
Constrained-Storage-Mapping-Optimization(program, …, …, …, <par)
Build-Expansion-Vector(S, ⋈)
Cyclic-Coloring(…)
Partial-Renaming(program, ⋈)
Near-Block-Cyclic-Coloring(…, shape)
CSMO-Implement-Phi(expanded)
Recursive-Programs-Implement-Phi(expanded)
Recursive-Programs-Online-SA(program, …)
CSMO-Efficiently-Implement-Phi(expanded)
Statementwise-Parallelization(program, …)
Recursive-Programs-SA(program, …)
Instancewise-Parallelization(program, …)
Presentation in French

After a detailed introduction, this chapter gives a summary in French of the following chapters, which are written in English. Its organization mirrors the structure of the thesis: its sections and subsections correspond respectively to the chapters and their sections. A reader who wishes to study one of the topics in more depth can therefore turn to the corresponding English part for the details of the algorithms and for examples.

Contents
I Introduction
    I.1 Program analysis
    I.2 Program transformations for parallelization
    I.3 Organization of this thesis
II Models
    II.1 An instancewise view
    II.2 Program model
    II.3 Formal model
    II.4 Instancewise analysis
    II.5 Parallelization
III Mathematical tools
    III.1 Presburger arithmetic
    III.2 Formal languages and rational relations
    III.3 Left-synchronous relations
    III.4 Beyond rational relations
    III.5 More about approximations
IV Instancewise analysis for recursive programs
    IV.1 Introductory examples
    IV.2 Relating instances and memory locations
    IV.3 Dependence and reaching definition analysis
    IV.4 Results of the analysis
    IV.5 Comparison with other analyses
V Expansion and parallelization
    V.1 Motivations and tradeoffs
    V.2 Maximal static expansion
    V.3 Optimizing memory usage
    V.4 Constrained optimized expansion
    V.5 Parallelization of recursive programs
VI Conclusion
    VI.1 Contributions
    VI.2 Perspectives
I Introduction

Progress in processor technology results from several factors: a sharp increase in clock frequency, wider buses, the use of several functional units and possibly several processors, complex memory hierarchies to hide access times, and an overall growth in storage capacity. As a consequence, the machine model is becoming less and less simple and uniform: despite hardware cache management, superscalar execution and shared-memory parallel architectures, reaching optimal performance for a given program is becoming more and more complex. Optimizations that are good for one particular case can lead to disastrous results on a different architecture. Moreover, hardware management cannot efficiently exploit the most complex architectures: in the presence of deep memory hierarchies, local memories, out-of-core computation, instruction-level parallelism or coarse-grain parallelism, help from the compiler is needed to obtain good performance.

The whole architecture and compiler industry is in fact facing what the high-performance computing community discovered years ago. On the one hand, for most applications, architectures are too disparate to define practical efficiency criteria and to develop optimizations specific to a given machine. On the other hand, programs are written in such a way that traditional optimization and parallelization techniques have a very hard time feeding the computing beast about to be installed in an ordinary laptop.

To reach high performance on modern microprocessors and parallel computers, a program (or the algorithm it implements) must have a sufficient degree of parallelism. Programmers or compilers must then expose this parallelism and apply the transformations needed to adapt the program to the characteristics of the machine. Another requirement is that the program be portable across different architectures, in order to keep up with the rapid evolution of parallel machines. Programmers are thus offered the following two options.

- First, explicitly parallel languages. Most of them are parallel extensions of sequential languages. These languages can be data-parallel, like HPF, or combine data and task parallelism, like the OpenMP extensions for shared-memory architectures. Some extensions come as libraries, for example PVM and MPI, or as high-level environments such as IML from the University of Illinois [SSP99] or Cilk from MIT [MF98]. All these approaches ease the programming of parallel algorithms. On the other hand, the programmer is in charge of some technical operations, such as distributing data across processors, communications and synchronizations. These operations require in-depth knowledge of the architecture and significantly reduce portability.
- Second, the automatic parallelization of a high-level sequential language. The obvious advantages of this approach are portability and ease of programming. Unfortunately, the task falling on the parallelizing compiler becomes overwhelming. The program must first be analyzed in order to understand (at least partially) which computations are performed and where the parallelism lies. The compiler must then generate parallel code, taking the specifics of the architecture into account. The usual source language for automatic parallelization is Fortran 77: many scientific applications have been written in Fortran, which allows only relatively simple data and control structures. Several studies nevertheless consider the parallelization of C or of functional languages such as Lisp. This research is less advanced than the historical approach but closer to the present work, since it considers the most general data and control structures. Many research projects exist: Parafrase-2 and Polaris [BEF+96] from the University of Illinois, PIPS from the École des Mines de Paris [IJT90], SUIF from Stanford University [H+96], the McCAT/EARTH-C compiler from McGill University [HTZ+97], LooPo from the University of Passau [GL97], and PAF from the University of Versailles; there is also a growing number of commercial parallelization tools, such as CFT, FORGE, FORESYS or KAP.

We are mainly interested in automatic and semi-automatic parallelization techniques: this thesis addresses both program analysis and program transformation.

I.1 Program analysis

Optimizing or parallelizing a program generally amounts to transforming its source code so as to improve a number of execution parameters. To apply a program transformation at compile time, one must make sure that the implemented algorithm is not altered in the process. Since an algorithm can be implemented in many different ways, validating a program transformation requires a reverse-engineering process that establishes the most precise possible information about what the program does. This fundamental program analysis technique tries to solve the difficult problem of exposing statically (i.e., at compile time) information about dynamic (i.e., run-time) properties.

Static analysis

In program analysis, the first studies focused on properties of the machine state between the execution of two statements. These states are called program points. Such properties are said to be static because they cover all possible executions leading to a given program point. These properties are of course computed at compile time, but this is not where the adjective “static” comes from: it would probably be more appropriate to speak of “syntactic” analysis.

Data-flow analysis is the first general framework proposed to formalize the large number of static analyses. Among the many presentations of this formalism [KU77, Muc97, ASU86, JM82, KS92, SRH96], the following common points can be identified. To describe possible executions, the usual method builds the control-flow graph of the program [ASU86]: this graph represents program points as vertices, and the edges between these vertices are labeled with program statements. The set of all possible executions is then the set of all paths from the initial state to the program point under consideration. Properties at a given point are defined as follows: since each statement may modify a property, one must take into account all paths leading to the program point and gather (meet) all the information along these paths. The formalization of these ideas is often called meet over all paths (MOP). The meet operation of course depends on the property sought and on its mathematical abstraction.

However, the potentially infinite number of paths rules out evaluating properties directly from the MOP specification. The computation is instead performed by propagating intermediate results (forward or backward) along the edges of the control-flow graph, and the propagation equations are solved iteratively until a fixed point is reached. This is the maximal fix-point (MFP) method. In the intra-procedural case, Kam and Ullman [KU77] proved that MFP indeed computes the result defined by MOP (i.e., MFP coincides with MOP) when a few simple properties of the mathematical abstraction hold; this result was extended to inter-procedural analysis by Knoop and Steffen [KS92].

Mathematical abstractions of program properties are very numerous, depending on the application and on the complexity of the analysis. The lattice structure encompasses most of them, because it supports computing meets at confluence points and joins associated with statements. In this setting, Cousot and Cousot [CC77] proposed an approximation scheme based on semi-dual Galois connections between concrete execution states and abstract compile-time properties. This formalism, called abstract interpretation, has two main benefits: first, it allows abstractions of properties to be built systematically using lattices; second, it guarantees that any fixed point computed in the abstract lattice is a conservative approximation of a fixed point in the lattice of concrete states. While extending the concept of data-flow analysis, abstract interpretation eases proofs of correctness and optimality of program analyses. Practical applications of abstract interpretation and of the associated iterative methods are presented in [Cou81, CH78, Deu92, Cre96].

Despite undeniable successes, data-flow analyses (whether or not based on abstract interpretation) have rarely been the basis of automatic parallelization techniques. Some important reasons are not scientific, but there are also good technical reasons:

- MOP/MFP techniques are mainly geared towards classical optimizations with relatively simple abstractions (lattices often have bounded height); their correctness and efficiency in a real compiler are the decisive issues, whereas the precision and expressiveness of the mathematical abstraction are the foundation of automatic parallelization;
- in industry, parallelization methods have traditionally focused on loop nests and arrays, with high degrees of data parallelism and simple control structures (non-recursive, first-order); proving an analysis correct is easy under these conditions, whereas applying it to real programs and implementing it in a compiler become major issues;
- abstract interpretation suits functional languages with a clean and simple operational semantics; the problems it raises are then orthogonal to the practical questions associated with imperative, low-level languages, which are traditionally better suited to parallel architectures (we will see that this situation is changing).
As a consequence, existing data-flow analyses are generally static analyses that compute properties of a given program point or statement. Such results are useful for classical verification and optimization techniques [Muc97, ASU86, SKR90, KRS94], but automatic parallelization needs more information.

- What about the different run-time instances of a program point or statement? Since statements are generally executed many times, one is interested in the loop iteration or procedure call that led to a given execution of the statement.
- What about the different elements of a data structure? Since arrays and dynamically allocated data structures are not atomic, one is interested in the array element or tree node accessed by a given instance of a statement.

Instancewise analysis

Program analyses for automatic parallelization form a rather narrow field compared with the vast range of properties and techniques studied in static analysis. The program model considered is also more restricted (most of the time), since the traditional targets of parallelizers are numerical codes with loop nests and arrays.

From the very beginning (with the work of Banerjee [Ban88], Brandes [Bra88] and Feautrier [Fea88a]), these analyses have been able to identify properties at the level of instances and elements. As long as the only control structure was the for/do loop, iterative methods with solid semantic foundations seemed needlessly complex. To focus on the crucial problems of abstracting loop iterations and their effects on array elements, simple and specialized models were certainly preferable. The first analyses were dependence tests [Ban88] and dependence analyses, which gather information about statement instances accessing the same memory location, at least one of the accesses being a write. More precise methods were then designed to compute, for each array element read in an expression, the statement instance that produced the value. They are often called array data-flow analyses [Fea91, MAL93], but we prefer the term instancewise reaching definition analysis, to ease comparison with a particular static data-flow technique called reaching definition analysis [ASU86, Muc97]. Such precise information significantly improves the quality of transformation techniques, and hence the performance of parallel programs.

Instancewise analyses long suffered from severe restrictions on their program model: programs initially had to consist only of loops without conditional statements, with affine bounds and array subscripts, and without procedure calls. This limited model already covers many numerical codes, and it has the great advantage of allowing the exact computation of dependences and reaching definitions [Fea88a, Fea91]. When one tries to lift these restrictions, one difficulty is that exact results can no longer be established: only approximate dependence information is available at compile time, which yields overly coarse approximations of reaching definitions. A direct computation of reaching definitions is therefore needed. Such techniques were recently developed by Barthou, Collard and Feautrier [CBF95, BCF97, Bar98] and by Pugh and Wonnacott [WP95, Won95], with extremely precise results in the intra-procedural case. In the following, for arrays and unrestricted loop nests, our instancewise reaching definition analysis will be the fuzzy array dataflow analysis (FADA) of Barthou, Collard and Feautrier [Bar98].

Many extensions of these analyses can handle procedure calls [TFJ86, HBCM94, CI96], but they are not fully instancewise, since they do not distinguish the multiple executions of a statement associated with different calls of the enclosing procedure. This thesis presents the first fully instancewise analysis for programs with (possibly recursive) procedure calls.

I.2 Program transformations for parallelization

It is well known that dependences limit the parallelization of programs written in an imperative language, as well as their efficient compilation for modern processors and supercomputers. A general method to reduce the number of dependences is to reduce memory reuse by assigning distinct memory locations to independent writes, that is, to expand data structures.

There are many techniques to compute memory expansions, that is, to transform the memory accesses of programs. Classical methods include: variable renaming; splitting or merging data structures of the same type; resizing arrays, in particular adding new dimensions; converting arrays into trees; changing the degree of a tree; and turning a global variable into a local one.

Read references are expanded as well, using reaching definitions to implement the expanded reference [Fea91]. Figure 1 shows three programs for which no parallel execution is possible because of output dependences (some details of the code are omitted). Their expanded versions, also shown in the figure, illustrate the benefit of memory expansion for parallelism extraction.

Unfortunately, when the control flow cannot be predicted at compile time, extra work is needed at run time to preserve the original flow of data: φ-functions may be needed to “merge” definitions coming from different incoming control paths. These φ-functions are similar (but not identical) to those of the static single-assignment (SSA) form of Cytron et al. [CFR+91], and Collard and Griebl were the first to extend them to instancewise expansion methods [GC95, Col98]. The argument of a φ-function is the set of possible reaching definitions of the associated read reference (an interpretation very different from the usual semantics of φ-functions in SSA form). Figure 2 shows two programs with conditional expressions and unknown array subscripts; expanded versions with φ-functions are also given in the figure.

Expansion is not a mandatory step of parallelization; it nevertheless remains a very general technique to expose more parallelism in programs. Regarding the implementation of parallel programs, two different views are possible, depending on the language and the architecture.
    Original:
        int x;
        x = …; … = x;
        x = …; … = x;

    Expanded:
        int x1, x2;
        x1 = …; … = x1;
        x2 = …; … = x2;

After expansion, i.e., after renaming x into x1 and x2, the first two statements can be executed in parallel with the last two.

    Original:
        int A[10];
        for (i = 0; i < 10; i++) {
    s1      A[0] = …;
            for (j = 1; j < 10; j++) {
    s2          A[j] = A[j-1] + …;
            }
        }

    Expanded:
        int A1[10], A2[10][10];
        for (i = 0; i < 10; i++) {
    s1      A1[i] = …;
            for (j = 1; j < 10; j++) {
    s2          A2[i][j] = (j == 1 ? A1[i] : A2[i][j-1]) + …;
            }
        }

After expansion, i.e., after renaming array A into A1 and A2 and adding a dimension to A2, the outer for loop is parallel. The instancewise reaching definition of reference A[j-1] depends on the values of i and j, as the conditional expression in the expanded code shows.

    Original:
        int A[10];
        void Proc (int i) {
            A[i] = …;
            … = A[i];
            if (…) Proc (i+1);
            if (…) Proc (i-1);
        }

    Expanded:
        struct Tree {
            int value;
            struct Tree *left, *right;
        } *p;
        void Proc (struct Tree *p, int i) {
            p->value = …;
            … = p->value;
            if (…) Proc (p->left, i+1);
            if (…) Proc (p->right, i-1);
        }

After expansion, the two procedure calls can be executed in parallel. The dynamic allocation of the Tree structure is omitted.

Figure 1. Some examples of expansion

The first view exploits control parallelism, that is, parallelism between different statements of the same program block. The goal is to replace as many sequential executions of statements as possible with parallel ones. Depending on the language, several syntaxes exist to express this kind of parallelism, and they do not all have the same expressive power. We prefer the spawn/sync syntax of Cilk [MF98] (close to that of OpenMP) to the parallel blocks of Algol 68 and of the EARTH-C compiler [HTZ+97]. As in [MF98], a synchronization waits for all asynchronous activities started in the enclosing block, and implicit synchronizations are added at procedure return points. In the example of Figure 3, the execution of A, B and C in parallel, followed sequentially by D and then E, is written in a Cilk-like syntax. In practice, each statement of this example would probably be a procedure call.
    Original:
        int x;
    s1  x = …;
    s2  if (…) x = …;
    r   … = x;

    Expanded:
        int x1, x2;
    s1  x1 = …;
    s2  if (…) x2 = …;
    r   … = φ({s1, s2});

After expansion, it cannot be decided at compile time which value is read by statement r. One only knows that it comes from s1 or from s2, and the computation of this value is hidden in the expression φ({s1, s2}). This expression checks whether s2 has been executed: if so, it returns the value of x2, otherwise that of x1.

    Original:
        int A[10];
    s1  A[i] = …;
    s2  A[…] = …;
    r   … = A[i];

    Expanded:
        int A1[10], A2[10];
    s1  A1[i] = …;
    s2  A2[…] = …;
    r   … = φ({s1, s2});

After expansion, it is unknown at compile time which value is read by statement r, since the element of array A written by statement s2 is unknown.

Figure 2. Run-time restoration of the flow of data

    spawn A;
    spawn B;
    spawn C;
    sync;     // wait for the termination of A, B and C
    D;
    E;

Figure 3. Syntax of control parallelism

The second view exploits data parallelism, that is, parallelism between different instances of the same statement or block. The data-parallel model has been studied extensively for loop nests [PD96], because it matches efficient parallelization techniques for numerical algorithms and for repetitive operations on large data sets. We will use a syntax similar to OpenMP parallel loop declarations, where all variables are assumed shared by default and an implicit synchronization is added at each loop exit.

To generate data-parallel code, many algorithms use intuitive loop transformations such as loop fission, fusion, interchange, reversal, skewing and reindexing, and statement reordering. Data parallelism is also suited to expressing a parallel execution order as a schedule, that is, by assigning an execution date to each instance of a statement. The program scheme of Figure 4 gives an idea of the general method for implementing such a schedule [PD96]. The concept of execution front F(t) is fundamental, since it gathers all the instances ι that execute at date t.
The first scheduling algorithm is due to Kennedy and Allen [AK87], and it inspired many later methods.

    for (t = 0; t <= L; t++) {    // L is the latency of the schedule
      parallel for (ι ∈ F(t))
        execute instance ι
    }                              // implicit synchronization

Figure 4. Classical implementation of a schedule in the data-parallel model.

These methods all rely on relatively coarse abstractions of dependences, such as dependence levels, dependence vectors and dependence cones. Their main advantages are reasonable complexity and ease of implementation in an industrial compiler. Banerjee [Ban92] and, more recently, Darte and Vivien [DV97] give an overview of these algorithms. Feautrier proposed a general solution [Fea92]. His algorithm is very useful, but it gives no guidance on which schedule parameter to optimize. Should it be the latency L, the number of communications (on a distributed-memory machine), or the width of the fronts?

Finally, it is well known that control parallelism is more general than data parallelism: any data-parallel program can be rewritten in a control-parallel model without loss of parallelism. This is especially true for recursive programs, where the distinction between the two paradigms is blurred [Fea98]. For real programs and architectures, however, data parallelism has long been clearly better suited to massively parallel computing, mainly because of the overhead of managing activities. Recent advances in hardware and software show that this situation is changing. For example, Cilk has achieved excellent results for recursive parallel programs such as game simulations (chess) and sorting algorithms [MF98].

I.3 Organization of this thesis

This thesis has four chapters followed by a final conclusion, and the following sections summarize each chapter in turn. Section II, summarizing Chapter 2, describes a general formalism for program analysis and transformation. It also presents the definitions used in later chapters. The aim is to cover a wide class of programs, from loop nests over arrays to programs with recursive data structures.

Section III, summarizing Chapter 3, gathers the mathematical results. Some are well known, such as Presburger arithmetic and formal language theory. Some are rather uncommon in parallelism and compilation, such as rational and algebraic transductions. The others are mainly contributions, such as left-synchronous transductions and approximation techniques for rational and algebraic transductions.
Section IV, summarizing Chapter 4, addresses instancewise analysis of recursive programs. This analysis builds on an extension of induction variables to recursive programs and on new results in formal language theory. It proposes two algorithms, one for dependence analysis and one for reaching definition analysis, and evaluates both on examples.

Section V, summarizing Chapter 5, covers parallelization techniques based on memory expansion. The first three subsections present techniques for expanding loop nests with no restriction on conditional expressions, loop bounds or array subscripts. The fourth subsection contributes to the joint optimization of expansion and parallelization parameters. The fifth subsection presents our results on the expansion and parallelization of recursive programs.

II Program Models

To keep the formalism and vocabulary consistent throughout this thesis, we present a general framework for describing analyses and transformations. We represent program properties at the instance level, while keeping some continuity with other work in the field. We do not seek to compete with any existing formalism [KU77, CC77, JM82, KS92]. The main goal is to establish convincing results on the relevance and efficiency of our techniques.

After a formal presentation of statement instances and program executions, we define the program model used in the rest of this study. We then describe the associated mathematical abstractions, before formalizing the notions of code analysis and transformation.

II.1 An Instancewise View

During execution, enclosing control structures may cause each statement to run several times. To describe data-flow properties as precisely as possible, our techniques must distinguish between these different executions of the same statement. For a statement s, an instance of s is one particular execution of s during a program run. In loop nests, loop counters are often used to name instances, but this technique does not always apply. Section II.3 studies a general naming scheme.

Programs sometimes depend on the initial memory state and interact with their environment. Several executions of the same code are therefore associated with different sets of instances and with incompatible data-flow properties. A high degree of formalism is not needed here. An execution $e$ of a program $P$ is given by an execution trace of $P$, i.e., a sequence of configurations (machine states), which is finite, or infinite when the program does not terminate. The set of all possible executions is denoted by $E$. For a given program, $I_e$ denotes the set of instances associated with execution $e \in E$. Besides identifying the execution, the subscript $e$ recalls that the set $I_e$ is "exact", i.e., not an approximation.

Each statement may contain several memory references (possibly none), at most one of which is a write (i.e., on the left-hand side). A pair $(\iota, r)$ made of a statement instance and a reference in the statement is called an access. For a given execution $e \in E$ of a program, the set of all accesses is denoted by $A_e$. It is partitioned into $R_e$, the set of all reads (accesses that read memory), and $W_e$, the set of all writes (accesses that write memory). For a statement with a memory reference on its left-hand side, the write accesses are often identified with the instances of the statement.

II.2 Program Model

Our programs are written in an imperative style, with a C-like syntax (plus some C++ syntactic extensions). Pointers are allowed. For readability, multidimensional arrays are accessed with the syntax [i1, ..., in], which is not C syntax. This study focuses mainly on first-order structures, but approximation techniques can also handle function pointers [Cou81, Deu90, Har89, AFL95]. Recursive calls, loops, conditional statements and exception mechanisms are allowed. We assume, however, that gotos have been eliminated beforehand by code restructuring algorithms [ASU86, Bak77, AMM92].

We only consider the following data structures:
- scalars (booleans, integers, floats, pointers...);
- non-recursive records of scalars;
- arrays of scalars or records;
- trees of scalars or records;
- trees of arrays and arrays of trees (possibly nested recursively).

For simplicity, we assume that arrays are always accessed with their specific syntax (the [] operator), so pointer arithmetic is forbidden. Tree structures are accessed through explicit pointers (via the * and -> operators).

C programs do not make the "shape" of data structures explicit. It is not obvious whether a given structure is a list, a tree or an arbitrary graph. Additional information supplied by the programmer can solve the problem [KS93, FM97, Mic95, HHN92], and so can compile-time shape analyses [GH96, SRW96]. Associating pointers with a given instance of a tree structure is not obvious either; it is a special case of alias analysis [Deu94, CBC93, GH95, LRZ93, EGH94, Ste96]. From now on, we assume that the compiler has applied such techniques.

An important question about data structures is how they are built, modified and destroyed. The shape of arrays is often known statically. Some programs, however, use dynamic arrays whose size grows at each bounds overflow (this is the case in Section V). Pointer-based structures, on the other hand, are allocated dynamically by explicit statements. Feautrier studied this problem in [Fea98], and we take the same view: all data structures are assumed to be built up to their maximal extent, which may be infinite. This abstraction is guaranteed to be correct when no insertion or deletion occurs at run time. This very strict rule nevertheless admits two exceptions, which we study after introducing the mathematical abstraction for data structures. Still, many programs unfortunately do not respect this rule.
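The notions of instance and access from Section II.1 can be made concrete on the original (unexpanded) second nest of Figure 1. The C++ sketch below is an illustration and not part of the thesis. It names instances by iteration vectors, pairs each instance with a reference to form an access, and splits the accesses into reads and writes. The elided right-hand sides "..." are assumed to access no memory.

```cpp
#include <cassert>
#include <string>
#include <vector>

// An access (instance, reference). Instances of the second nest of Figure 1
// (before expansion) are named by their iteration vector (i, j), the loop-counter
// naming used for loop nests; j = -1 for s1, which is outside the j loop.
struct Access {
    std::string stmt;  // "s1" or "s2"
    int i, j;          // iteration vector
    std::string ref;   // textual reference
    int cell;          // index of the element of A that is touched
    bool is_write;
};

// Lists A_e for that nest. It has no conditionals, so every execution yields the
// same accesses; the elided right-hand sides "..." are assumed to touch no memory.
std::vector<Access> accesses() {
    std::vector<Access> acc;
    for (int i = 0; i < 10; i++) {
        acc.push_back({"s1", i, -1, "A[0]", 0, true});
        for (int j = 1; j < 10; j++) {
            acc.push_back({"s2", i, j, "A[j-1]", j - 1, false});  // right-hand side read
            acc.push_back({"s2", i, j, "A[j]", j, true});         // left-hand side write
        }
    }
    return acc;
}

// Size of W_e (writes = true) or R_e (writes = false); together they partition A_e.
int count_accesses(bool writes) {
    int n = 0;
    for (const Access& a : accesses()) n += (a.is_write == writes) ? 1 : 0;
    return n;
}
```

With 10 instances of s1 and 10 × 9 instances of s2, this gives 100 writes and 90 reads, i.e., 190 accesses in $A_e$.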
II.3 Formal Model

We first present a naming scheme for statement instances, then propose a mathematical abstraction of memory cells.

Naming statement instances

From now on, each statement is assumed to carry a label; the alphabet of labels is denoted by $\Sigma_{ctrl}$. Loops need special care: they carry three labels. The first represents entry into the loop, the second corresponds to checking the condition, and the third represents the iteration¹. Likewise, conditional statements carry two labels: one for the condition and the then branch, another for the else branch. We study the example of Figure 5, a procedure that computes all solutions of the n-queens problem.

¹ In C, the check is performed just after entering the loop and before each iteration.

    int A[n];
    P      void Queens(int n, int k) {
    I        if (k < n) {
    A/A/a      for (int i = 0; i < n; i++) {
    B/B/b        for (int j = 0; j < k; j++)
    r              ... = ... A[j] ...;
    J            if (...) {
    s              A[k] = ...;
    Q              Queens(n, k+1);
                 }
               }
             }
           }

           int main() {
    F        Queens(n, 0);
           }

[Partial control tree of Queens, with branches such as FPIAAaAaAJs and FPIAAaAaAJQPIAABBr]

Figure 5. The Queens procedure and a (partial) control tree.

Execution traces are often used to name instances at run time. A trace is generally defined as a path from the entry of the control-flow graph to a given statement². Every execution of a statement is recorded, including function returns. For our purposes, execution traces have several drawbacks. The most serious is that a given instance may have several different execution traces, depending on the program execution. Traces therefore cannot give each instance a unique name. Our solution uses another representation of program execution [CC98, Coh99a, Coh97, Fea98]. For a given execution, each instance of a statement lies at the end of a unique (ordered) list of block entries, loop iterations and procedure calls. Each such list corresponds to a word: the concatenation of the labels of its statements. The tree of Figure 5 illustrates these concepts; its definition is given later.

² Regardless of conditional expressions and loop bounds.

Definition 1. The control automaton of a program is a finite automaton whose states are the statements. A transition from a state $q$ to a state $q'$ expresses that statement $q'$ appears in block $q$, and this transition is labeled by $q'$. The initial state is the first statement executed, and all states are final.

The words accepted by the control automaton are called control words. By construction, they describe a rational language $L_{ctrl} \subseteq \Sigma_{ctrl}^*$.

Let $I$ be the union of the instance sets $I_e$ over all executions $e \in E$. Then there is a natural injection from $I$ into the language $L_{ctrl}$ of control words. This result lets us speak of "the control word of an instance". In general, neither $E$ nor $I_e$ (for a given execution $e$) is known at compile time. We will often consider the set of all instances that may be executed, regardless of conditional statements and loop bounds. This set is in bijection with the set of control words. We therefore also speak of "instance $w$", meaning the instance whose control word is $w$.

Some states have exactly one incoming and one outgoing transition. In practice, one often uses a compressed control automaton in which all such states are eliminated. This transformation does not change the control words. Figure 6 shows the automata of the Queens program.

[Figure 6.a. Control automaton; Figure 6.b. Compressed control automaton for Queens]

Figure 6. Control automata.

The sequential execution order of a program defines a total order on instances, denoted by $<_{seq}$. We can also define a partial textual order $<_{txt}$ on the statements of the program. Statements of the same block are ordered by their position in the text, and statements in different blocks are incomparable. For loops, the iteration label is ordered after all statements of the loop body. For the Queens procedure, we have $B <_{txt} J <_{txt} a$, $r <_{txt} b$ and $s <_{txt} Q$. This textual order induces a lexicographic (dictionary) order on control words, denoted by $<_{lex}$. This order is partial on $\Sigma_{ctrl}^*$ and on $L_{ctrl}$, notably because of conditional statements. By construction of the textual order, an instance $\iota'$ executes before an instance $\iota$ if and only if their respective control words $w'$ and $w$ satisfy $w' <_{lex} w$.

Finally, the language of control words is easily interpreted as an infinite tree whose root is named $\varepsilon$ and whose edges are each labeled by a statement. Each node then corresponds to the control word formed by concatenating the labels along the branch from the root. Such a tree is called a control tree. Figure 5 shows a partial call tree for the Queens program.

Addressing memory cells

Here we generalize several formalisms that we proposed earlier [CC98, Coh99a, Coh97, Fea98, CCG96]. This formalism also draws on rather diverse approaches [Ala94, Mic95, Deu92, LH88].

As expected, array elements are indexed by integers or integer vectors. A tree node is addressed by concatenating the edge labels on the path from the root. The address of the root is thus $\varepsilon$, and the address of node root->l->r in a binary tree is $lr$. The set of edge names is denoted by $\Sigma_{data}$. The layout of trees in memory is thus described by a rational language $L_{data} \subseteq \Sigma_{data}^*$.

Trees and arrays can be handled together because both share the same mathematical abstraction: the monoid (see Section III.2). Rational languages (tree addressing) are subsets of free monoids under word concatenation. Sets of integer vectors (array indexing) are free commutative monoids under vector addition. The monoid abstracting a data structure is denoted by $M_{data}$, and the subset of this monoid corresponding to the valid elements of the structure is denoted by $L_{data}$.

Nested trees and arrays are somewhat more complex, but they show how expressive monoid abstractions are. We do not discuss these hybrid structures further in this summary. From now on, the abstraction of any data structure of our program model is a subset $L_{data}$ of the monoid $M_{data}$ with its monoid operation.

We can now return to the ban on insertions and deletions from the previous section. Our formalism can in fact handle the following two exceptions:
- The data flow does not depend on whether a node is inserted at the beginning of the program or during execution. Insertions at the tail of lists and at the leaves of trees are therefore allowed.
- When deletions occur at the tail of lists or at the leaves of trees, the mathematical abstraction remains correct but may lead to overly conservative approximations.

Loop nests and arrays

Many numerical applications, notably in signal processing and in scientific or multimedia codes, are implemented as loop nests over arrays. A very large body of analysis and transformation results exists for such programs. Our formalism describes this kind of code without difficulty. For these programs, however, it seems more natural and simpler to return to the classical notions for naming instances and addressing memory. Integer vectors are better suited than control words here, because $\mathbb{Z}$-modules have a much richer structure than plain commutative monoids.

Using Parikh mappings [Par66], we have shown that iteration vectors are a particular interpretation of control words. Iteration vectors are the classical formalism for naming instances in loop nests, and the two notions are equivalent in the absence of procedure calls. Statement instances are not reduced to iteration vectors alone, so we introduce the following notations, which generalize the intuitive notations of Section II.1. $\langle s, x \rangle$ denotes the instance of statement $s$ whose iteration vector is $x$, and $\langle s, x, ref \rangle$ denotes the access built from instance $\langle s, x \rangle$ and reference $ref$.

Section IV.5 presents further comparisons between iteration vectors and control words.

II.4 Instancewise Analysis

Our model uses control words rather than execution traces, so the definition of program executions given above is not very practical. Here we prefer an equivalent view. The sequential execution $e \in E$ of a program is a pair $(<_{seq}, f_e)$, where $<_{seq}$ is the sequential execution order on all possible instances and $f_e$ maps each access to the memory cell it reads or writes. Since the sequential order is deterministic, $<_{seq}$ does not depend on the execution. The domain of $f_e$, on the other hand, is exactly the set $A_e$ of accesses of execution $e$. The function $f_e$ is called the access function of execution $e$ of the program [CC98, Fea98, CFH95, Coh99b, CL99]. For simplicity, "the program $(<_{seq}, f_e)$" denotes the set of executions $(<_{seq}, f_e)$ of the program for $e \in E$.

Access conflicts and dependences

Analyses and transformations often require information about "conflicts" between memory accesses. Two accesses $a$ and $a'$ are in conflict if they access the same memory cell, for reading or for writing: $f_e(a) = f_e(a')$. Conflict analysis closely resembles alias analysis [Deu94, CBC93] and also applies to cache analysis [TD95]. The conflict relation for a given execution $e$ is denoted by $\kappa_e$. In general, neither $e$ nor $f_e$ can be known exactly. Access-conflict analysis therefore computes a conservative approximation $\kappa$ of the conflict relation that is compatible with every execution of the program:

$$\forall e \in E,\ \forall v, w \in A_e:\quad f_e(v) = f_e(w) \implies v \,\kappa\, w.$$

To parallelize, we need sufficient conditions under which two accesses may execute in any order. These conditions are expressed in terms of dependences. An access $a$ depends on an access $a'$ if three conditions hold: at least one of them is a write; they are in conflict ($f_e(a) = f_e(a')$); and $a'$ executes before $a$ ($a' <_{seq} a$). The dependence relation for an execution $e$ is denoted by $\delta_e$, and "$a$ depends on $a'$" is written $a' \,\delta_e\, a$:

$$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \,\delta_e\, a \iff (a \in W_e \lor a' \in W_e) \land a' <_{seq} a \land f_e(a) = f_e(a').$$
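For a single execution whose accesses are listed in $<_{seq}$ order, the definition of $\delta_e$ can be applied literally. The following C++ sketch does this for the first example of Figure 1, before and after renaming x into x1 and x2. The trace encoding is an assumption of the sketch: access names such as w1 or r1, and cells given as strings. The quadratic scan only mirrors the formula. It is not a compile-time analysis, which must reason about all executions at once.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// An access of the sequential execution e; the vector index encodes <seq.
struct Access {
    std::string name;  // hypothetical name, e.g. "w1" or "r1"
    std::string cell;  // f_e(a): the memory cell touched
    bool is_write;
};

// All pairs (a', a) with a' delta_e a, by direct application of the definition:
// at least one write, a' before a in <seq, and the same memory cell.
std::vector<std::pair<int, int>> dependences(const std::vector<Access>& trace) {
    std::vector<std::pair<int, int>> dep;
    for (size_t a = 0; a < trace.size(); a++)
        for (size_t ap = 0; ap < a; ap++)  // ap <seq a
            if ((trace[a].is_write || trace[ap].is_write) && trace[a].cell == trace[ap].cell)
                dep.push_back({(int)ap, (int)a});
    return dep;
}

// First example of Figure 1 before expansion: x = ...; ... = x; x = ...; ... = x;
std::vector<Access> scalar_trace() {
    return {{"w1", "x", true}, {"r1", "x", false}, {"w2", "x", true}, {"r2", "x", false}};
}

// The same program after renaming x into x1 and x2.
std::vector<Access> expanded_trace() {
    return {{"w1", "x1", true}, {"r1", "x1", false}, {"w2", "x2", true}, {"r2", "x2", false}};
}
```

Before expansion the scan finds 5 dependences: flow, output and anti dependences. After renaming, only the two flow dependences w1 → r1 and w2 → r2 remain, which is why the two halves of the program can run in parallel.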
Dependence analysis, in turn, settles for an approximate result $\delta$ such that

$$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \,\delta_e\, a \implies a' \,\delta\, a.$$

Reaching definition analysis

In some cases we need more precise information than dependences: given a memory read, we want to know which instance produced the value read. The read access is called a use, and the instance that produced the value is called its reaching definition. It is the last instance, in execution order, that is in dependence with the use. The function mapping each read access to its unique reaching definition is denoted by $\sigma_e$:

$$\forall e \in E,\ \forall u \in R_e:\quad \sigma_e(u) = \max_{<_{seq}} \{\, v \in W_e : v \,\delta_e\, u \,\}.$$

A read instance may have no reaching definition in the program at all. We therefore add a virtual instance $\bot$ that executes before all instances of the program and initializes all memory cells.

A reaching definition analysis computes a relation $\sigma$ that conservatively approximates the functions $\sigma_e$:

$$\forall e \in E,\ \forall u \in R_e,\ v \in W_e:\quad v = \sigma_e(u) \implies v \,\sigma\, u.$$

$\sigma$ can also be seen as a function that computes sets of possible reaching definitions. When $\bot$ appears in such a set, an uninitialized value may be read. This information can be used to check programs.

Later we will need approximate sets of instances and accesses. We have already met the notation $I$, the set of all possible instances over all executions of a given program:

$$\forall e \in E:\quad \iota \in I_e \implies \iota \in I.$$

Likewise, we use conservative approximations $A$, $R$ and $W$ of the sets $A_e$, $R_e$ and $W_e$.

II.5 Parallelization

With the model of Section II.4, parallelizing a program $(<_{seq}, f_e)$ means building a program $(<_{par}, f^{exp}_e)$, where $<_{par}$ is a parallel execution order, i.e., a partial order that is a sub-order of $<_{seq}$. Memory expansion means building a new access function $f^{exp}_e$ from $f_e$. To preserve the semantics of the sequential execution, $<_{par}$ and $f^{exp}_e$ must of course satisfy certain properties.

Memory expansion aims at removing spurious dependences that are due only to the reuse of the same memory cells. Indirectly, expansion thus exposes more parallelism. Consider the dependence relation $\delta^{exp}_e$ for an execution $e$ of the expanded program:

$$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \,\delta^{exp}_e\, a \iff (a \in W_e \lor a' \in W_e) \land a' <_{seq} a \land f^{exp}_e(a) = f^{exp}_e(a').$$
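Reaching definitions and expansion can be tied together on one execution trace. The C++ sketch below uses a hypothetical trace and a hypothetical naming scheme for fresh cells ("cell#v" for the cell written by access v). It computes $\sigma_e$ as the last earlier write to the same cell, or $\bot$ if there is none. It then builds a single-assignment access function $f^{exp}_e$ by giving each write its own cell and redirecting each read to the cell of its reaching definition. Counting dependences with the definition of $\delta^{exp}_e$ shows that only flow dependences survive.

```cpp
#include <cassert>
#include <string>
#include <vector>

// An access of the sequential execution e; the vector index encodes <seq.
struct Access {
    std::string cell;  // f_e(a)
    bool is_write;
};

constexpr int BOTTOM = -1;  // stands for the virtual instance ⊥ that initializes memory

// sigma_e(u): the last write to the same cell before u in <seq order, or ⊥.
int reaching_definition(const std::vector<Access>& trace, int u) {
    for (int v = u - 1; v >= 0; v--)
        if (trace[v].is_write && trace[v].cell == trace[u].cell) return v;
    return BOTTOM;
}

// Hypothetical single-assignment expansion f^exp_e: the write at index v gets the
// fresh cell "<cell>#v"; a read is redirected to the cell of its reaching
// definition, or keeps its original cell when it reads an initial value (⊥).
std::vector<Access> single_assignment(const std::vector<Access>& trace) {
    std::vector<Access> out = trace;
    for (int a = 0; a < (int)trace.size(); a++) {
        int src = trace[a].is_write ? a : reaching_definition(trace, a);
        if (src != BOTTOM) out[a].cell = trace[src].cell + "#" + std::to_string(src);
    }
    return out;
}

// Number of pairs (a', a) in dependence, following the definition of delta in the text.
int count_dependences(const std::vector<Access>& t) {
    int n = 0;
    for (size_t a = 0; a < t.size(); a++)
        for (size_t ap = 0; ap < a; ap++)
            n += ((t[a].is_write || t[ap].is_write) && t[a].cell == t[ap].cell) ? 1 : 0;
    return n;
}

// Hypothetical trace: x = ...; ... = x; x = ...; ... = x; ... = y; (y never written)
std::vector<Access> example_trace() {
    return {{"x", true}, {"x", false}, {"x", true}, {"x", false}, {"y", false}};
}
```

On the example trace, 5 dependences shrink to 2: exactly the pairs $(\sigma_e(u), u)$ for the reads $u$ that have a reaching definition. The read of y has reaching definition $\bot$, i.e., it reads an uninitialized value.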
To define a parallel order compatible with every execution of the program, we must use a conservative approximation $\delta^{exp}$. This approximation is generally induced by the expansion strategy (see Section V.4 for an example).

Theorem 1 (correctness of a parallel order). The following condition guarantees that the parallel execution order is correct for the expanded program, i.e., that it preserves the semantics of the original program:

$$\forall (\iota_1, r_1), (\iota_2, r_2) \in A:\quad (\iota_1, r_1)\,\delta^{exp}\,(\iota_2, r_2) \implies \iota_1 <_{par} \iota_2.$$

When the program is put into single-assignment form, $\delta^{exp}_e$ coincides with $\sigma_e$. To parallelize such programs, we therefore assume that $\delta^{exp} = \sigma$.

We do not revisit here the techniques for actually computing a parallel execution order and generating the corresponding code. Parallelization techniques for recursive programs are relatively recent and are studied in Section 5.5. For loop nests, many scheduling and partitioning (or tiling) algorithms have been proposed. Describing them is not needed to understand the techniques studied below.

III Mathematical Tools

This section gathers background material and our contributions on the mathematical abstractions we use. Readers interested mainly in the analysis and transformation techniques may simply note the main definitions and theorems.

III.1 Presburger Arithmetic

We need to manipulate sets, functions and relations on integer vectors. Presburger arithmetic suits this purpose particularly well, since most interesting questions are decidable in this theory. Its formulas are built from $\forall$, $\exists$, $\neg$, $\lor$, $\land$, and equalities and inequalities between integer affine expressions. Satisfiability of a Presburger formula is at the heart of most symbolic computations with affine constraints. It is an NP-complete integer linear programming problem [Sch86]. The algorithms used are super-exponential in the worst case [Pug92, Fea88b, Fea91], but very efficient in practice on medium-size problems.

Our experiments and prototype implementations mainly use Omega [Pug92], whose syntax for sets, relations and functions is very close to usual mathematical notation. PIP [Fea88b], the parametric integer linear programming tool, represents affine relations differently: as quasi-affine selection trees, more simply called quasts.

Definition 2 (quast). A quast representing an affine relation is a multi-level conditional expression. Its predicates are sign tests on quasi-affine forms³, and its leaves are sets of vectors described in Presburger arithmetic extended with $\bot$, which precedes every other vector in lexicographic order.

³ Quasi-affine forms extend affine forms with integer divisions by constants and the remainders of such divisions.

When empty sets appear in the leaves, they differ from the singleton $\{\bot\}$: they describe vectors outside the domain of the relation. Section V gives examples.

A classical operation on relations is the computation of the transitive closure. Classical algorithms only handle finite graphs. Unfortunately, the closure of an affine relation is generally not affine. We therefore use approximation techniques developed by Kelly et al. and implemented in Omega [KPRS96]. The general idea is to approximate the relation by one from a subclass, and then compute the closure of that approximation exactly.

III.2 Formal Languages and Rational Relations

Some concepts are part of the common background of theoretical computer science: monoids, rational and algebraic languages, finite automata and pushdown automata. The reference books are [HU79] and [RS97a], and many introductory texts also exist in French. We therefore only fix our notations, using a classical example. We then turn to less common mathematical objects and present the essential results on the class of rational relations between finitely generated monoids.

Formal languages: example and notations

The Łukasiewicz language $L$ over the alphabet $\{a, b\}$ is generated by the axiom $S$ and the grammar with productions $S \to aSS \mid b$.

This language is related to the Dyck languages [Ber79]; its first words are b, abb, aabbb, ababb, aaabbbb, aababbb, ... The Łukasiewicz language is a simple example of a one-counter language, i.e., a language recognized by a one-counter automaton. One-counter languages form a subclass of the algebraic languages.

A counter is encoded on a stack with three symbols: $Z$ is the bottom-of-stack symbol, $I$ encodes positive numbers and $D$ encodes negative numbers. Thus $ZI^n$ represents the integer $n$, $ZD^n$ represents $-n$, and $Z$ encodes the counter value 0. Figure 7 shows a pushdown automaton accepting $L$ and its interpretation as a counter automaton.

A natural generalization of one-counter languages uses several counters, which gives a Minsky machine [Min67]. However, two-counter automata are already as expressive as Turing machines, so most interesting questions become undecidable. Still, recent decidability results have been obtained for multi-counter languages under certain restrictions. These objects seem rich in applications, notably in the work of Comon and Jurski [CJ98].
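A one-counter recognizer for the Łukasiewicz language takes only a few lines. In this C++ sketch, the counter holds the number of $S$ symbols still to be derived. This is one possible encoding; the automaton of Figure 7 may normalize its counter differently.

```cpp
#include <cassert>
#include <string>

// One-counter recognizer for the Łukasiewicz language over {a, b}
// (grammar S -> aSS | b). The counter holds the number of S symbols still
// to be derived; it plays the role of the stack height of the pushdown automaton.
bool is_lukasiewicz(const std::string& w) {
    long pending = 1;                     // one S to derive initially
    for (char c : w) {
        if (pending == 0) return false;   // a complete word followed by extra letters
        if (c == 'a') pending += 1;       // S -> aSS: one S replaced by two
        else if (c == 'b') pending -= 1;  // S -> b: one S consumed
        else return false;                // letter outside {a, b}
    }
    return pending == 0;
}
```

On the first words listed above (b, abb, aabbb, ababb, aaabbbb, aababbb), the counter returns to 0 exactly on the last letter.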
[Figure 7.a. Pushdown automaton; Figure 7.b. Associated one-counter automaton]

Figure 7. Examples of automata.

Rational relations

We give only a few reminders; see [AB88, Eil74, Ber79] for more details. Let $M$ be a monoid. A subset $R$ of $M$ is a recognizable set if there exist a finite monoid $N$, a morphism $\alpha$ from $M$ to $N$ and a subset $P$ of $N$ such that $R = \alpha^{-1}(P)$.

Recognizable sets generalize rational languages while keeping the structure of a Boolean algebra: the class of recognizable sets is closed under union, intersection and complement. Recognizable sets are also closed under concatenation, but not under the star operation. Rational sets, whose definition extends that of rational languages, are closed under star. Let $M$ be a monoid. The class of rational sets of $M$ is the smallest family of subsets of $M$ that contains $\emptyset$ and the singletons $\{m\} \subseteq M$ and is closed under union, concatenation and star.

In general, rational sets are not closed under complement or intersection. If $M$ is of the form $M_1 \times M_2$, where $M_1$ and $M_2$ are monoids, a recognizable subset of $M$ is called a recognizable relation, and a rational subset of $M$ is called a rational relation. The following result describes the "structure" of recognizable relations.

Theorem 2 (Mezei). A recognizable relation $R \subseteq M_1 \times M_2$ is a finite union of sets of the form $K \times L$, where $K$ and $L$ are rational sets of $M_1$ and $M_2$, respectively.

From now on, we only consider recognizable and rational sets that are relations between finitely generated monoids.

Transductions give a "more functional" view of recognizable and rational relations. A relation $R$ between monoids $M_1$ and $M_2$ defines a transduction $\tau$ from $M_1$ to $M_2$: a function from $M_1$ into the set $\mathcal{P}(M_2)$ of subsets of $M_2$ such that $v \in \tau(u)$ iff $u \,R\, v$. A transduction is recognizable (resp. rational) iff its graph is a recognizable (resp. rational) relation. Both classes are closed under inversion, and the class of recognizable transductions is also closed under composition. Over free monoids, rational transductions are also closed under composition. This is the Elgot and Mezei theorem [EM65, Ber79], which is fundamental for dependence analysis (Section IV).

Theorem 3 (Elgot and Mezei). If $A$, $B$ and $C$ are alphabets and $\tau_1: A^* \to B^*$ and $\tau_2: B^* \to C^*$ are rational transductions, then $\tau_2 \circ \tau_1: A^* \to C^*$ is a rational transduction.

The "mechanical" representation of rational relations and transductions is called a rational transducer. Rational transducers extend finite automata in a natural way by adding an "output tape":

Definition 3 (rational transducer). Given an (input) monoid $M_1$ and an (output) monoid $M_2$⁴, a rational transducer $T = (M_1, M_2, Q, I, F, E)$ consists of a finite set of states $Q$, a set of initial states $I \subseteq Q$, a set of final states $F \subseteq Q$, and a finite set of transitions (or edges) $E \subseteq Q \times M_1 \times M_2 \times Q$.

⁴ The monoids $M_1$ and $M_2$ are often omitted from the definition.

By Kleene's theorem, the rational relations of $M_1 \times M_2$ are exactly the relations recognized by rational transducers. We write $|T|$ for the transduction recognized by transducer $T$ and say that $T$ realizes $|T|$. When $M_1$ and $M_2$ are free monoids, the neutral element is the empty word, denoted by $\varepsilon$.

Theorem 4. The following problems are decidable for rational relations: whether two words are in relation (in linear time), emptiness, and finiteness.

Let $R$ and $R'$ be two rational relations over alphabets $A$ and $B$ with at least two letters. The following properties are undecidable: $R \cap R' = \emptyset$, $R \subseteq R'$, $R = R'$, $R = A^* \times B^*$, finiteness of $(A^* \times B^*) \setminus R$, and recognizability of $R$.

Several interesting results concern transductions that are partial functions. A rational function $\tau: M_1 \to M_2$ is a rational transduction that is a partial function, i.e., $\mathrm{card}(\tau(u)) \le 1$ for all $u \in M_1$. Given two alphabets $A$ and $B$, it is decidable whether a rational transduction from $A^*$ to $B^*$ is a partial function, in $O(\mathrm{card}(Q)^4)$ [Ber79, BH77]. It is also decidable whether one rational function is included in another and whether two rational functions are equal.

Among transducers that realize rational functions, those that can be computed "on the fly" while reading their input are of particular interest. Let $A$ and $B$ be two alphabets. A transducer is sequential if it is labeled over $A \times B^*$ and its input automaton (obtained by dropping the outputs) is deterministic. A sequential transducer realizes a rational function. This notion of on-the-fly computation is somewhat too restrictive, so the following extension is usually preferred:

Definition 4 (sub-sequential transducer). For two alphabets $A$ and $B$, a sub-sequential transducer $(T, \rho)$ over $A^* \times B^*$ is a pair in which $T$ is a sequential transducer with set of final states $F$ and $\rho: F \to B^*$ is a function. The function $\tau$ realized by $(T, \rho)$ is defined as follows. For $u \in A^*$, $\tau(u)$ is defined if some path in $T$ accepts $(u \mid v)$ and ends in a final state $q$, and then $\tau(u) = v\,\rho(q)$.

In other words, $\rho$ appends a word to the end of the output of a sequential transducer.

Building on a proof by Choffrut [Cho77], Béal and Carton [BC99b] gave a polynomial algorithm that decides whether a rational function is sub-sequential, and another that decides whether a sub-sequential function is sequential. They also gave a polynomial algorithm that finds a sub-sequential realization of a rational function when one exists.
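Definition 4 can be simulated directly. The C++ sketch below restricts transitions to one input letter each and stores $\rho$ in a map. The function it realizes is a hypothetical example: $a^n \mapsto b^n$ for even $n$ and $a^n \mapsto b^n c$ for odd $n$. No sequential transducer realizes this function. The output for a (bc) is not a prefix of the output for aa (bb), whereas a sequential transducer can only extend its output as it reads more input. The word appended by $\rho$ is what makes the function sub-sequential.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// A sub-sequential transducer (T, rho): T is sequential (deterministic input
// automaton, here with one input letter per transition) and rho appends a word
// that depends on the final state reached.
struct SubsequentialTransducer {
    int initial = 0;
    std::map<std::pair<int, char>, std::pair<int, std::string>> delta;  // (q, x) -> (q', output)
    std::map<int, std::string> rho;                                    // final state -> appended word

    std::optional<std::string> apply(const std::string& u) const {
        int q = initial;
        std::string v;
        for (char x : u) {
            auto it = delta.find({q, x});
            if (it == delta.end()) return std::nullopt;  // no path: tau(u) is undefined
            q = it->second.first;
            v += it->second.second;
        }
        auto fin = rho.find(q);
        if (fin == rho.end()) return std::nullopt;       // path ends in a non-final state
        return v + fin->second;                          // tau(u) = v rho(q)
    }
};

// Hypothetical example: a^n -> b^n when n is even, b^n c when n is odd.
SubsequentialTransducer parity_example() {
    SubsequentialTransducer t;
    t.delta[{0, 'a'}] = {1, "b"};
    t.delta[{1, 'a'}] = {0, "b"};
    t.rho[0] = "";
    t.rho[1] = "c";
    return t;
}
```

For example, parity_example().apply("aaa") yields "bbbc", and any input containing b has no path and is therefore outside the domain.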
III.3 Left-Synchronous Relations

Rational relations are not closed under intersection, yet dependence analysis cannot do without this operation. Feautrier [Fea98] proposed a "semi-algorithm" for the undecidable problem of whether an intersection of rational relations is empty. The algorithm is guaranteed to terminate only when the intersection is non-empty. Since we want to compute this intersection, we take a different approach: using conservative approximations, we restrict ourselves to a class of rational relations that forms a Boolean algebra (i.e., is closed under union, intersection and complement).

Recognizable relations do form a Boolean algebra, but we have built a more general class: left-synchronous relations. Frougny and Sakarovitch [FS93] studied this class independently. Our representation differs, however, our proofs are new, and we obtain new results. This work results from a collaboration with Olivier Carton (Université de Marne-la-Vallée).

We first recall a classical definition, which is equivalent to requiring that input and output words have the same length. A rational transducer over alphabets $A$ and $B$ is synchronous if it is labeled over $A \times B$. We extend this notion as follows.

Definition 5 (left synchronism). A rational transducer over alphabets $A$ and $B$ is left-synchronous if it is labeled over $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ and transitions labeled over $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) can only be followed by transitions labeled over $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$).

A rational relation or transduction is left-synchronous if some left-synchronous transducer realizes it. A rational transducer is left-synchronizable if it realizes a left-synchronous relation.

Figure 8 shows left-synchronous transducers over an alphabet $A$ that realize the prefix order and the lexicographic order, where $<_{txt}$ is a given order on $A$. In these transducers, $x$ and $y$ stand for any $x \in A$ and any $y \in A$, respectively.
[Figure 8. Examples of left-synchronous transducers. (a) Prefix order: state 1 loops on $x|x$ and moves to state 2 on $\varepsilon|y$; state 2 loops on $\varepsilon|y$. (b) Lexicographic order, built around a transition $x|y$ with $x <_{\mathrm{txt}} y$.]
It is known that synchronous transducers form a Boolean algebra⁵.

Theorem 5 The class of left-synchronous relations forms a Boolean algebra: it is closed under union, intersection and complement. Moreover:
- recognizable relations are left-synchronous;
- if $S$ is synchronous and $T$ is left-synchronous, then $ST$ is left-synchronous;
- if $T$ is left-synchronous and $R$ is recognizable, then $TR$ is left-synchronous.

Finally, the class of left-synchronous relations is closed under composition.

Synchronous relations are decidable among rational relations [Eil74], but this is not the case for recognizable relations [Ber79], and we showed that the same holds for left-synchronous relations. We are nevertheless interested in particular cases where a rational relation can be proven left-synchronous. To this end, we recall the notion of transmission rate of a path labeled by $(u, v)$: it is the ratio $|v|/|u| \in \mathbb{Q}^+ \cup \{+\infty\}$.

If $T$ is a left-synchronous transducer, the cycles of $T$ can only have three transmission rates: $0$, $1$ and $+\infty$. Three constraints apply:
- all cycles of the same strongly connected component must have the same transmission rate;
- only components of rate $0$ may follow components of rate $0$;
- only components of rate $+\infty$ may follow components of rate $+\infty$.

There is a partial converse:

Theorem 6 If the transmission rate of every cycle of a rational transducer is $0$, $1$ or $+\infty$, and if no cycle of rate $1$ follows a cycle of rate different from $1$, then the transducer is left-synchronizable.

We can therefore "resynchronize" a certain class of left-synchronizable transducers, namely those that satisfy the hypotheses of Theorem 6. Building on an algorithm by Béal and Carton [BC99a], one can write a resynchronization algorithm that computes left-synchronous approximations of rational relations. This technique is used in Section III.5.

We conclude with decidability properties, which are essential for dependence and reaching-definition analysis.

Lemma 1 Let $R$ and $R'$ be left-synchronous relations over alphabets $A$ and $B$. The following properties are decidable:
- $R \cap R' = \emptyset$;
- $R \subseteq R'$;
- $R = R'$;
- $R = A^* \times B^*$;
- $(A^* \times B^*) \setminus R$ is finite.

We are still working on the decidability of recognizable relations among left-synchronous relations.

III.4 Going beyond rational relations

We sometimes need more expressive power than rational relations provide. We therefore use the notion of algebraic (or context-free) relation, which naturally extends that of algebraic language. These relations are defined by means of pushdown transducers:

Definition 6 (pushdown transducer) Given two alphabets $A$ and $B$, a pushdown transducer $T = (A, B, \Gamma, \gamma_0, Q, I, F, E)$⁶ consists of a stack alphabet $\Gamma$, a nonempty word $\gamma_0 \in \Gamma^+$ called the initial stack word, a finite set of states $Q$, a

⁵ All the properties studied in this section have constructive proofs.
⁶ The alphabets $A$ and $B$ are often omitted from the definition.
set $I \subseteq Q$ of initial states, a set $F \subseteq Q$ of final states, and a finite set of transitions (or edges) $E \subseteq Q \times A^* \times B^* \times \Gamma \times \Gamma^* \times Q$.

A pushdown transducer realizes a relation in the same way that a pushdown automaton realizes a language.

Definition 7 (algebraic relation) The class of relations realized by pushdown transducers is called the class of algebraic relations.

Naturally, algebraic transductions are the functional view of algebraic relations.

Theorem 7 Algebraic relations are closed under union, concatenation and the star operation. They are also closed under composition with rational transductions. The image of a rational language under an algebraic transduction is an algebraic language.

The following questions are decidable for algebraic relations:
- whether two words are related (in linear time);
- emptiness;
- finiteness.

There are very few results on algebraic transductions that are partial functions, called algebraic functions. In particular, we know of no subclass of these functions that can be "computed on the fly" in the sense of subsequential functions.

Nevertheless, one-counter relations form an interesting subclass of algebraic relations. They are realized by one-counter transducers, whose definition is similar to that of one-counter automata. One can also consider more than one counter, but this gives the same expressive power as Turing machines. This class matters to us when we need to compose rational transductions between non-free monoids, where the Elgot and Mezei theorem no longer applies.

Theorem 8 Let $A$ and $B$ be two alphabets and $n$ a positive integer. If $\tau_1: A^* \to \mathbb{Z}^n$ and $\tau_2: \mathbb{Z}^n \to B^*$ are rational transductions, then $\tau_2 \circ \tau_1: A^* \to B^*$ is an $n$-counter transduction.

This theorem is used for dependence analysis, mainly with $n = 1$. Moreover, an important result can be derived from its proof:

Proposition 1 Let $A$ and $B$ be two alphabets and $n$ a positive integer. Let $\tau_1: A^* \to \mathbb{Z}^n$ and $\tau_2: \mathbb{Z}^n \to B^*$ be rational transductions, and let $T$ be an $n$-counter transducer that realizes $\tau_2 \circ \tau_1: A^* \to B^*$ (computed with Theorem 8). Then the rational transducer underlying $T$, obtained by omitting the stack manipulations, is recognizable.

By the following result, Proposition 1 guarantees closure under intersection with any rational transduction:

Proposition 2 Let $R_1$ be an algebraic relation realized by a pushdown transducer whose underlying rational transducer is left-synchronous, and let $R_2$ be a left-synchronous relation. Then $R_1 \cap R_2$ is an algebraic relation. Moreover, one can build a pushdown transducer that realizes it and whose underlying rational transducer is left-synchronous.

Finally, Theorem 8 extends to the free partially commutative monoids associated with nestings of trees and arrays, which we do not discuss in this summary.
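Theorem 8 with $n = 1$ can be illustrated on a hypothetical pair of transductions that does not come from the thesis:
- $\tau_1: \{a, b\}^* \to \mathbb{Z}$ adds $+1$ for each $a$ and $-1$ for each $b$;
- $\tau_2: \mathbb{Z} \to \{c\}^*$ maps $m \ge 0$ to $c^m$ and is undefined for $m < 0$.

Their composition cannot be computed with finite memory, but a single counter is enough. The sketch uses a signed C `long` as the counter for brevity. A formal one-counter transducer would keep the sign in its finite control.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Realizes tau2(tau1(u)) with one counter, for the hypothetical tau1, tau2 above.
 * Writes the output into buf; returns its length, or -1 if undefined. */
static long compose_one_counter(const char *u, char *buf, size_t size)
{
    assert(u != NULL && buf != NULL && size > 0);
    long counter = 0;                    /* the only memory besides finite control */
    for (; *u != '\0'; u++) {
        if (*u == 'a')
            counter++;                   /* tau1: a |-> +1 */
        else if (*u == 'b')
            counter--;                   /* tau1: b |-> -1 */
        else
            return -1;                   /* letter outside {a, b} */
    }
    if (counter < 0 || (size_t)counter + 1 > size)
        return -1;                       /* tau2 undefined, or buf too small */
    memset(buf, 'c', (size_t)counter);   /* tau2: unwind the counter, one c per unit */
    buf[counter] = '\0';
    return counter;
}
```

For instance, the input `"aab"` yields `"c"`, and `"abb"` has no image.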
III.5 Further remarks on approximations

Our program analysis and transformation techniques use intersection extensively. Rational and algebraic relations are not closed under this operation, but we identified subclasses that are. We show here how conservative approximations reduce the problem to these subclasses.

Several methods approximate rational relations by recognizable relations. The general idea is to take the Cartesian product of the input and the output. More precise techniques apply this operation to each pair made of an initial state and a final state, and to each strongly connected component. Thanks to Theorem 2, the result is always a recognizable relation.

Approximation by left-synchronous relations relies on the resynchronization algorithm, and therefore on Theorem 6. When the algorithm fails, a strongly connected component is replaced by a recognizable approximation and the algorithm starts again. Optimizations make it possible to run the resynchronization algorithm only once.

Algebraic relations (or multi-counter relations) can be approximated in two ways:
- the stack (or the counters) is approximated by additional states;
- the underlying rational transducer is approximated by a left-synchronous transducer.

Both techniques are used in the rest of this summary.

IV Instancewise analysis for recursive programs

Previous works addressed instancewise analysis of recursive programs [CCG96, Coh97, Coh99a, Fea98, CC98]. We now present a major evolution of them, with a more general formalism and a fully automated process. Beyond the theoretical goal of reaching the highest possible precision, Section V.5 shows how this information improves the automatic parallelization of recursive programs.

Starting from real examples, we discuss the computation of induction variables, then present the dependence and reaching-definition analyses themselves. The section ends with a comparison with static analyses and with recent work on instancewise analysis of loop nests.

IV.1 Introductory examples

We study two examples to give an intuitive overview of our instancewise analysis for recursive structures. The thesis presents a third example, which uses a hybrid structure of trees and arrays and is not discussed here.

First example: the Queens program

We consider again the Queens procedure presented in Section II.3. Figure 9 reproduces the program together with a partial control tree.

We study the dependences between run-time instances of statements. Consider for example the instance FPIAAaAaAJQPIAABBr of statement r, shown as a star in Figure 9.b. Statement B initializes the variable j to 0 and statement b increments it, so we know that the value of j at FPIAAaAaAJQPIAABBr is 0;
therefore FPIAAaAaAJQPIAABBr reads A[0].

Figure 9.a. Procedure Queens:

          int A[n];
    P     void Queens(int n, int k) {
    I       if (k < n) {
    A=A=a     for (int i = 0; i < n; i++) {
    B=B=b       for (int j = 0; j < k; j++)
    r             ... = ... A[j] ...;
    J           if (...) {
    s             A[k] = ...;
    Q             Queens(n, k+1);
                }
              }
            }
          }

          int main() {
    F       Queens(n, 0);
          }

[Figure 9.b. Compressed control tree: instances FPIAAJs, FPIAAaAJs and FPIAAaAaAJs of s write A[0]; instance FPIAAaAaAJQPIAABBr of r reads A[0].]

Figure 9. The Queens procedure and a control tree.

Now consider the instances of s, shown as squares. The variable k is initialized to 0 at the first call to Queens and incremented by the recursive call Q. Instances FPIAAJs, FPIAAaAJs and FPIAAaAaAJs therefore write A[0], and are thus in dependence with FPIAAaAaAJQPIAABBr.

Which of these definitions reaches FPIAAaAaAJQPIAABBr? Looking at the figure again, we notice that instance FPIAAaAaAJs (the black square) executes last. Moreover, this instance is guaranteed to execute whenever the read FPIAAaAaAJQPIAABBr executes. FPIAAaAaAJs therefore overwrites the other writes, and it is the reaching definition of FPIAAaAaAJQPIAABBr. We will show later how to generalize this intuitive approach.

Second example: the BST program

Consider now the BST procedure of Figure 10. It swaps node values to convert a binary tree into a binary search tree (BST). Tree nodes are referenced through pointers, and p->value holds the integer value of a node. This program has few dependences: the only ones are anti-dependences between some statement instances inside blocks I1 or J1. Consequently, reaching-definition analysis gives a very simple result: the only reaching definition of every read access is ⊥.

IV.2 Linking instances and memory cells

Section II.4 defined the notion of access function, which maps each access to the memory cell it reads or writes. We now need to make these functions explicit, and to this end we introduce the notion of induction variable. In the presence of
Figure 10. Procedure BST:

          P     void BST(tree *p) {
          I1      if (p->l != NULL) {
          L         BST(p->l);
          I2        if (p->value < p->l->value) {
          a           t = p->value;
          b           p->value = p->l->value;
          c           p->l->value = t;
                    }
                  }
          J1      if (p->r != NULL) {
          R         BST(p->r);
          J2        if (p->value > p->r->value) {
          d           t = p->value;
          e           p->value = p->r->value;
          f           p->r->value = t;
                    }
                  }
                }

                int main() {
          F       if (root != NULL) BST(root);
                }

[Figure 10 also shows the compressed control automaton of BST.]

Figure 10. The BST procedure and its (compressed) control automaton.

recursive procedures, this notion, historically tied to loop nests [Wol92], must be redefined. To simplify the presentation, we assume that each variable has a unique name, so that we can unambiguously speak of "the variable i". We define induction variables as follows:
- integer arguments of a function that are initialized, at each recursive call, by a constant or by an integer induction variable plus a constant;
- integer loop counters that are incremented by a constant at each iteration;
- pointer arguments that are initialized by a constant or by a pointer induction variable, possibly dereferenced.

The analysis requires a few additional assumptions on the program model of Section II.2:
- the analyzed data structures must be declared global;
- array subscripts must be affine functions of integer induction variables and symbolic constants;
- tree accesses must dereference a pointer induction variable or a constant.

Before dependence analysis, we must compute the access functions in order to describe potential conflicts. Let ι be a statement and w an instance of ι. The value of variable i at instance w is defined as the value of i immediately after the execution of instance w of statement ι. This value is denoted $[i](w)$.

In general, the value of a variable at a given control word depends on the execution. However, thanks to the restrictions we imposed on the program model, the
induction variables are completely determined by control words. We show that, for two different executions e and e′, the value of an induction variable at a given control word is the same in e and e′. The access functions of different executions therefore coincide. From now on we consider a single access function f that does not depend on the execution.

The following result shows that recurrence equations describe the induction variables:

Lemma 2 Consider the monoid $(M_{\mathrm{data}}, \cdot)$ that abstracts the data structure under consideration, a statement ι, and an induction variable i. One of the following equations describes the effect of statement ι on the value of i:

either $\exists \alpha \in M_{\mathrm{data}}, j \in \mathrm{induc}: \forall u\iota \in L_{\mathrm{ctrl}}: [i](u\iota) = [j](u) \cdot \alpha$
or $\exists \alpha \in M_{\mathrm{data}}: \forall u\iota \in L_{\mathrm{ctrl}}: [i](u\iota) = \alpha$

where induc is the set of induction variables of the program, including i.

Here is the result for the Queens procedure. We only consider the induction variables j and k, which are the only ones relevant to dependence analysis.

From the main call F: $[\mathrm{Arg}(\mathrm{Queens}, 2)](F) = 0$
From procedure P: $\forall uP \in L_{\mathrm{ctrl}}: [k](uP) = [\mathrm{Arg}(\mathrm{Queens}, 2)](u)$
From the recursive call Q: $\forall uQ \in L_{\mathrm{ctrl}}: [\mathrm{Arg}(\mathrm{Queens}, 2)](uQ) = [k](u) + 1$
From loop entry B: $\forall uB \in L_{\mathrm{ctrl}}: [j](uB) = 0$
From loop iteration b: $\forall ub \in L_{\mathrm{ctrl}}: [j](ub) = [j](u) + 1$

Arg(proc, num) denotes the num-th actual argument of procedure proc. All other statements leave the variables unchanged.

We designed an algorithm that automatically builds such a system, which describes how induction variables evolve in a program. Combined with the following result, this algorithm builds the access function automatically.

Theorem 9 The access function f, which maps every possible access in A to the memory cell it reads or writes, is a rational function from $\mathrm{ctrl}^*$ to $M_{\mathrm{data}}$.

The result for the Queens program is:

$(ur \mid f(ur, A[j])) = (FPIAA|0)\,((JQPIAA|0) + (aA|0))^*\,(BB|0)\,(bB|1)^*\,(r|0)$
$(us \mid f(us, A[k])) = (FPIAA|0)\,((JQPIAA|1) + (aA|0))^*\,(Js|0)$

We applied the same technique to the BST program:

$\forall \iota \in \{I_2, a, b\}: (u\iota \mid f(u\iota, \texttt{p->value})) = (FP|\varepsilon)\,((I_1LP|l) + (J_1RP|r))^*\,(I_1I_2|\varepsilon)$
$\forall \iota \in \{I_2, b, c\}: (u\iota \mid f(u\iota, \texttt{p->l->value})) = (FP|\varepsilon)\,((I_1LP|l) + (J_1RP|r))^*\,(I_1I_2|l)$
$\forall \iota \in \{J_2, d, e\}: (u\iota \mid f(u\iota, \texttt{p->value})) = (FP|\varepsilon)\,((I_1LP|l) + (J_1RP|r))^*\,(J_1J_2|\varepsilon)$
$\forall \iota \in \{J_2, e, f\}: (u\iota \mid f(u\iota, \texttt{p->r->value})) = (FP|\varepsilon)\,((I_1LP|l) + (J_1RP|r))^*\,(J_1J_2|r)$
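Both Queens access functions are rational functions from control words to $\mathbb{Z}$. A sequential reading of the control word is enough to evaluate them. The sketch below is a minimal one that follows the two Queens equations literally. Every `JQPIAA` adds 1 to k, `BB` resets j to 0, and every `bB` adds 1 to j. A word that does not match the expression is rejected with `-1`. The function names are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* If w starts with the word p, returns the rest of w; otherwise returns NULL. */
static const char *eat(const char *w, const char *p)
{
    size_t n = strlen(p);
    return strncmp(w, p, n) == 0 ? w + n : NULL;
}

/* Reads ((JQPIAA|dk) + (aA|0))*, adding dk to *k for each recursive call. */
static const char *outer_loop(const char *w, int dk, int *k)
{
    const char *next;
    for (;;) {
        if ((next = eat(w, "JQPIAA")) != NULL)
            *k += dk;
        else if ((next = eat(w, "aA")) == NULL)
            return w;                    /* no more loop or recursion labels */
        w = next;
    }
}

/* f(us, A[k]) = (FPIAA|0)((JQPIAA|1) + (aA|0))*(Js|0): cell written by s, or -1. */
static int f_s(const char *w)
{
    int k = 0;
    assert(w != NULL);
    if ((w = eat(w, "FPIAA")) == NULL)
        return -1;
    w = outer_loop(w, 1, &k);
    return strcmp(w, "Js") == 0 ? k : -1;
}

/* f(ur, A[j]) = (FPIAA|0)((JQPIAA|0) + (aA|0))*(BB|0)(bB|1)*(r|0): cell read by r, or -1. */
static int f_r(const char *w)
{
    int j = 0, ignored = 0;
    const char *next;
    assert(w != NULL);
    if ((w = eat(w, "FPIAA")) == NULL)
        return -1;
    w = outer_loop(w, 0, &ignored);      /* recursion does not change j */
    if ((w = eat(w, "BB")) == NULL)
        return -1;                       /* inner loop entry: j = 0 */
    while ((next = eat(w, "bB")) != NULL) {
        j++;                             /* inner loop iteration: j = j + 1 */
        w = next;
    }
    return strcmp(w, "r") == 0 ? j : -1;
}
```

On the example of Section IV.1, `f_r("FPIAAaAaAJQPIAABBr")` and `f_s("FPIAAaAaAJs")` both return 0, so these two accesses touch the same cell A[0].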
IV.3 Dependence and reaching-definition analysis

Using the access functions, our first goal is to compute the conflict relation between memory accesses. We cannot hope for an exact result in general, but we can exploit the fact that the access function f does not depend on the execution. We compute the following approximate conflict relation κ:

$\forall u, v \in L_{\mathrm{ctrl}}: u \,\kappa\, v \overset{\mathrm{def}}{\iff} v \in f^{-1}(f(u))$.

By the Elgot and Mezei theorem (Section III.2) and Theorem 8, the composition $f^{-1} \circ f$ is either a rational transduction or a multi-counter transduction. The number of counters equals the dimension of the accessed array, and a conservative approximation reduces it to a single counter. Note that testing κ for emptiness is equivalent to pointer alias analysis [Deu94, Ste96], and that emptiness of a rational or algebraic relation is decidable.

To build the transducer that describes dependences, we first restrict κ to pairs of accesses that include at least one write. We then intersect the result with the lexicographic order. With the techniques of Sections III.3, III.4 and III.5, we can always compute a conservative approximation. A one-counter transducer realizes it for arrays, and a rational transducer for trees. Moreover, thanks to Proposition 1, the intersection with the lexicographic order is exact for arrays.

Computing reaching definitions from approximate dependence information alone makes it very hard to obtain a precise result. The first step restricts the dependences to flow dependences. Beyond that step, we must use additional properties of the data flow. Our main technique relies on a structural property of programs:

Definition 8 (ancestor) We define unco as the subset of ctrl made of all block labels that are neither conditional statements nor loop bodies, together with all (unguarded) procedure calls. In other words, unco contains the blocks whose execution is unconditional.

Let r and s be two statements in ctrl, and let u be a strict prefix of a control word $wr \in L_{\mathrm{ctrl}}$ (an instance of r). If $v \in \mathrm{unco}^*$ is such that $uvs \in L_{\mathrm{ctrl}}$, then uvs is called an ancestor of wr.

This definition is easy to understand on a control tree such as the one of Figure 9.b. There, the black square FPIAAaAaAJs is an ancestor of FPIAAaAaAJQPIAABBr, but the adjacent gray squares are not. Ancestors have the following two properties:

1. the execution of wr implies that of u, which lies on the path from the root to node wr;
2. the execution of u implies that of uvs, because $v \in \mathrm{unco}^*$.

Hence, if an instance executes, so do all its ancestors. To apply this result to reaching-definition analysis, we first identify the instances whose execution the ancestor property guarantees. We then apply transition-elimination rules to the flow-dependence transducer. The resulting transducer realizes an approximation of the reaching definitions.

Integrating these ideas into the reaching-definition algorithm is fairly technical, so this summary stops here.
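Definition 8 can be checked directly on control words. This minimal sketch assumes single-character labels, as in Queens. It also assumes a hypothetical set unco = {F, P, Q} that was chosen for illustration and not computed from the program. The function searches for a strict prefix u of the read instance such that the candidate has the form u v s, with v over unco and s a single statement label. Membership of the candidate in $L_{\mathrm{ctrl}}$ is assumed, not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical set unco for Queens (an assumption for this sketch). */
static const char *const UNCO = "FPQ";

/* True if candidate = u v s, where u is a strict prefix of read,
 * v is a (possibly empty) word over UNCO, and s is the last label of candidate. */
static bool is_ancestor(const char *candidate, const char *read)
{
    assert(candidate != NULL && read != NULL);
    size_t lc = strlen(candidate), lr = strlen(read);
    if (lc == 0)
        return false;
    for (size_t lu = 0; lu < lr && lu + 1 <= lc; lu++) {  /* |u| < |read|, |u| < |candidate| */
        if (strncmp(candidate, read, lu) != 0)
            break;                        /* longer prefixes cannot match either */
        bool v_ok = true;
        for (size_t i = lu; i + 1 < lc; i++)              /* v = candidate[lu .. lc-1) */
            if (strchr(UNCO, candidate[i]) == NULL) {
                v_ok = false;
                break;
            }
        if (v_ok)
            return true;
    }
    return false;
}
```

With these assumptions, the black square FPIAAaAaAJs is reported as an ancestor of FPIAAaAaAJQPIAABBr. The gray square FPIAAaAJs is rejected, because the only way to match it goes through the conditional label J.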
IV.4 Analysis results

Let us first return to tree structures. The access function of the BST program is a rational transducer, shown in Figure 11.

[Figure 11. Rational transducer for the access function f of the BST program.]

For trees, the conflict transducer realizing κ is always rational. When it is left-synchronous, dependences can be computed without approximation. Otherwise, κ must be approximated by a left-synchronous transducer. Figure 12 describes the result for BST.

[Figure 12. Rational transducer for the dependence relation of the BST program.]

This result confirms that dependences only occur between instances of statements of the same block I1 or J1. We will see that this result makes it possible to parallelize the program.
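The BST access function maps a control word to a path in $\{l, r\}^*$. Two accesses are in the conflict relation κ exactly when they reach the same node. This minimal sketch follows the BST access-function equations of Section IV.2, with the statement label appended at the end of the word. It uses a hypothetical one-character encoding of the labels: `I` for I1, `i` for I2, `J` for J1 and `j` for J2, while `F`, `P`, `L`, `R` and `a` to `f` are kept as in Figure 10.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum ref { P_VALUE, P_L_VALUE, P_R_VALUE };   /* p->value, p->l->value, p->r->value */

static bool stmt_in(char s, const char *set)
{
    return s != '\0' && strchr(set, s) != NULL;
}

/* Writes into cell the tree path of the node accessed by reference ref of
 * control word w; returns false if w does not perform this access. */
static bool bst_access(const char *w, enum ref ref, char *cell, size_t size)
{
    assert(w != NULL && cell != NULL);
    size_t n = 0;
    if (strncmp(w, "FP", 2) != 0)
        return false;
    w += 2;
    /* ((I1 L P | l) + (J1 R P | r))* */
    while (strncmp(w, "ILP", 3) == 0 || strncmp(w, "JRP", 3) == 0) {
        if (n + 3 > size)
            return false;                 /* keep room for one more letter and '\0' */
        cell[n++] = (w[0] == 'I') ? 'l' : 'r';
        w += 3;
    }
    /* (I1 I2) or (J1 J2), optionally followed by one statement label */
    bool left = strncmp(w, "Ii", 2) == 0, right = strncmp(w, "Jj", 2) == 0;
    if ((!left && !right) || n + 2 > size)
        return false;
    char s = w[2];                        /* '\0' stands for the condition I2 or J2 itself */
    if (s != '\0' && w[3] != '\0')
        return false;
    if (ref == P_VALUE && (s == '\0' || stmt_in(s, left ? "ab" : "de"))) {
        /* p->value: the current node */
    } else if (ref == (left ? P_L_VALUE : P_R_VALUE)
               && (s == '\0' || stmt_in(s, left ? "bc" : "ef"))) {
        cell[n++] = left ? 'l' : 'r';     /* p->l->value or p->r->value: one step down */
    } else {
        return false;
    }
    cell[n] = '\0';
    return true;
}

/* u kappa v: both accesses touch the same tree node. */
static bool bst_conflict(const char *u, enum ref ru, const char *v, enum ref rv)
{
    char cu[64], cv[64];
    return bst_access(u, ru, cu, sizeof cu) && bst_access(v, rv, cv, sizeof cv)
        && strcmp(cu, cv) == 0;
}
```

For instance, statement a in the left child (`FPILPIia`, reading p->value) conflicts with statement c at the root (`FPIic`, writing p->l->value). Both instances belong to the root's block I1, which agrees with the result above.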
Let us now study arrays. A rational transducer from $\mathrm{ctrl}^*$ to $M_{\mathrm{data}} = \mathbb{Z}$ describes the access function of the Queens program. It is given in Figure 13.

[Figure 13. Rational transducer for the access function f of the Queens program.]

We use Theorem 8 to compute a one-counter transducer that realizes the conflict relation. To obtain the dependence relation, we apply the resynchronization algorithm to the underlying rational transducer, which is recognizable. This computation is always exact. Figure 14 gives the result for Queens.

[Figure 14. One-counter transducer for the flow dependences.]

We can now perform reaching-definition analysis. Using additional information on the conditional statements of the Queens program, we prove that only ancestors of an instance of r can be reaching definitions. This very strong property lets us eliminate all transitions that do not lead to
an ancestor in the dependence transducer. Figure 15 gives the result. It is easy to show that this result is exact: the analysis computes a single reaching definition for each read access.

[Figure 15. One-counter transducer for … ]

IV.5 Comparison with other analyses

Preliminary transformations can remove some restrictions of the program model. In addition, suitable approximations seem likely to remove many other restrictions in future versions of the analysis. One very important restriction nevertheless remains. It is firmly rooted in our formalism, and we see no general way around it: insertions into and deletions from trees are only allowed at the leaves.

Static dependence and reaching-definition analyses generally obtain similar results, whether they rely on abstract interpretation [Cou81, JM82, Har89, Deu94] or on other data-flow analysis formalisms [LRZ93, BE95, HHN94, KSV96]. [RR99] gives an interesting survey of static analyses that are useful for parallelization. Our technique is easy to compare with these analyses. None of them works at the instance level, and none is precise enough to identify which instance of which statement is in conflict, in dependence, or a possible reaching definition. These analyses are nevertheless useful to lift some restrictions of our program model. They also compute properties that help instancewise reaching-definition analysis. Their applications to parallelization give a more interesting comparison; see Section V.5.

Let us now compare with instancewise analyses for loop nests, such as FADA [BCF97, Bar98]. On the programs that both models accept, the overall result is unsurprising: FADA is much more precise. Several reasons explain this:
- we only use information on conditional statements through external analyses;
- multidimensional arrays require additional approximations;
- rational and algebraic transducers are not expressive enough to handle integer parameters (only one counter can be described);
- fundamental operations such as intersection sometimes require approximations.

Some positive points are nevertheless worth noting: exactness of the result can be decided in polynomial time on rational transducers; emptiness is always decidable, which enables automatic detection of uninitialized variables; for trees, dependence tests operate on rational languages of control words, which is very useful for parallelization;
and finally, for arrays, dependence tests are equivalent to the intersection of a rational language with an algebraic language.

V Expansion and parallelization

Research on memory expansion mostly targets affine loop nests. The most common techniques are:
- single-assignment conversion [Fea91, GC95, Col98];
- privatization [MAL93, TP93, Cre96, Li92];
- many optimizations for efficient memory management [LF98, CFH95, CDRV97, QR99].

Restoring the data flow becomes critical when control flow is unpredictable at compile time or when array subscripts are not affine. In that setting, our interests clearly converge with those of the SSA (static single-assignment) formalism [CFR+91]. Starting from simple examples, we study the problems specific to non-affine loop nests and propose single-assignment conversion algorithms. We then propose new techniques for expansion and for optimizing memory usage, aimed at the automatic parallelization of irregular codes.

The principles of parallel computing with recursive procedures are very different from those of loop nests. Existing parallelization methods generally rely on statement-level dependence tests, whereas our analysis describes the dependence relation at the instance level. We show that this very precise information significantly improves classical parallelization techniques. We also study memory expansion in recursive programs, and we conclude this study with experimental results.

V.1 Motivations and trade-offs

Single-assignment form conversion (SA) is one of the most classical expansion methods. It corresponds to the extreme case where each memory cell is written at most once during execution. It therefore differs from static single-assignment (SSA) form [CFR+91, KS98], where expansion is limited to variable renaming.

The idea is to replace every assignment to a data structure D by an assignment to a new structure $D_{\mathrm{exp}}$. The elements of $D_{\mathrm{exp}}$ have the same type as those of D, and they are in one-to-one correspondence with the set W of all possible write accesses during execution. In a second step, read references must be updated accordingly; this step is called data-flow restoration. To this end we use instancewise reaching definitions. For a given execution $e \in E$, the read reference $\langle \imath, \mathit{ref} \rangle$ to D must be replaced by an access to the element of $D_{\mathrm{exp}}$ associated with $\sigma_e(\langle \imath, \mathit{ref} \rangle)$. Only an approximation σ of the reaching definitions is available, so this technique only applies when $\sigma(\langle \imath, \mathit{ref} \rangle)$ is a singleton. Otherwise, we must generate dynamic data-flow restoration code. This code is usually represented by a φ-function whose argument is the set $\sigma(\langle \imath, \mathit{ref} \rangle)$ of possible reaching definitions.

To generate the dynamic restoration code of φ-functions, we use an additional data structure in one-to-one correspondence with $D_{\mathrm{exp}}$, denoted $@D_{\mathrm{exp}}$. Two pieces of information must be recorded in $@D_{\mathrm{exp}}$: the address of the memory cell written in the original program, and the identity of the last instance that wrote a value into that cell. Since the program is in single-assignment form, the instance is already
identified by the element of $@D_{\mathrm{exp}}$ itself, so $@D_{\mathrm{exp}}$ only needs to hold memory cell addresses. This structure is used as follows:
1. $@D_{\mathrm{exp}}$ is initialized to NULL;
2. at each assignment to $D_{\mathrm{exp}}$, the address of the memory cell written in the original program is stored into $@D_{\mathrm{exp}}$;
3. a reference φ(set) is implemented as a maximum, according to the sequential order, over all $\imath \in \mathit{set}$ such that $@D_{\mathrm{exp}}[\imath]$ equals the address of the memory cell read in the original program.

Instancewise reaching-definition analysis is the basis of data-flow restoration [Col98]. Precise results reduce the number of φ-functions and also simplify their arguments, which makes the run-time maximum computations cheaper. Note also that computing σ at run time may itself be costly, even when no φ-function is needed. For loop nests, however, this overhead comes only from the implementation of the quast associated with σ, and polyhedron scanning techniques [AI91] optimize the generated code. The example of Figure 16 illustrates these remarks. For recursive programs, we will see that computing σ is more delicate.

[Figure 16. Interactions between reaching-definition analysis and run-time overhead. (a) Original program: statement T initializes A[0] = 0; then, in a nest of two loops over i and j, statement S writes A[i+j] and statement R reads A[i+j-1]. (b) SA form without reaching-definition analysis: the read of R becomes AR[i,j] = φ({⟨T⟩} ∪ {⟨S, i′, j′⟩ : (i′, j′) <lex (i, j)}). (c) SA form with a precise reaching-definition analysis: the φ-function is replaced by AR[i,j] = if (j == 0) (if (i == 0) AT else AS[i-1,j]) else AS[i,j-1]. (d) Precise analysis plus peeling of the first loop iteration, which removes these conditional expressions.]

The actual implementation of these techniques depends on the control and data
structures. For loops and arrays, we propose single-assignment conversion algorithms that extend existing results to arbitrary loop nests. Single-assignment conversion of recursive programs is a new area, which Section V.5 studies.

We also developed three techniques to optimize the computation of φ-functions:
- the first applies simple optimizations to the $@D_{\mathrm{exp}}$ structures;
- the second reduces the sets of possible reaching definitions (the arguments of φ-functions) with a new data-flow information called the reaching definition of a memory cell;
- the third eliminates redundancies in the computation of the maximum by computing it incrementally.

Strictly speaking, this last technique does not generate a single-assignment program, which can sometimes hinder its use in automatic parallelization. Section V.4 takes a different view of expansion, not necessarily single-assignment, and proposes an improved version of the redundancy elimination method (also called "optimized placement of φ-functions") that does not hinder parallelization.

V.2 Maximal static expansion

Maximal static expansion aims at expanding memory as much as possible, and thus at eliminating as many dependences as possible, without resorting to φ-functions to restore the data flow.

Consider two writes v and w that belong to the set of possible reaching definitions of a read u, and assume that they assign the same memory cell. If v and w write to two different memory cells after expansion, a φ-function is needed to choose which of the two writes defines the value read by u. We therefore introduce the relation $\mathcal{R}$ between writes that are possible reaching definitions of the same read:

$\forall v, w \in W: v \,\mathcal{R}\, w \iff \exists u \in R: v \,\sigma\, u \wedge w \,\sigma\, u$.

When two possible reaching definitions of the same read write the same memory cell in the original program, they must also do so in the expanded program. Since "writing the same memory cell" is an equivalence relation, we actually consider the transitive closure $\mathcal{R}^*$ of $\mathcal{R}$. We restrict ourselves to expanded access functions $f^{\mathrm{exp}}_e$ of the form $(f_e, \nu)$, where ν is some function on write accesses, and we show the following result:

Proposition 3 An access function $f^{\mathrm{exp}}_e = (f_e, \nu)$ is a maximal static expansion for every execution e if and only if

$\forall v, w \in W_e,\ f_e(v) = f_e(w): \quad v \,\mathcal{R}^*\, w \iff \nu(v) = \nu(w)$.

This result lets us compute a function ν by enumerating the equivalence classes of a certain relation. The formalism is therefore very general, but the algorithm we propose is limited to arbitrary loop nests over arrays. Some technical points, notably the transitive closure of affine relations, require particular care, but this summary does not cover them.

In general, single-assignment conversion exposes more parallelism than static expansion. The choice between them is therefore a trade-off between run-time overhead and extracted parallelism. We also present three examples on which we apply the expansion algorithm semi-automatically (with Omega [Pug92]). This summary only studies one of them; see Section V.4.
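Proposition 3 reduces the computation of ν, for writes to one original cell, to numbering the equivalence classes of $\mathcal{R}^*$. On a finite instance, a union-find structure computes these classes. This minimal sketch uses hypothetical data: six writes, all assumed to write the same original cell, and per-read lists of possible reaching definitions standing in for σ. The thesis's actual algorithm works symbolically on affine relations with Omega, not on enumerated instances.

```c
#include <assert.h>
#include <stddef.h>

#define NW 6   /* hypothetical number of writes, all to the same original cell */

static int parent[NW];

static int find(int v)
{
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];   /* path halving */
        v = parent[v];
    }
    return v;
}

static void unite(int a, int b)
{
    a = find(a);
    b = find(b);
    if (a != b)
        parent[b] = a;
}

/* sigma[r] lists the possible reaching definitions of read r, ended by -1.
 * Sets nu[v] to the representative of v's class for the transitive closure R*
 * of "v and w are possible reaching definitions of a common read". */
static void compute_nu(const int sigma[][NW + 1], size_t nreads, int nu[NW])
{
    for (int v = 0; v < NW; v++)
        parent[v] = v;
    for (size_t r = 0; r < nreads; r++) {
        if (sigma[r][0] < 0)
            continue;                    /* read without any reaching definition */
        for (size_t i = 1; sigma[r][i] >= 0; i++) {
            assert(sigma[r][i] < NW);
            unite(sigma[r][0], sigma[r][i]);
        }
    }
    for (int v = 0; v < NW; v++)
        nu[v] = find(v);                 /* one expanded cell per class */
}
```

Writes in the same class keep sharing a cell, so no φ-function is ever needed. Writes in different classes get different cells, which is as much expansion as the condition of Proposition 3 allows.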
V.3 Optimizing memory usage

We now present a technique that reduces the memory usage of an expanded program without any loss of parallelism. We assume that a parallel execution order $<_{\mathrm{par}}$ has already been chosen for the original program $(<_{\mathrm{seq}}, f_e)$, probably from the approximate reaching-definition relation. Interestingly, any technique can produce this parallel order (scheduling or partitioning, for example), as long as an affine relation can describe the result. With a transitive closure computation, one can even start from the data-flow order, i.e., the most parallel order that the reaching-definition relation permits. The resulting expanded program (generally) requires less memory than the single-assignment form, yet remains compatible with any legal parallel execution.

To formalize the problem, we first determine which expansions are correct with respect to this parallel order. In other words, we look for the expanded access functions $f^{\mathrm{exp}}_e$ that guarantee that the parallel execution order preserves the semantics of the original program. We use the notation

$\forall v, w \in W: v \bowtie w \overset{\mathrm{def}}{\iff} \big(\exists u \in R: v \,\sigma\, u \wedge w \not<_{\mathrm{par}} v \wedge u \not<_{\mathrm{par}} w \wedge (u <_{\mathrm{seq}} w \vee w <_{\mathrm{seq}} v \vee v \not\sim w)\big)$
$\qquad \vee \big(\exists u \in R: w \,\sigma\, u \wedge v \not<_{\mathrm{par}} w \wedge u \not<_{\mathrm{par}} v \wedge (u <_{\mathrm{seq}} v \vee v <_{\mathrm{seq}} w \vee w \not\sim v)\big)$,

and we proved the following result:

Theorem 10 (correctness of access functions) If the following condition holds, the expansion is correct, i.e., it guarantees that the parallel execution order preserves the semantics of the original program:

$\forall e \in E, \forall v, w \in W_e: v \bowtie w \implies f^{\mathrm{exp}}_e(v) \neq f^{\mathrm{exp}}_e(w)$.

Intuitively, consider a reaching definition v of a read u and another write w. They must write to distinct memory cells when w executes between v and u in the parallel program, and one of the following also holds in the original program:
- w does not execute between v and u;
- w assigns a different memory cell than v (denoted $v \not\sim w$).

Moreover, we proved that this correctness criterion is optimal for a given approximation of the reaching definitions and of the access function of the original program.

With this criterion, generating the expanded code requires coloring an unbounded graph described by an affine relation. The method is the same as for affine loop nests, and Lefebvre's thesis [Lef98] details it (in French).

V.4 Constrained optimized expansion

We now show that the two previous expansion techniques can be combined. We propose a general framework, constrained optimized expansion, that optimizes the expansion overhead and the extracted parallelism simultaneously. The formalism and algorithms are too technical for this summary. We only give an example of constrained expansion, which generalizes static expansion, combined with memory usage optimization.
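Once the interference relation ⋈ is known, Theorem 10 turns the choice of memory cells into a graph coloring problem: writes related by ⋈ must receive different cells. The thesis colors an unbounded graph described by an affine relation, following [Lef98]. As a stand-in, this minimal sketch applies plain greedy coloring to a small, hypothetical, explicitly enumerated graph. It only illustrates the constraint and is not the method of the thesis.

```c
#include <assert.h>
#include <stdbool.h>

#define NV 5   /* hypothetical number of write instances */

/* conflict[v][w] != 0 encodes "v bowtie w": v and w must not share a cell.
 * Gives each write the smallest cell number not used by an already-colored
 * conflicting write; returns the number of cells used. */
static int greedy_color(const int conflict[NV][NV], int color[NV])
{
    int ncells = 0;
    for (int v = 0; v < NV; v++) {
        bool used[NV] = { false };
        for (int w = 0; w < v; w++)
            if (conflict[v][w] || conflict[w][v])
                used[color[w]] = true;
        int c = 0;
        while (used[c])
            c++;                         /* at most v cells are taken, so c <= v */
        assert(c < NV);
        color[v] = c;                    /* f_exp(v) = (original cell, c) */
        if (c + 1 > ncells)
            ncells = c + 1;
    }
    return ncells;
}
```

Greedy coloring does not minimize the number of cells in general. It does, however, always satisfy the correctness condition of Theorem 10, which is the point of the example.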
We study the pseudo-code of Figure 17.a. We assume that N is strictly positive and that the predicate P(i,j) is true at least once for each iteration of the outer loop. Dependences on x forbid any parallel execution, so we convert the program to single-assignment form. Reaching definition analysis is exact for the instances of S, but not for those of R, so a φ function is needed. The two outer loops then become parallel, as shown in Figure 17.b.

Figure 17.a. Original program

        double x;
        for (i=1; i<=M; i++) {
          for (j=1; j<=M; j++)
            if (P(i,j)) {
    T         x = 0;
              for (k=1; k<=N; k++)
    S           x = x ...;
            }
    R     ... = x ...;
        }

Figure 17.b. Single-assignment form

        double xT[M+1,M+1], xS[M+1,M+1,N+1];
        parallel for (i=1; i<=M; i++) {
          parallel for (j=1; j<=M; j++)
            if (P(i,j)) {
    T         xT[i,j] = 0;
              for (k=1; k<=N; k++)
    S           xS[i,j,k] = (k==1 ? xT[i,j] : xS[i,j,k-1]) ...;
            }
    R     ... = φ({⟨S,i,1,N⟩, ..., ⟨S,i,M,N⟩}) ...;
        }

Figure 17. A parallelization example

This parallel program runs about five times slower than the sequential one on an SGI Origin 2000 with 32 processors. The causes are the φ function and the three-dimensional array. Memory usage must therefore be reduced. Applying the algorithm of Section V.3 shows that expansion along the innermost loop is unnecessary, and so is renaming x into xS and xT. This yields the code of Figure 18.a. The φ function was implemented with an optimized on-the-fly computation (see Section V.1), and the max computation hides a synchronization. Performance is therefore acceptable for a small number of processors, but degrades very quickly beyond four.

Applying the maximal static expansion algorithm removes the φ function by forbidding expansion along the middle loop (see Figure 18.b); only the outer loop remains parallel. On one processor, this parallel program is about twice as slow as the sequential program, probably because of the accesses to the two-dimensional array, but its speed-up is excellent. Notice, however, that x was again expanded along the inner loop, although this brings no additional parallelism. The two expansion techniques must therefore be combined. The result is very close to maximal static expansion, with one dimension less for array x: x[i] instead of x[i,·]. As expected, performance is excellent, with a speed-up of 31.5 on 32 processors (M = 64 and N = 2048).

Figure 18.a. Memory usage optimization

        double x[M+1,M+1];
        parallel for (i=1; i<=M; i++) {
          parallel for (j=1; j<=M; j++)
            if (P(i,j)) {
    T         x[i,j] = 0;
              for (k=1; k<=N; k++)
    S           x[i,j] = x[i,j] ...;
              @x[i] = max(@x[i], j);
            }
    R     ... = x[i,@x[i]] ...;
        }

Figure 18.b. Maximal static expansion

        double x[M+1,N+1];
        parallel for (i=1; i<=M; i++) {
          for (j=1; j<=M; j++)
            if (P(i,j)) {
    T         x[i,0] = 0;
              for (k=1; k<=N; k++)
    S           x[i,k] = x[i,k-1] ...;
            }
    R     ... = x[i,N] ...;
        }

Figure 18. Two different parallelizations

V.5 Parallelization of Recursive Programs

Automatic parallelization techniques for recursive programs are beginning to emerge. They are helped by environments and tools, such as Cilk [MF98], that ease the efficient implementation of control-parallel programs [RR99]. We propose a single-assignment conversion technique and a privatization technique for recursive programs, then present two methods for generating parallel code.

Expansion of Recursive Programs

In a single-assignment recursive program, expanded data structures generally have a tree structure whose elements are in one-to-one correspondence with control words. Allocating and accessing these structures is therefore more delicate than for loop nests. The general idea is to build each expanded structure D_exp on the fly, propagating a pointer to the current node. Direct access to D_exp is nevertheless required to update read references. One must first compute the possible reaching definitions with the transducer provided by the analysis, then find the associated memory cells in D_exp. Even without φ functions, restoring the data flow may therefore be very costly.

If reaching definitions are known exactly, σ can be seen as a partial function from R to W. When this function can be computed on the fly, efficient code can be generated for the read references of the expanded program: it suffices to implement the step-by-step computation of the transducer. This is the case, in particular, for sub-sequential transducers (see Section III.2), when the recursive program manipulates a tree structure. With arrays, it is harder to know whether the one-counter transducer of reaching definitions can be computed on the fly. We nevertheless proposed a single-assignment conversion algorithm for recursive programs that computes reaching definitions on the fly whenever possible.

We extended the notion of privatization to recursive programs; it consists in transforming global data structures into local variables. In the general case, data must be copied at each procedure call and at each return. The copy-out, i.e., copying local structures back into those of the calling procedure, can be costly, especially because of the synchronizations it makes unavoidable in a parallel execution. However, when reaching definitions are necessarily ancestors, only the first copy phase (the copy-in) is needed.
This is the case for the Queens program, for most sorting algorithms, and more generally for divide-and-conquer and dynamic-programming execution schemes. We therefore propose a privatization algorithm for recursive programs in which φ functions are replaced by copies of data structures.

Parallel Code Generation

We show that the decidability properties of rational and algebraic transducers allow efficient dependence tests. From these tests we derive a statement-level parallelization algorithm. It executes some statements asynchronously and inserts synchronizations where dependences require them. This algorithm is applied to the BST program, and to the Queens program after privatization (see Figure 19). The experiment was run on an SGI Origin 2000 for n = 13. The slowdown on one processor is due to array copies and, to a lesser extent, to the Cilk scheduler [MF98].

        int A[n];
    P   void Queens(int A[n], int n, int k) {
          int B[n];
    I     if (k < n) {
            memcpy(B, A, k*sizeof(int));
    A=a     for (int i=0; i<n; i++) {
    B=b       for (int j=0; j<k; j++)
    r           ... = ... B[j] ...;
    J         if (...) {
    s           B[k] = ...;
    Q           spawn Queens(B, n, k+1);
              }
            }
          }
        }
        int main() {
    F     Queens(A, n, 0);
        }

    [Plot: speed-up (parallel / original) of 13-Queens versus number of processors, up to 32, compared with the optimal speed-up.]

Figure 19. Privatization and parallelization of the Queens program

We also show that our parallelization algorithm gives better results than existing techniques when discovering parallelism requires instancewise information. Finally, we study instancewise parallelization of recursive programs. There, synchronizations are guarded by the precise conditions on the control word under which a dependence is possible. The proposed algorithm fully exploits the result of instancewise dependence analysis, together with the ability to test efficiently whether a pair of words is recognized by a transducer. A concrete example validates this new technique.
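The copy-in-only privatization of Figure 19 can be mimicked with a small sequential C sketch. This is a hypothetical reconstruction, not the thesis's code. It assumes the elided tests of Figure 19 are the usual non-attacking checks of the n-queens problem. It also counts solutions rather than recording them, replaces `spawn` with an ordinary call, and introduces `MAX_N` and `count_queens` for illustration only. Each call copies the k values written by its ancestors into a private array, and nothing is ever copied back.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical upper bound on the board size. */
#define MAX_N 16

/* Privatized recursive n-queens (sequential sketch).
 * `a` holds the columns chosen by the ancestor calls for rows 0..k-1.
 * Each call works on its own copy `b` (copy-in only): every read b[j],
 * j < k, refers to a value written by an ancestor, so nothing has to be
 * copied back to the caller (no copy-out).
 * Returns the number of complete placements reachable from this call. */
static long queens(const int a[], int n, int k)
{
    if (k == n)
        return 1;                              /* all n queens are placed */
    int b[MAX_N];
    memcpy(b, a, (size_t)k * sizeof(int));     /* copy-in of the k ancestor values */
    long count = 0;
    for (int i = 0; i < n; i++) {              /* try column i for row k */
        int safe = 1;
        for (int j = 0; j < k && safe; j++) {  /* reads of ancestor definitions */
            int d = b[j] - i;
            if (d == 0 || d == k - j || d == j - k)
                safe = 0;                      /* same column or same diagonal */
        }
        if (safe) {
            b[k] = i;                          /* write into the private copy */
            count += queens(b, n, k + 1);      /* a `spawn` in a Cilk version */
        }
    }
    return count;
}

/* Counts the solutions on an n x n board; requires 1 <= n <= MAX_N. */
long count_queens(int n)
{
    int a[MAX_N] = { 0 };
    return queens(a, n, 0);
}
```

For example, `count_queens(8)` returns 92, the well-known number of solutions on an 8×8 board.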
VI Conclusion

This thesis concludes with a summary of the main results, followed by a discussion of future developments.

VI.1 Contributions

Our contributions fall into four strongly interdependent categories. The first three concern automatic parallelization and are summarized in the following table. The fourth concerns rational and algebraic transductions.

Table: contributions to automatic parallelization. The columns are affine loop nests on arrays, general loop nests on arrays, and recursive programs on trees and arrays.

Instancewise dependence analysis
  - Affine nests: [Bra88, Ban88], [Fea88a, Fea91, Pug92]
  - General nests: [BCF97, Bar98], [WP95, Won95]
  - Recursive programs: [Fea98]¹; Section IV, published in [CC98]²
Instancewise reaching definition analysis
  - Affine nests: [Fea88a, Fea91, Pug92], [MAL93]
  - General nests: [CBF95, BCF97, Bar98], [WP95, Won95]
  - Recursive programs: Section IV, published in [CC98]²
Single-assignment form
  - Affine nests: [Fea88a, Fea91]
  - General nests: [Col98], Sections V.1 and V.4
  - Recursive programs: Section V.5
Maximal static expansion
  - Affine nests: (no entry)
  - General nests: Sections V.2 and V.4, published in [BCC98, Coh99b, BCC00]
  - Recursive programs: open problem
Memory usage optimization
  - Affine nests: [LF98, Lef98], [SCFS98, CDRV97]
  - General nests: Sections V.3 and V.4, published in [CL99, Coh99b]
  - Recursive programs: open problem
Instancewise parallelization
  - Affine nests: [Fea92, CFH95], [DV97]
  - General nests: [GC95, CBF95], [Col95b]
  - Recursive programs: Section V.5

¹ This is a dependence test for trees only.
² For arrays only.

We now review each contribution.

Control and data structures: beyond the polyhedral model. In Section II, we defined a program model, together with mathematical abstractions for statement instances and data-structure elements. This general framework was used throughout this work to formalize the presentation of our techniques, especially for recursive structures.

New dependence and reaching definition analyses were proposed in Section IV. They use a formalism from formal language theory, namely rational and algebraic transductions. A new definition of induction variables suited to recursive programs made it possible to describe the effect of each instance with a rational or algebraic transduction. A comparison with other analyses concludes this part of the work.

For loop nests on arrays (a special case of our model), however, we remained faithful to iteration vectors. We took advantage of the many algorithms for manipulating affine relations in Presburger arithmetic.

Memory expansion: new techniques for new problems. Applying memory expansion to parallelization is an old technique. However, instancewise reaching definition analyses have recently been extended to new kinds of programs:
- programs with conditional expressions;
- programs with complex references to data structures (non-affine array subscripts, for example);
- programs with recursive calls.

These extensions raise new questions. The first is to guarantee that read accesses in the expanded program refer to the right memory cell. The second is whether expansion techniques suit the new program models. Both questions are addressed in Sections V.1, V.2, V.3 and V.4 for unrestricted loop nests on arrays. We presented a new technique to reduce the run-time overhead of expansion. We also extended a memory usage reduction method to unrestricted loop nests. We studied the combination of the two and designed algorithms that optimize the run-time restoration of the data flow. Some experimental results are presented for a shared-memory architecture.

Memory expansion for recursive programs is an entirely new research area. We discovered that the mathematical abstraction for reaching definitions (rational or algebraic transductions) can incur significant overhead. We nevertheless developed algorithms that expand particular recursive programs with low run-time overhead.

Parallelism: extending classical techniques. Our dependence analysis was used to parallelize recursive programs. We demonstrated practical applications of rational and algebraic transductions, based on their decidable properties. Our first algorithm resembles existing methods, but it benefits from the more precise information gathered by the analysis and generally yields better results. Another algorithm enables instancewise parallelization of recursive programs; this new technique is made possible by rational and algebraic transductions. Some experimental results are described, combining expansion and parallelization on a well-known recursive program.

Formal language theory: some contributions and applications. The last results of this work do not belong to the field of compilation. They appear mainly in Section III.3 and in the following sections. We defined a subclass of rational transductions that admits a Boolean algebra structure and many other interesting properties. We showed that membership in this class is not decidable among rational transductions. However, conservative approximation techniques make it possible to benefit from these properties for the whole class of rational transductions. We also presented some new results on the composition of rational transductions over non-free monoids, before studying the approximation of algebraic transductions.
VI.2 Perspectives

Many questions arose throughout this thesis, and our results suggest more interesting research than they solve problems. We first address questions related to recursive programs, then discuss future work in the polyhedral model.

First, the search for a mathematical abstraction able to describe instancewise properties again appears as a key issue. Rational and algebraic transductions often gave good results, but their limited expressiveness also restricted their field of application. Reaching definition analysis suffered most from this limitation. So did the integration of conditional expressions and loop bounds into dependence analysis. We would need more than one counter in the transducers, while retaining the ability to decide emptiness and other interesting properties. We are therefore very interested in the work of Comon and Jurski [CJ98] on deciding emptiness for a subclass of multi-counter languages. More generally, we would like to follow closely the work on system verification based on restricted classes of Minsky machines, such as timed automata. Several counters would also make it possible to extend one of the key ideas of fuzzy array data-flow analysis [CBF95]: inserting new parameters that describe properties of non-affine expressions in order to improve precision.
Moreover, we believe that decidability is not necessarily the most important criterion when choosing a mathematical abstraction: good approximations of the results are often sufficient. In particular, while studying left-synchronous relations and deterministic relations, we found a limitation: a subclass with good decision properties cannot be used in our general analysis framework without an efficient approximation method. Improving our methods for resynchronizing and approximating rational transducers is therefore an important challenge. We also hope that this illustrates the mutual benefit of cooperation between theoreticians and compiler researchers.

Beyond these formalism issues, another research direction is to relax the restrictions on the program model as much as possible. As suggested above, the best approach is to seek a gradual degradation of the results through approximation techniques. This idea was studied in a similar context [CBF95], and its application to recursive programs promises interesting future work. Another idea would be to compute induction variables from execution traces instead of control words, so that any statement may modify them, and then to deduce approximate information on control words. Abstract interpretation techniques [CC77] would probably help a great deal in proving the correctness of such approximations.

We did not work on scheduling recursive programs, because we know of no method for assigning sets of instances to execution dates. Building a rational transducer from dates to instances might be a good idea, but generating code that enumerates the resulting instance sets is rather difficult. These technical obstacles should not hide an important point. Most of the parallelism in recursive programs can already be exploited by control-parallel techniques, and the need for a data-parallel execution model is not obvious.
Besides their impact on our study of recursive programs, techniques from the polyhedral model cover a large part of this thesis. A major goal throughout this work was to keep some distance from the mathematical representation of affine relations. This choice does not make it easier to write optimized, ready-to-use algorithms for a compiler. Its main advantage is to present our approach in full generality. The following technical problems are the most important to address, both for maximal static expansion and for memory usage optimization.

We presented many algorithms for the run-time restoration of the data flow, but we have very little practical experience with parallelizing loop nests that have unpredictable control flow and non-affine array subscripts. The SSA formalism [CFR+91] is mainly used as an intermediate representation, so φ functions are rarely implemented in practice. Generating efficient restoration code is therefore a rather recent problem.

No parallelizer for unrestricted loop nests has ever been written, so no large-scale experiment has been conducted. Significant optimization work remains before precise analyses and transformations can be applied to real programs. The main ideas would be to partition the code [Ber93]. Our techniques could also be extended to hierarchical dependence graphs, to array regions [Cre96] or to hierarchical schedules [CW99].

A parallelizing compiler must automatically tune a large number of parameters:
- run-time overhead;
- extracted parallelism;
- memory usage;
- placement of computations and communications.

We have seen that the optimization problem is even more complex for non-affine loop nests. The constrained expansion formalism simultaneously optimizes several parameters related to memory expansion, but this is only a first step.
Chapter 1
Introduction

Performance gains in computer architecture come from several combined factors:
- fast-increasing processor frequencies;
- broader bus widths;
- more functional units and more processors;
- complex memory hierarchies to cope with high latencies;
- a global increase in storage capacity.

New improvements and architectural designs are proposed every day. As a result, the machine model is becoming less and less uniform and simple. Despite hardware support for caches, superscalar execution and shared-memory multiprocessing, tuning a given program for performance keeps getting harder. Good optimizations for one particular case can lead to disastrous results on a different machine. Moreover, hardware support is generally not sufficient when the complexity of the system becomes too high. Deep memory hierarchies, local memories, out-of-core computations, instruction-level parallelism and coarse-grain parallelism all require additional support from the compiler to turn raw computing power into sustained performance. The recent shift of microprocessor technology from superscalar models to explicit instruction-level parallelism is one of the most concrete signs of this trend.

Indeed, the whole computer architecture and compiler industry now faces what the high-performance computing community has known for years. On the one hand, architectures are too diverse, for most applications, to define practical efficiency criteria and to develop optimizations specific to a particular machine. On the other hand, programs are written in such a way that traditional optimization and parallelization techniques struggle to feed the huge computation monster everybody will soon have in their laptop.

To achieve high performance on modern microprocessors and parallel computers, a program (or at least the algorithm it implements) must contain a significant degree of parallelism. Even then, the programmer and/or the compiler has to expose this parallelism and apply the optimizations needed to adapt it to the characteristics of the target machine. Moreover, the program should be portable in order to cope with the fast obsolescence of parallel machines. Programmers have two options to meet these requirements.

First, explicitly parallel languages. Most of these are parallel extensions of sequential languages. They include well-known data-parallel languages such as HPF, and recent mixed data- and control-parallel approaches such as OpenMP extensions for shared-memory architectures. Some extensions also come in the form of libraries, for instance PVM and MPI. Others are higher-level multi-threaded environments, such as IML from the University of Illinois [SSP99] or Cilk from MIT [MF98].
These approaches make it possible to program high-performance parallel algorithms. Besides parallel algorithmics, however, the programmer is also in charge of more technical, machine-dependent operations. These include distributing data across processors according to their memory capacities, communications and synchronizations. This requires deep knowledge of the target architecture and reduces portability. Several efforts have been made in HPF to let the compiler take care of parts of this job, but the programmer still seems to need a precise knowledge of what the compiler does.

Second, automatic parallelization of a high-level sequential language. The obvious advantages of this approach are portability and simplicity of programming. In theory, even old undocumented sequential codes can also be parallelized automatically. However, the task allotted to the parallelizing compiler is overwhelming. The program must first be analyzed to understand, at least partially, what it computes and where the parallelism lies. The compiler must then decide how to generate parallel code that takes the specific features of the target architecture into account. Even for short programs and a simplified parallel machine model, "optimality" in either step is out of reach for decidability reasons. In fact, a wide range of parallelization techniques exists, and the difficulty often lies in choosing the most appropriate one.

The usual source language for automatic parallelization is Fortran 77. Many scientific applications have been written in Fortran, which allows only relatively simple data structures (scalars and arrays) and control flow. Several studies, however, deal with the parallelization of C or of functional languages such as Lisp. These studies are less advanced than the historical approach, but more closely related to the present work, since they handle programs with general control and data structures. Many research projects already exist, among others:
- Parafrase-2 and Polaris [BEF+96] from the University of Illinois;
- PIPS from École des Mines [IJT90];
- SUIF from Stanford University [H+96];
- the McCAT/EARTH-C compiler from McGill University [HTZ+97];
- LooPo from the University of Passau [GL97];
- PAF from the University of Versailles.

There is also a growing number of commercial parallelizing tools, such as CFT, FORGE, FORESYS or KAP.

We are mostly interested in automatic and semi-automatic parallelization techniques. This thesis addresses both program analysis and source-to-source program transformation.

1.1 Program Analysis

Optimizations and parallelizations are usually seen as source-to-source code transformations that improve one or several run-time parameters. To apply a program transformation at compile time, one must check that it leaves the algorithm implemented by the program unharmed. Because an algorithm can be implemented in many different ways, applying a transformation requires "reverse engineering" the most precise information about what the program does. This fundamental program analysis technique addresses a difficult problem: gathering compile-time (a.k.a. static) information about run-time (a.k.a. dynamic) properties.
56 1.1.PROGRAMANALYSIS StaticAnalysis 55 arecalledstaticbecausetheycovereverypossiblerun-timeexecutionleadingtoagiven twoinstructions.thesemachinestatesareknownasprogrampoints.suchproperties programpoint.ofcoursethesepropertiesarecomputedatcompile-time,butthisisnot Programanalysesoftencomputepropertiesofthemachinestatebetweenexecutionof Muc97,ASU86,JM82,KS92,SRH96],onemayexposethefollowingcommonissues.To themeaningofthe\static"adjective:\syntactic"wouldprobablybemoreappropriate... formallystatethepossiblerun-timeexecutions,theusualmethodistobuildthecontrol analyses.amongthevariouswordingsandformalpresentationsofthisframework[ku77, Data-owanalysisistherstproposedframeworktounifythelargenumberofstatic owgraphoftheprogram[asu86];indeed,thisgraphrepresentsallprogrampointsas allpossibleexecutionsisthenthesetofallpathsfromtheinitialstatetotheconsidered nodes,andedgesbetweenthesenodesarelabeledwithprogramstatements.thesetof programpointandmeetallinformationsalongthesepaths.theformalstatementofthese eachstatementmaymodifysomeproperty,onemustconsidereverypathleadingtothe ideasisusuallycalledmeetoverallpaths(mop)[ks92].ofcourse,themeetoperation programpoint.propertiesatagivenprogrampointaredenedasfollows:because dependsonthepropertytobeevaluatedandonitsmathematicalabstraction. oftheproblemcannotbeusedforpracticalevaluationofstaticproperties.practical alongedgesofthecontrolowgraph.aniterativeresolutionofthepropagationequations computationisdoneby forwardorbackward propagationoftheintermediateresults However,becauseofthepossiblyunboundednumberofpaths,theMOPspecication (MFP).Intheintra-proceduralcase,KamandUllman[KU77]haveproventhatMFP isperformed,untilax-pointisreached.thismethodisknownasmaximalxedpoint somesimplepropertiesofthemathematicalabstractionaresatised;andthisresulthas beenextendedtointer-proceduralanalysisbyknoopandsteen[ks92]. 
eectivelycomputestheresultdenedbymop i.e.mfpcoincideswithmop when theapplicationandcomplexityoftheanalysis.thelatticestructureencompassesmostabstractionsbecauseitsupportscomputationofbothmeet atmergepoints andjoin at Mathematicalabstractionsforprogrampropertiesareverynumerous,dependingon computationalstatements operations.inthiscontext,cousotandcousot[cc77]have ematicalformulationcalledabstractinterpretationhastwomaininterests:rstitallows systematicapproachestotheconstructionofalatticeabstractionforprogramproperties, concreterun-timestatesofaprogramandabstractcompile-timeproperties.thismath- proposedanapproximationframeworkbasedonsemi-dualgaloisconnectionsbetween Whileextendingtheconceptofdata-owanalysis,abstractinterpretationhelpsproving thecorrectnessandoptimalityofprogramanalyses.practicalapplicationsofabstractinterpretationandrelatediterativemethodscanbefoundin[cou81,ch78,deu92,cre96]works,theautomaticparallelizationcommunityhasveryrarelybaseditsanalysistechniquesononeoftheseframeworks.beyondtheimportantreasonswhicharenotofa scienticnature,wewilldiscussthegoodreasons: Despitetheundisputablesuccessesofdata-owandabstractinterpretationframe- toaconservativeapproximationofanactualx-pointinthelatticeofconcretestates. andsecond,itensuresthatanycomputedx-pointintheabstractlatticecorresponds MOP/MFPtechniquesfocusonclassicaloptimizationstechniques,withrathersimpleabstractions(latticesoftenhaveaboundedheight);correctnessandeciencyin aproductioncompilerarethemainmotivations,whereasprecisionandexpressive-
57 56 nessofthemathematicalabstractionarethemainissuesforparallelization; CHAPTER1.INTRODUCTION intheindustry,parallelizationhastraditionallyaddressednestsofloopsandarrays, issuesofcriticalinterest; applicationstorealprogramsandpracticalimplementationinacompilerbecome withhighdegreesofdataparallelismandsimple(nonrecursive,rstorder)control structures;provingthecorrectnessofananalysisiseasyinthiscontext,whereas abstractinterpretationiswellsuitedtofunctionallanguageswithcleanandsimple issuesofimperativeandlow-levellanguagessuchasfortranorc,traditionallymore operationalsemantics;problemsraisedinthiscontextareorthogonalwithpractical staticanalysistechniques,whichcomputepropertiesatagivenprogrampointorstatement.suchresultsarewellsuitedtomostclassicaltechniquesforprogramcheckingand oneneedsmoreinformation. Whataboutdistinctrun-timeinstancesofprogrampointsandstatements?Because Asaresult,data-owandabstractinterpretationframeworkshavemostlyfocusedon suitableforparallelarchitectures(butwewillseethatthispointisevolving). optimization[muc97,asu86,skr90,krs94],butforautomaticparallelizationpurposes, Whataboutdistinctelementsinadatastructure?Becausearraysanddynamically statementsarelikelytoexecuteseveraltimes,weareinterestedinwhichiteration ofalooporwhichcalltoaprocedureinducedexecutionofsomeprogramstatement. allocatedstructuresarenotatomic,weareinterestedinwhicharrayelementor lelizationcommunities,itisnotsurprisingthatresultsoftheonescouldnotbeappliedby Becauseoforthogonalinterestsinthedata-owanalysisandtheautomaticparal- whichgraphnodeisaccessedbysomerun-timeinstanceofastatement. theothers.indeed,averysmallnumberofdata-owanalyses[dgs93,tzo97]addressed InstancewiseAnalysis bothinstancewiseandelementwiseissues,butresultsareveryfarfromtherequirements ofacompilerintermsofprecisionandapplicability. Theprogrammodelconsideredisalsomorerestricted mostofthetime sincetraditional applicationsofparallelizingcompilersarenumericalcodeswithloopnestsandarrays. 
tothebroadrangeofpropertiesandtechniquesstudiedindata-owanalysisframeworks. Programanalysesforautomaticparallelizationarearatherrestricteddomain,compared Feautrier[Fea88a] analysesareorientedtowardsinstancewiseandelementwisepropertiesofprograms.whentheonlycontrolstructurewasthefor/doloop,iterativemethods withahighsemanticalbackgroundseemedoverlycomplex.tofocusonsolvingcritical Sincetheverybeginning withworksbybanerjee[ban88],brandes[bra88]and problemssuchasabstractingloopiterationsandeectsofstatementinstancesonarray elements,designingsimpleandad-hocframeworkswasobviouslymoreprotablethan statementinstanceswhichaccessthesamememorylocation,oneoftheaccessesbeinga write.moreprecisemethodshavebeendesignedtocompute,foreveryarrayelementread tests[ban88]anddependenceanalyses[bra88,pug92]whichcollectedinformationabout tryingtobuildonunpracticaldata-owframeworks.therstanalysesweredependence inanexpression,theverystatementinstancewhichproducedthevalue.theyareusually calledarraydata-owanalyses[fea91,mal93],butweprefertocalltheminstancewise
58 1.2.PROGRAMTRANSFORMATIONSFORPARALLELIZATION reachingdenitionanalysesforbettercomparisonwithaspecicstaticdata-owanalysis 57 techniquecalledreachingdenitionanalysis[asu86,muc97].suchaccurateinformationsignicantlyimprovesthequalityofprogramtransformationtechniques,hencethe subscripts,andwithoutprocedurecalls.thisverylimitedmodelisalreadysucient usedtobenestedloopswithoutconditionalstatements,withaneboundsandarray performanceofparallelprograms. toaddressmanynumericalcodes,andhasthemajorinterestofallowingcomputation Instancewiseanalyseshavelongsueredstrongprogrammodelrestrictions:programs conservativeapproximationsofreachingdenitioninformation.adirectcomputationof dicultiesinremovingtherestrictionsisthatexactresultscannotbehopedforanymore, andonlyapproximatedependencesareavailableatcompile-time:thisinducesoverly ofexactdependenceandreachingdenitioninformation[fea88a,fea91].oneofthe andfeautrier[cbf95,bcf97,bar98]andbypughandwonnacott[wp95,won95].in andextremelypreciseintra-proceduraltechniqueshavebeendesignedbybarthou,collard thefollowing,fuzzyarraydataowanalysis(fada)bybarthou,collardandfeautrier reachingdenitionsisthusneeded.recently,suchdirectcomputationshavebeencrafted, CI96],buttheyarenotfullyinstancewiseinthesensethattheydonotdistinguishbe- [Bar98]willbeourpreferedinstancewisereachingdenitionanalysisforprogramswith unrestrictednestedloopsandarrays. tweenmultipleexecutionsofastatementassociatedwithdistinctcallsofthesurround- ingprocedure.indeed,therstfullyinstancewiseanalysisforprogramswith possibly Manyextensionstohandleprocedurecallshavebeenproposed[TFJ86,HBCM94, ofthesetransformationswillbestudiedinmoredetailintherestofthisthesis.ofcourse, recursive procedurecallsispresentedinthisthesis. theyarebasedoninstancewiseandelementwiseanalysisofprogramproperties. 
Thenextsectionintroducesprogramtransformationsusefultoparallelization.Most cientcompilationonmodernprocessorsorsupercomputers.ageneralmethodtoreduce Dependencesareknowntohamperparallelizationofimperativeprogramsandtheire- 1.2 ProgramTransformationsforParallelization cessesinprograms.classicalwaysincluderenamingscalars,arraysandpointers,splitting ingdistinctmemorylocationstoindependentwrites,i.e.toexpanddatastructures. thenumberofmemory-baseddependencesistodisambiguatememoryaccessesinassign- newdimensions,convertingarraysintotrees,changingthedegreeofatree,andchanging ormergingdatastructuresofthesametype,reshapingarraydimensions,includingadding Therearemanywaystocomputememoryexpansions,i.e.totransformmemoryac- toimplementtheexpandedreference[fea91].figure1.1showsthreeprogramswithno aglobalvariableintoalocalone. possibleparallelexecutionbecauseofoutputdependences(detailsofthecodeareomitted whennotusefulforpresentation).expandedversionsaregivenintheright-handsideof Readreferencesarealsoexpanded,usinginstancewisereachingdenitioninformation thegure,toillustratethebenetofmemoryexpansionforparallelismextraction. similar butnotidentical tothoseofthestaticsingle-assignment(ssa)frameworkby to\merge"datadenitionsduetoseveralincomingcontrolpaths.thesefunctionsare timecomputationisneededtopreservetheoriginaldataow:functionsmaybeneeded Unfortunately,whenthecontrol-owcannotbepredictedatcompile-time,somerun- Cytronetal.[CFR+91],andhavebeenrstextendedforinstancewiseexpansionschemes
by Collard and Griebl [GC95, Col98]. The argument of a φ function is the set of possible reaching definitions for the associated read reference.¹ Figure 1.2 shows two programs with some unknown conditional expressions and array subscripts. Expanded versions with φ functions are given in the right-hand side of the figure.

Notice that memory expansion is not a mandatory step for parallelization; it is nevertheless a general technique to expose parallelism in programs. The implementation of a parallel program then depends on the target language and architecture. Two main techniques are used.

    int x;
    x = ...; ... = x;
    x = ...; ... = x;

    int x1, x2;
    x1 = ...; ... = x1;
    x2 = ...; ... = x2;

After expansion, i.e. renaming x into x1 and x2, the first two statements can be executed in parallel with the two others.

      int A[10];
      for (i=0; i<10; i++) {
s1      A[0] = ...;
        for (j=1; j<10; j++) {
s2        A[j] = A[j-1] + ...;
        }
      }

      int A1[10], A2[10][10];
      for (i=0; i<10; i++) {
s1      A1[i] = ...;
        for (j=1; j<10; j++) {
s2        A2[i][j] = ((j == 1) ? A1[i] : A2[i][j-1]) + ...;
        }
      }

After expansion, i.e. renaming array A into A1 and A2 and then adding a dimension to array A2, the outer for loop is parallel. The instancewise reaching definition of the A[j-1] reference depends on the values of i and j, as implemented with a conditional expression.

    int A[10];
    void Proc(int i) {
      A[i] = ...;
      ... = A[i];
      if (...) Proc(i+1);
      if (...) Proc(i-1);
    }

    struct tree {
      int value;
      tree *left, *right;
    } *p;
    void Proc(tree *p, int i) {
      p->value = ...;
      ... = p->value;
      if (...) Proc(p->left, i+1);
      if (...) Proc(p->right, i-1);
    }

After expansion, the two procedure calls can be executed in parallel. Memory allocation for the tree structure is not shown.

Figure 1.1. Simple examples of memory expansion
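The second program of Figure 1.1 can be checked with a small C sketch. The elided right-hand sides are replaced by two hypothetical functions, f and g, and the helper expansion_preserves_values is ours, not part of the thesis. The sketch runs the original loop nest and the expanded one, then asserts that the values left in A by the last outer iteration are found in A1 and A2. Because every outer iteration of the expanded version writes only its own element A1[i] and its own row A2[i], no output dependence remains between those iterations.

```c
#include <assert.h>

#define N 10

/* Hypothetical right-hand sides standing for the elided "..." of Figure 1.1. */
static int f(int i) { return 3 * i + 1; }
static int g(int i, int j) { return i * j - 2; }

/* Original program: every outer iteration overwrites the same array A. */
static void original(int A[N]) {
    for (int i = 0; i < N; i++) {
        A[0] = f(i);                                              /* s1 */
        for (int j = 1; j < N; j++)
            A[j] = A[j - 1] + g(i, j);                            /* s2 */
    }
}

/* Expanded program: iteration i only writes A1[i] and row A2[i], so the
   outer loop carries no output dependence. The reaching definition of
   A[j-1] is A1[i] when j == 1 and A2[i][j-1] otherwise. */
static void expanded(int A1[N], int A2[N][N]) {
    for (int i = 0; i < N; i++) {
        A1[i] = f(i);                                             /* s1 */
        for (int j = 1; j < N; j++)
            A2[i][j] = ((j == 1) ? A1[i] : A2[i][j - 1]) + g(i, j); /* s2 */
    }
}

/* Checks that the values left in A by the original program (those of the
   last outer iteration) are found in the expanded arrays. */
int expansion_preserves_values(void) {
    int A[N], A1[N], A2[N][N];
    original(A);
    expanded(A1, A2);
    assert(A[0] == A1[N - 1]);
    for (int j = 1; j < N; j++)
        assert(A[j] == A2[N - 1][j]);
    return 1;
}
```

Calling expansion_preserves_values() returns 1 when all assertions hold. The cost of expansion is visible in the declarations: A2 needs N × N elements where A needed N.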
The first technique takes advantage of control parallelism, i.e. parallelism between different statements in the same program block. Its goal is to replace as many sequential executions of statements (denoted with ; in C) as possible by parallel executions. Depending on the language, there are many different syntaxes to code this kind of parallelism, and these syntaxes may not all have the same expressive power. We will prefer the Cilk [MF98] spawn/sync syntax (similar to OpenMP's syntax) to the parallel block notation of Algol 68 or of the EARTH-C compiler [HTZ+97]. As in [MF98], synchronizations involve every asynchronous computation started in the surrounding program block, and implicit synchronizations are assumed at return points in procedures. For the example in Figure 1.3.a, the execution of A, B and C in parallel, followed sequentially by D and E, has been written in a Cilk-like syntax (each statement would probably be a procedure call).

¹ This interpretation of φ functions is very different from their usual semantics in the SSA framework.

      int x;
s1    x = ...;
s2    if (...) x = ...;
r     ... = x;

      int x1, x2;
s1    x1 = ...;
s2    if (...) x2 = ...;
r     ... = φ({s1, s2});

After expansion, one cannot decide at compile-time what value is read by statement r. One only knows that it comes either from s1 or from s2, and the effective value retrieval code is hidden in the φ({s1, s2}) function. It checks whether s2 executed or not; if it did, it returns the value of x2, else it returns the value of x1.

      int A[10];
s1    A[i] = ...;
s2    A[...] = ...;
r     ... = A[i];

      int A1[10], A2[10];
s1    A1[i] = ...;
s2    A2[...] = ...;
r     ... = φ({s1, s2});

After expansion, one cannot decide at compile-time what value is read by statement r, because one does not know which element of array A is assigned by statement s2.

Figure 1.2. Run-time restoration of the flow of data

    spawn A;
    spawn B;
    spawn C;
    sync;      // wait for A, B and C to complete
    D;
    E;

Figure 1.3.a. Control parallelism

    // L is the latency of the schedule
    for (t=0; t<=L; t++) {
      parallel for (ι ∈ F(t))
        execute instance ι;
      // implicit synchronization
    }

Figure 1.3.b. Data parallel implementation for schedules

Figure 1.3. Exposing parallelism

The second technique is based on data parallelism, i.e. parallelism between different instances of the same statement or block. The data parallel programming model has been extensively studied in the case of loop nests [PD96], because it is very well suited to the efficient parallelization of numerical algorithms and of repetitive operations on large data sets. We will consider a syntax similar to the OpenMP parallel loop declaration, where all variables are supposed to be shared by default, and where an implicit synchronization takes place at each parallel loop termination.
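The pattern of Figure 1.3.b can be made concrete with a minimal C sketch. It assumes a hypothetical two-dimensional loop nest in which instance (i, j) depends on instances (i-1, j) and (i, j-1), together with the schedule θ(i, j) = i + j; neither the nest nor the schedule comes from the thesis, and all function names are ours. The front F(t) then only contains independent instances, so the inner loop over a front is the one an OpenMP-like parallel loop would distribute. Here the fronts are executed sequentially, and the result is compared with the original loop nest.

```c
#include <assert.h>

#define M 6   /* hypothetical problem size */

/* Instance (i, j) reads the values produced by (i-1, j) and (i, j-1). */
static int cell(int up, int left) { return up + left + 1; }

/* Original sequential loop nest. */
static void sequential(int X[M][M]) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            X[i][j] = cell(i > 0 ? X[i - 1][j] : 0, j > 0 ? X[i][j - 1] : 0);
}

/* Front-by-front execution for the schedule theta(i, j) = i + j,
   whose latency is L = 2 (M - 1). */
static void by_fronts(int X[M][M]) {
    const int L = 2 * (M - 1);
    for (int t = 0; t <= L; t++) {
        /* F(t) = { (i, j) : i + j == t, 0 <= i, j < M }: no two instances of
           F(t) depend on each other, so this loop could be marked
           "#pragma omp parallel for"; its implicit barrier would then
           separate consecutive fronts. */
        int lo = (t - (M - 1) > 0) ? t - (M - 1) : 0;
        int hi = (t < M - 1) ? t : M - 1;
        for (int i = lo; i <= hi; i++) {
            int j = t - i;
            assert(j >= 0 && j < M);
            X[i][j] = cell(i > 0 ? X[i - 1][j] : 0, j > 0 ? X[i][j - 1] : 0);
        }
    }
}

/* Returns 1 when both executions compute the same array. */
int fronts_match_sequential(void) {
    int S[M][M], P[M][M];
    sequential(S);
    by_fronts(P);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            if (S[i][j] != P[i][j])
                return 0;
    return 1;
}
```

Both predecessors of an instance scheduled at date t are scheduled at date t − 1, which is why executing the fronts in increasing order of t is enough to respect every dependence.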
The first algorithms to generate data parallel code were based on intuitive loop transformations such as loop fission, loop fusion, loop interchange, loop reversal, loop skewing, loop reindexing and statement reordering. Moreover, dependence abstractions were much less expressive than affine relations. But data parallelism is also appropriate when describing a parallel order with a schedule, i.e. when giving an execution date for every statement
instance. The program pattern in Figure 1.3.b shows the general implementation of such a schedule [PD96]. It is based on the concept of execution front F(t), which gathers all instances ι executing at date t.

The first scheduling algorithm was designed by Allen and Kennedy [AK87], and many other methods have been derived from it. These are all based on rather approximate abstractions of dependences, like dependence levels, vectors and cones. Despite their lack of generality, the benefit of such methods is their low complexity and easy implementation in an industrial parallelizing compiler; see the work by Banerjee [Ban92] or, more recently, by Darte and Vivien [DV97] for a survey of these algorithms.

The first general solution to the scheduling problem was proposed by Feautrier [Fea92]. The proposed algorithm is very useful, but its weak point is that it gives no help in deciding which parameter of the schedule to optimize: is it the latency L, the number of communications (on a distributed memory machine), or the width of the fronts?

Finally, it is well known that control parallelism is more general than data parallelism, meaning that every data parallel program can be rewritten in a control parallel model without losing any parallelism. This is especially true for recursive programs, for which the distinction between the two paradigms becomes very unclear, as shown in [Fea98]. However, for practical programs and architectures, it has long been the case that architectures for massively parallel computations were much better suited to data parallelism, and that getting good speed-ups on such architectures was difficult with control parallelism, mainly due to asynchronous task management overhead. But recent advances in hardware and software systems are changing this situation: excellent results for parallel recursive programs (game simulations like chess, and sorting algorithms) have been shown with Cilk, for example [MF98].

1.3 Thesis Overview

This thesis is organized in four chapters and a final conclusion. Chapter 2 describes a general framework for program analysis and transformation, and presents the formal definitions useful to the following chapters. The main interest of this chapter is to encompass a very large class of programs, from nests of loops with arrays to recursive programs and data structures.

A collection of mathematical results is gathered in Chapter 3; some are rather well known, such as Presburger arithmetic and formal language theory; some are very uncommon in the compiler and parallelism fields, such as rational and algebraic transductions; and the others are mostly contributions, such as left-synchronous transductions and approximation techniques for rational and algebraic transductions.

Chapter 4 addresses instancewise analysis of recursive programs. Based on an extension of the induction variable concept to recursive programs and on new results in formal language theory, it presents two algorithms for dependence and reaching definition analysis. These algorithms are applied to several practical examples.

Parallelization techniques based on memory expansion are studied in Chapter 5. The first three sections present new techniques to expand nested loops with unrestricted conditionals, bounds and array subscripts; the fourth section is a contribution to the simultaneous optimization of expansion and parallelization parameters; and the fifth section presents our results about the parallelization of recursive programs.
Chapter 2

Framework

The previous introduction and motivation have covered several very different concepts and approaches. Each one has been studied by many authors who have defined their own vocabulary and abstractions. Of course, we would like to keep the same formalism along the whole presentation. This chapter presents a framework for describing program analysis and transformation techniques and for proving their correctness or theoretical properties. The design of this framework has been governed by three major goals:

1. build on well defined concepts and vocabulary, while keeping the continuity with related works;
2. focus on instancewise properties of programs, and take advantage of this additional information to design new transformation techniques;
3. head for both generality and high precision, minimizing the necessary number of tradeoffs.

This presentation does not compete with other formalisms, some of which are firmly rooted in semantically and mathematically sound theories [KU77, CC77, JM82, KS92]. Because we advocate instancewise analysis and transformations, we primarily focused on establishing convincing results about effectiveness and feasibility. This required leaving for further studies the necessary integration of our techniques in a more traditional analysis theory. We are sure that instancewise analysis can be modeled in a formal framework such as abstract interpretation, even if very few works have addressed this important issue.

We start with a formal presentation of run-time statement instances and program executions in Section 2.1; then the program model we will consider throughout this study is exposed and motivated in Section 2.2. Section 2.3 proposes mathematical abstractions for these instance and program models. Program analysis and transformation frameworks are addressed in Sections 2.4 and 2.5 respectively.

2.1 Going Instancewise

During program execution, each statement can be executed several times, depending on the surrounding control structures (loops, procedure calls and conditional expressions). To capture data-flow information as precisely as possible, our analysis and transformation techniques should be able to distinguish between the distinct executions of a statement.

Definition 2.1 (instance) For a statement s, a run-time instance of s is some particular execution of s during execution of the program.
For short, a run-time instance of a statement is called an instance. If the program terminates, each statement has a finite number of instances.

Consider the two example programs in Figure 2.1. They both display the sum of an array A with an unknown number N of elements; one is implemented with a loop and the other with a recursive procedure. Statements B and C are executed N times during the execution of their respective programs, but statements A and D are executed only once. The value of variable i can be used to "name" each instance of B and C, and to distinguish at compile-time between the 2N+2 run-time instances of statements A, B, C and D: the unique instances of statements A and D are denoted respectively by ⟨A⟩ and ⟨D⟩, and the N instances of statement B (resp. statement C) associated with some value i of variable i are denoted by ⟨B, i⟩ (resp. by ⟨C, i⟩), 0 ≤ i < N. Such an "iteration variable" notation is not always possible, and a general naming scheme will be studied in Section 2.3.

    int A[N];
    int c;
A   c = 0;
    for (i=0; i<N; i++) {
B     c = c + A[i];
    }
    printf("%d", c);

    int A[N];
    int sum(int i) {
      if (i<N)
C       return A[i] + sum(i+1);
      else
D       return 0;
    }
    printf("%d", sum(0));

Figure 2.1. About run-time instances and accesses

Because of the state of memory and of possible interactions with its environment, several executions of the same program may yield different sets of run-time statement instances and incompatible results. We will not formally define this concept of program execution in operational semantics: a very clean framework has indeed been defined by Cousot and Cousot [Cou81] for abstract interpretation, but the correctness of our analysis and transformation techniques does not require so many details.

Definition 2.2 (program execution) Let P be a program. A program execution e is given by an execution trace of P, which is a finite or infinite (when the program does not terminate) sequence of configurations, i.e. machine states. The set of all possible program executions is denoted by E.
Now, the set of all run-time instances for a given program execution e ∈ E is denoted by I_e. Subscript e denotes a given program execution, but it also recalls that set I_e is "exact": it is the effective, unapproximated set of statement instances executed during program execution e. This formalism will be used in every further definition of an execution-dependent concept.

Considering again the two programs in Figure 2.1, the execution of statements B and C is governed by a comparison of variable i with the constant N. Without any information on the possible values of N, it is impossible to decide at compile-time whether some instance of B or C executes. In the extreme case of an execution e where N is equal to zero, both statements are never executed, and the set I_e is equal to {⟨A⟩, ⟨D⟩}. In general, I_e is equal to {⟨A⟩, ⟨D⟩} ∪ {⟨B, i⟩, ⟨C, i⟩ : 0 ≤ i < N}, the value of N being part of the definition of e.
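The instance sets of Figure 2.1 can be observed by instrumenting both programs, as in the following C sketch. The record hook, the instance structure and the count_instances driver are hypothetical names introduced for illustration, and the value −1 stands for the absent iteration variable of ⟨A⟩ and ⟨D⟩. For a fixed N, which plays the role of one execution e, the log should contain the 2N + 2 instances listed above, and only ⟨A⟩ and ⟨D⟩ when N = 0.

```c
#include <assert.h>

#define MAX_INSTANCES 64

/* A run-time instance <S, i>; i is -1 for statements executed once (A and D). */
struct instance { char stmt; int i; };

static struct instance seen[MAX_INSTANCES];
static int n_seen = 0;

/* Hypothetical instrumentation hook: logs one executed instance. */
static void record(char stmt, int i) {
    assert(n_seen < MAX_INSTANCES);
    seen[n_seen].stmt = stmt;
    seen[n_seen].i = i;
    n_seen++;
}

/* Loop version of Figure 2.1. */
static int sum_loop(const int *A, int n) {
    int c;
    c = 0;                       /* statement A, instance <A> */
    record('A', -1);
    for (int i = 0; i < n; i++) {
        c = c + A[i];            /* statement B, instance <B, i> */
        record('B', i);
    }
    return c;
}

/* Recursive version of Figure 2.1. */
static int sum_rec(const int *A, int n, int i) {
    if (i < n) {
        record('C', i);          /* statement C, instance <C, i> */
        return A[i] + sum_rec(A, n, i + 1);
    }
    record('D', -1);             /* statement D, instance <D> */
    return 0;
}

/* Runs both programs for one value of n and returns the number of
   instances observed, i.e. the size of I_e for this execution. */
int count_instances(const int *A, int n, int *loop_sum, int *rec_sum) {
    n_seen = 0;
    *loop_sum = sum_loop(A, n);
    *rec_sum = sum_rec(A, n, 0);
    return n_seen;
}
```

Such a log only describes one execution; a compile-time analysis has to reason on I_e symbolically, since N is unknown.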
Of course, each statement can involve several (including zero) memory references, at most one of these being a write (i.e. in left-hand side).

Definition 2.3 (access) A pair (ι, r) of a statement instance and a reference in the statement is called an access.

For a given execution e ∈ E of a program, the set of all accesses is denoted by A_e. It can be decomposed into:
- R_e, the set of all reads, i.e. accesses performing some load operation from memory; and
- W_e, the set of all writes, i.e. accesses performing some store operation into memory.

Due to our syntactical restrictions, no access may be simultaneously a read and a write. Since a statement performing some write in memory involves exactly one reference in left-hand side, its instances are often used in place of its write accesses (this sometimes simplifies the exposition).

Looking again at our two programs in Figure 2.1:
- statement A has one write reference to variable c; the single associated access is denoted by ⟨A, c⟩;
- statement B has one write and one read reference to variable c; since both references are identical, the associated accesses are both denoted by ⟨B, i, c⟩, 0 ≤ i < N;
- statement B has one read reference to array A; the associated accesses are denoted by ⟨B, i, A[i]⟩, 0 ≤ i < N;
- statement C has one read reference to array A; the associated accesses are denoted by ⟨C, i, A[i]⟩, 0 ≤ i < N;
- statement D has no memory reference, thus no associated access.

2.2 Program Model

Our framework focuses on imperative programs. This section describes the control and data structure syntax we consider. In a preliminary work [CCG96], we defined a toy language (called LEGS) which allowed the explicit declaration of complex data structure shapes fitting our program model. Most of the program model restrictions we enumerate in this section were also enforced by the language semantics. We chose, however, to define our program model with a C-like syntax (with C++ syntactic sugar facilities): despite the lack of formal semantics available for C, we hope this choice will ease the understanding of practical examples and the communication of our new ideas.

Procedures are seen as functions returning the void type, and explicit (typed) pointers are allowed. Multi-dimensional arrays are accessed with the syntax [i1, ..., in] (not the C syntax) for better understanding.

Control Structures

Definition 2.4 (statement and block) A program statement is any C expression ended with ";" or "}". A program block is a special kind of statement that starts
with "{", a function declaration, a loop or a conditional expression, and that surrounds one or more sub-statements.

To simplify the exposition, the only control structures that may appear in the right-hand side of an assignment, in a function call or in a loop declaration are conditional expressions. Moreover, multiple expressions separated by "," are not allowed, and loops are supposed to follow some minimal "code of ethics": each loop variable is assigned by a single loop and its value is not used outside of this loop; as a consequence, each loop variable must be initialized.

This framework is primarily designed for first-order control structures: any function call should be fully specified at compile-time, and "computed" gotos are forbidden. But higher-order structures can be handled conservatively, by approximating the possible function calls using external analysis techniques [Cou81, Deu90, Har89, AFL95]. Calls to input/output functions are allowed as well, but they are completely ignored by analysis and transformation techniques, possibly yielding incorrect parallelizations.

Recursive calls, loops with unrestricted bounds, and conditional statements with unrestricted predicates are allowed. Classical exception mechanisms, breaks, and continues are supported as well. However, we suppose that gotos are removed by well-known algorithms for structuring programs [Bak77, AMM92], at the cost of some code duplication in the rare cases where the control flow graph is not reducible [ASU86].

Data Structures

We only consider:
- scalars (boolean, integer, floating-point, pointer...);
- records (non-recursive and non-array structures with scalar and record fields);
- arrays of scalars or records;
- trees of scalars or records;
- arrays of trees;
- and trees of arrays.

Records are seen as compound scalars with unaliased named fields. Moreover, unrestricted array values in trees and tree elements in arrays are allowed, including recursive nestings of arrays and trees.

Arrays are accessed through the classical syntax, and other data structures are accessed through the use of explicit pointers. However, to simplify the exposition, we suppose that no variable is simultaneously used as a pointer (through operators * and ->) and as an array (through operator []): in particular, explicit array subscripts must be preferred to pointer arithmetic.

By convention, edge names in trees are identical to the labels of pointer fields in the tree declaration.
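A minimal C sketch of this convention, assuming a hypothetical binary tree type whose pointer fields are named l and r: an address is a word over these edge names, read from the root, and the empty word designates the root itself. The function node_at is our illustration, not part of the program model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary tree: edge names are the pointer field labels l and r. */
struct tree {
    int value;
    struct tree *l, *r;
};

/* Follows an address word over the edge names {'l', 'r'} starting from the
   root; the empty word "" designates the root itself. Returns NULL when the
   word leaves the allocated structure. */
struct tree *node_at(struct tree *root, const char *word) {
    struct tree *p = root;
    for (; p != NULL && *word != '\0'; word++) {
        assert(*word == 'l' || *word == 'r');
        p = (*word == 'l') ? p->l : p->r;
    }
    return p;
}
```

With a root whose l child has an r child, node_at(root, "lr") returns that grandchild, i.e. root->l->r. The design choice mirrors the convention above: the letters of the address are exactly the field names used in the C code.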
In practical implementations, recursive data structures are not made explicit. More precisely, two main problems arise when trying to build an abstract view of data structure definition and usage in C programs.

1. Multiple structure declarations may be relative to the same data structure, without explicit declaration of the shape of the whole object. Moreover, even a single recursive struct declaration can describe several very different objects, such as lists, doubly-linked lists, trees, acyclic graphs, general graphs, etc. Building a compile-time abstraction of the data structures used in a program is thus a difficult problem, but it is essential to our analysis and transformation framework. It can be achieved in two opposite ways: either by "decorating" the C code with shape descriptions which guide the compiler when building its abstract view of data structures [KS93, FM97, Mic95, HHN92], or by running a compile-time shape analysis of pointer-based structures [GH96, SRW96].

2. Two pointer variables may be aliased, i.e. they may be two different names for the same memory location. The goal of alias analysis [Deu94, CBC93, GH95] (store-less) and points-to analysis [LRZ93, EGH94, Ste96] (store-based) techniques is precisely to disambiguate pointer accesses, when pointer updates are not too complex to be analyzed. In practice, one may expect good results for strongly typed programs without pointer arithmetic, especially if the goal of the alias analysis is to check whether two pointers refer to the same structure or not. Element-wise alias analysis is very costly and still a largely open problem: indeed, no instancewise alias analysis for pointers has been proposed so far, and it could be an interesting future development of our framework.

In the following, we thus suppose that the shape of each data structure has been identified as one of the supported data types, and that each pointer reference has been associated with the data structure instance it refers to.
Now, there is one last question about data structures: how are they constructed, modified and destroyed? When dealing with arrays, a compile-time shape declaration is available in most cases; but some programs require dynamic arrays whose size is updated every time an out-of-bound access is detected: this is the case of some expanded programs studied in Chapter 5. The problem is more critical with pointer-based data structures: most of the time, they are allocated at run-time with explicit malloc or new operations. This problem has already been addressed by Feautrier in [Fea98], and we consider the same abstraction: all data structures are supposed to be built to their maximal extent (possibly infinite) in a preliminary part of the code. To guarantee that this abstraction is correct with regard to data-flow information, we must add an additional restriction to the program model: any run-time insertion and deletion is forbidden. In fact, there are two exceptions to this very strong rule, but they will be described in the next section, after presenting the mathematical abstraction for data structures. Nevertheless, a lot of interesting programs with recursive pointer-based structures perform random insertions and deletions, and these programs cannot be handled at present in our framework. This issue is left for future work.

2.3 Abstract Model

We start with a presentation of a naming scheme for statement instances, and show that execution traces are not suitable for our purpose. Then, we propose a powerful abstraction
for memory locations.

Naming Statement Instances

In the following, every program statement is supposed to be labeled. The alphabet of statement labels is denoted by ctrl. Now, loops and conditionals require special attention.
- Because a loop involves an initialization step, a bound check step, and an iteration step, loops are given three labels: the first one represents the loop entry, the second one is the check for termination, and the third one is the loop iteration. Remember that, in C, a bound check is performed immediately after the loop entry and immediately after each increment. The loop check is considered as a block and a conditional statement, and the two other labels are non-block labels.
- An if-then-else statement is given two labels: one for the condition and the then branch, and one for the else branch. Both labels are considered as block labels.

Consider the program example in Figure 2.2.a. This simple recursive procedure computes all possible solutions to the n-queens problem, using an array A (details of the code are omitted here); it is our running example in this section. There are two assignment statements: s writes into array A and r performs some read access in A. Statements I and J are conditionals, and statement Q is a recursive call to procedure Queens. Loop statements are divided into three sub-statements which are given distinct labels: the first one denotes the loop entry (e.g. A or B), the second one denotes the bound check (e.g. Ā or B̄), and the third one denotes the loop iteration (e.g. a or b). Finally, P is the label of the procedure and F denotes the initial call in main.

A primary goal for instancewise analysis and transformation is to name each statement instance. To achieve this, many works in the program analysis field rely on execution traces. Their interpretation for program analysis is generally defined as a path from the entry of the control flow graph to a given statement.¹ They record every execution of a statement, including returns from functions.
For our purpose, these execution traces have three main drawbacks:
1. because of return labels, traces belong to a non-rational language in ctrl* as soon as there are recursive function calls;
2. full-length traces are huge and extremely redundant: if an instance executes before another in the same program execution, its trace is a prefix of the other one;
3. a single statement instance may have several execution traces, because statement execution is unknown at compile time.

To overcome the first problem, a classical technique relies on a function called Net on ctrl* [Har89]: intuitively, this function collapses all call-return pairs in a given execution trace, yielding compact rational sets of execution traces. The third point is much more unpleasant, because it prevents giving a unique name to each statement instance. Notice however that different execution traces for the same instance must be associated with distinct executions of the program.

¹ Without taking conditional expressions and loop bounds into account.
      int A[n];
P     void Queens(int n, int k) {
I       if (k < n) {
A Ā a     for (int i=0; i<n; i++) {
B B̄ b       for (int j=0; j<k; j++)
r             ... = A[j];
J           if (...) {
s             A[k] = ...;
Q             Queens(n, k+1);
            }
          }
        }
      }

      int main() {
F       Queens(n, 0);
      }

Figure 2.2.a. Procedure Queens
Figure 2.2.b. Control tree
Figure 2.2. Procedure Queens and control tree

Our solution starts from another representation of the program flow: the intuition behind our naming scheme for instances is to consider some kind of "extended stack states", where loops are seen as special cases of recursive procedures. The dedicated vocabulary for this representation has been defined in parts and with several variations in [CC98, Coh99a, Coh97, Fea98].

Let us start with an example: the first instance of statement s in procedure Queens. Depending on the number of iterations of the innermost loop (bounded by k), an execution trace for this first instance can be one of FPIAĀBB̄Js, FPIAĀBB̄bB̄Js, FPIAĀBB̄bB̄bB̄Js, ..., FPIAĀBB̄(bB̄)^kJs. Since we would like to give a unique name to the first instance of s, all B, B̄ and b labels should intuitively be left out. Now, for a given program execution, any statement instance is associated with a unique (ordered) list of block enterings, loop iterations and procedure calls leading to it. To each list corresponds a word: the concatenation of statement labels. This is precisely what we get when forgetting about the innermost loop in execution traces of the first instance of statement s: the single word FPIAĀJs. These concepts are illustrated by the tree in Figure 2.2.b, to be defined later.

We now formally describe these words and their relation with statement instances.

Definition 2.5 (control automaton and control words) The control automaton of the program is a finite-state automaton whose states are statements in the program and where a transition from a state q to a state q' expresses that statement q' occurs in
block q. Such a transition is labeled by q'. The initial state is the statement executed at the beginning of program execution, and all states are final.

Words accepted by the control automaton are called control words. By construction, they build a rational language L_ctrl included in ctrl*.

Lemma 2.1 I_e being the set of statement instances for a given execution e of a program, there is a natural injection from I_e to the language L_ctrl of control words.

Proof: Any statement instance in a program execution is associated with a unique list of block enterings, loop iterations and procedure calls leading to it. We can thus define a function f from I_e to ctrl* (lists of statement labels) mapping statement instances to their respective list of block enterings, loop iterations and procedure calls. Consider an instance ι1 of a statement s1 and an instance ι2 of a statement s2, and suppose f(ι1) = f(ι2) = l. By definition of f, both statements s1 and s2 must be part of the same program block B, and more precisely, B is the last element of l. Considering a pair of a statement s and an instance ι of s, this proves that no other instance ι' of a statement s' may be such that (f(ι), s) = (f(ι'), s'). Consider a function μ from I_e to L_ctrl (control words) which maps an instance ι of a statement s to the concatenation of all labels in f(ι) and of s itself. Thanks to the preceding property on pairs (f(ι), s), function μ is injective.

Theorem 2.1 Let I be the union of all sets of statement instances I_e for every possible execution e of a program. There is a natural injection from I to the language L_ctrl of control words.

Proof: Consider two executions e1 and e2 of a program. The function defined in the proof of Lemma 2.1 is denoted by μ1 for execution e1 and μ2 for execution e2. If an instance ι is part of both I_e1 and I_e2, control words μ1(ι) and μ2(ι) are the same, because the list of block enterings, loop iterations and function calls leading to ι is unchanged. Lemma 2.1 concludes the proof.

We are thus allowed to talk about "the control word of a statement instance". In general, the set E of possible program executions and the set I_e for e ∈ E are unknown at compile-time, and we may consider all instances that may execute during any program execution. Ultimately, the natural injection becomes a one-to-one mapping when extending the set I_e with all possible instances associated to "legal" control words. As a consequence, if w is a control word, we will say "instance w" instead of "the instance whose control word is w".

We are also interested in encoding accesses themselves with control words. A simple solution consists in considering pairs (w, ref), where w is a control word for some instance of a statement s and ref is a reference in statement s. But we prefer to encode the full access "inside" the control word: we thus extend the alphabet of statement labels ctrl with letters of the form s_ref, for all statements s ∈ ctrl and references ref in s. Of course, extended labels may only appear as the last letter of a control word: when the last letter in a control word w is of the form s_ref, it means that w represents an access instead of an instance. However, when clear from the context, i.e. when there is only one "interesting" reference in a given statement or when all references are identical, the reference will be taken out of the control word of accesses. This will be the case in most practical examples.
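The mapping μ of Lemma 2.1 can be sketched in C for instances of statement s in procedure Queens. The sketch assumes the labels of Figure 2.2.a written as UTF-8 strings (Ā is the bound check of loop A) and describes an instance by the value of i at each recursion level; the inner loop B does not contribute to the control word of s, and whether the instance actually executes (the conditions of I and J) is not modeled. The function control_word_s and its parameters are hypothetical. Distinct lists of iteration values produce distinct words, in line with the injectivity of μ.

```c
#include <assert.h>
#include <string.h>

/* Appends one label (a UTF-8 string) to the word in buf of capacity cap. */
static void append_label(char *buf, size_t cap, const char *label) {
    assert(strlen(buf) + strlen(label) < cap);
    strcat(buf, label);
}

/* Control word mu(iota) of an instance iota of statement s in Queens.
   The instance is reached through depth + 1 activations of Queens, and
   iters[l] is the value of i at recursion level l. */
void control_word_s(const int *iters, int depth, char *buf, size_t cap) {
    assert(cap > 0);
    buf[0] = '\0';
    append_label(buf, cap, "F");                   /* initial call in main */
    for (int l = 0; l <= depth; l++) {
        append_label(buf, cap, "PIAĀ");            /* procedure, if (k < n), loop entry, first check */
        for (int it = 0; it < iters[l]; it++)
            append_label(buf, cap, "aĀ");          /* one iteration, then the next bound check */
        append_label(buf, cap, "J");               /* entering the if (...) block */
        append_label(buf, cap, l < depth ? "Q" : "s"); /* recursive call, or s itself */
    }
}
```

For instance, a single activation with i = 2 yields FPIAĀaĀaĀJs, the black square of Figure 2.2.b.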
Finally, notice that some states in the control automaton have exactly one incoming transition and one outgoing transition (looping transitions count as both incoming and outgoing). These states do not carry any information about where a statement can be reached from or lead to: in every control word, the label of the outgoing transition follows the label of the incoming one. In practice, we often consider a compressed control automaton, where all states with exactly one incoming transition and one outgoing transition are removed. This transformation has no impact on control words. Observe that loops in the program are represented by looping transitions in the compressed control automaton, and that cycles involving more than one state are associated with recursive calls.

Figure 2.3.a. Control automaton
Figure 2.3.b. Compressed control automaton
Figure 2.3. Control automata for program Queens

Figure 2.3.a describes the plain control automaton for procedure Queens.² Since states F, I, A, B, Q, a and b are useless, they are removed along with their outgoing edges. The compressed automaton is described in Figure 2.3.b.

As a practical remark, notice that it is often desirable to restrict the language of control words to instances of a particular statement. This is easily achieved by choosing the state associated to this statement as the only final one.

² Every state is final, but this is not made explicit on the figure.

To conclude this presentation of a naming scheme for statement instances, it is possible to compare the execution traces of an instance ι and the control word of ι. Indeed, the
following property is quite natural: it results from the observation that the traces of an instance may only differ in labels of statements that are not part of the list of block enterings, loop iterations and function calls leading to this instance.

Proposition 2.1 The control word of a statement instance is a sub-word of every execution trace of this instance.

Sequential Execution Order

The sequential execution order of the program defines a total order over instances; call it <seq. In English, words are ordered by the lexicographic order generated by the alphabet order a < b < c < .... Similarly, in any program one can define a partial textual order <txt over statements: statements in the same block are sorted in order of appearance, and statements appearing in different blocks are mutually incomparable. Remember the special case of loops: the iteration label executes after all the statements inside the loop body, but entry and check labels are not comparable with these statements. For procedure Queens in Figure 2.2.a, we have B <txt J <txt a, r <txt b and s <txt Q.

This textual order generates a lexicographic one on control words, denoted by <lex:

$$w' <_{lex} w \iff \big(\exists x, x' \in \mathrm{ctrl},\ \exists u, v, v' \in \mathrm{ctrl}^* : w = uxv \,\wedge\, w' = ux'v' \,\wedge\, x' <_{txt} x\big) \;\vee\; \big(\exists v \in \mathrm{ctrl}^+ : w = w'v\big)$$

(the second case is a.k.a. the prefix order). This order is only partial on ctrl*. However, by construction of the textual order:

Proposition 2.2 An instance ι' executes before an instance ι if and only if their respective control words w' and w satisfy w' <lex w.

Notice that the lexicographic order <lex is not total on L_ctrl, because the two cases of a conditional are not comparable! This does not yield a contradiction, because the then and else cases of the same if instance are never executed simultaneously in a single execution. In general, the lexicographic order is total on the subset of control words corresponding to instances that do execute (in one-to-one mapping with I_e for some execution e ∈ E).

Finally, the language of control words is best understood as an infinite tree, whose root is named ε and whose edges are labeled by statements. Each node then corresponds to the control word equal to the concatenation of edge labels starting from the root. Consider a control word ux, with u ∈ ctrl* and x ∈ ctrl; every downward edge from the node whose control word is ux corresponds to an outgoing transition from state x in the control automaton. To represent the lexicographic order, downward edges are ordered from left to right according to the textual order. Such a tree is usually called a call tree in the functional languages community, but control tree is more adequate in the presence of loops and other non-functional control structures. One may talk about plain and compressed control trees, depending on the control automaton which defines them.

A partial control tree for procedure Queens is shown in Figure 2.2.b (a compressed one will be studied later in Figure 4.1, page 124). Control word FPIAĀaĀaĀJQPIAĀBB̄r (depicted by a star in Figure 2.2.b) is a possible run-time instance of statement r, and control word FPIAĀaĀaĀJs (depicted by a black square) is a possible run-time instance of statement s.
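The definition of <lex translates directly into code. The C sketch below encodes the labels of procedure Queens as an enumeration (L_Abar and L_Bbar stand for Ā and B̄; all names are ours) and hard-codes the pairs of the textual order listed for Figure 2.2.a. It compares two control words by locating their longest common prefix u, then either applies the prefix case or compares the first differing letters with <txt; incomparable words yield 0 in both directions.

```c
#include <assert.h>
#include <stddef.h>

/* Labels of procedure Queens (Figure 2.2.a); L_Abar and L_Bbar stand for
   the bound checks Ā and B̄. */
enum label { L_F, L_P, L_I, L_A, L_Abar, L_a, L_B, L_Bbar, L_b, L_r, L_J, L_s, L_Q };

/* Partial textual order <txt for Queens: B <txt J <txt a, r <txt b and
   s <txt Q, plus B <txt a by transitivity. All other pairs are incomparable. */
static int txt_less(enum label x, enum label y) {
    static const enum label pairs[][2] = {
        {L_B, L_J}, {L_J, L_a}, {L_B, L_a}, {L_r, L_b}, {L_s, L_Q}
    };
    for (size_t k = 0; k < sizeof pairs / sizeof pairs[0]; k++)
        if (pairs[k][0] == x && pairs[k][1] == y)
            return 1;
    return 0;
}

/* Returns 1 iff w1 <lex w2. */
int lex_less(const enum label *w1, size_t n1, const enum label *w2, size_t n2) {
    assert(w1 != NULL && w2 != NULL);
    size_t k = 0;
    while (k < n1 && k < n2 && w1[k] == w2[k])
        k++;                          /* k = length of the longest common prefix u */
    if (k == n1)
        return n1 < n2;               /* w1 is a proper prefix of w2 */
    if (k == n2)
        return 0;                     /* w2 is a prefix of w1 */
    return txt_less(w1[k], w2[k]);    /* first differing letters */
}
```

With the two words of Figure 2.2.b, the black square FPIAĀaĀaĀJs is lexicographically smaller than the star FPIAĀaĀaĀJQPIAĀBB̄r, since s <txt Q; by Proposition 2.2, that instance of s executes first.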
72 2.3.ABSTRACTMODEL AdressingMemoryLocations 71 Griebl[CCG96],butitisalsohighlyrelevanttopreviousworkbyAlabauandVauquelin wealreadyproposed[cc98,coh99a,coh97,fea98]someofwhichincollaborationwith programanalysis.thispresentationcanbeseenasanextensionofseveralframeworks Alargenumberofdatastructureabstractionshavebeendesignedforthepurposeof multi-dimensionalones.treeadressesareconcatenationofedgenames(seesection2.2.2) andhilnger[lh88]. [Ala94],byGiavitto,MichelandSansonnet[Mic95],byDeutsch[Deu92]andbyLarus startingfromtheroot.theaddressoftherootissimply",thezero-lengthword.for example,thenameofnoderoot->l->rinabinarytreeislr.thesetofedgenamesis Withnosurprise,arrayelementsareaddressedbyintegers,orvectorsofintegersfor denotedbydata.thelayoutoftreesinmemoryisthusdescribedbyarationallanguage Ldatadataoveredgenames. whichcapturesrelationsbetweenintegervectors,betweenwords,andbetweenthetwo. Dealingwithtreesonly,Feautrierproposedtouserationaltransductionsbetweenfree monoidsin[fea98].wewillformallydenesuchtransductionsinsection3.3,andthen Forthepurposeofdependenceanalysis,wearelookingforamathematicalabstraction monoids,tohandlearraysandnestedtreesandarraysaswell. showhowthesameideacancanbeextendedtomoregeneralclassesoftransductionsand ExtendingtheDataStructureModel arereferencetotheparentandlinksbetweennodesatthesameheightinatree.such Someinterestingstructuresarebasicallytreestructuresenhancedwithtraversaledges. Inmanycases,thesetraversaledgeshaveaveryregularstructure.Mostusualcases traversaledgesareoftenusedtofacilitatespecial-purposetraversalalgorithms.there issomesupportforsuchstructureswhentraversaledgesareknownfunctionsofthe generatorsofthetreestructure[ks93,fm97,mic95],i.e.the\back-bone"spanningtree afullchapterwouldbenecessaryandoursupportfortraversaledgesdoesnotinclude traversaledgesisnotsupported.wewillnotstudythisextensionanyfurtherbecause ofthegraph.insuchacase,traversaledgesaremerelyan\algorithmicalsugar"forbetter recursionanditeration. 
Abstract Memory Model

The key idea to handle both arrays and trees is that they share a common mathematical abstraction: the monoid. For a quick recall of monoid definitions and properties, see Section 3.2. Rational languages (tree addresses) are subsets of free monoids with word concatenation, and sets of integer vectors (array subscripts) are free commutative monoids with vector addition. The monoid abstraction for a data structure will be denoted by $M_{\mathrm{data}}$. The subset of this monoid corresponding to valid elements of the structure will be denoted by $L_{\mathrm{data}}$.

The case of nested arrays and trees is a bit more complex, but it reveals the expressiveness of monoid abstractions. Our first example is the hash-table structure described in Figure 2.4. It defines an array whose elements are pointers to lists of integers. A monoid abstraction $M_{\mathrm{data}}$ for this structure is generated by $\mathbb{Z} \cup \{n\}$, and its binary operation · is defined as follows:
Figure 2.4. Hash-table declaration

    struct key {
      int value;   // value of key
      key *n;      // next key
    };
    key *hash[7];

$\forall i \in \mathbb{Z}:\quad i \cdot n = in$   (2.1)
$n \cdot n = nn$   (2.2)
$\forall i \in \mathbb{Z}:\quad n \cdot i = ni$   (never used for the hash-table)   (2.3)
$\forall i, j \in \mathbb{Z}:\quad i \cdot j = i + j$   (2.4)

The set $L_{\mathrm{data}} \subseteq M_{\mathrm{data}}$ of valid memory locations in this structure is thus

$L_{\mathrm{data}} = \mathbb{Z}\,n^*.$

Check that the third case in the definition of operation · is never used in $L_{\mathrm{data}}$.

Our second example is the structure described in Figure 2.5. It defines an array whose elements are references either to other arrays or to integers. Each array is either terminal, with integer elements, or intermediate, with array reference elements. This definition is very similar to file-system storage structures, such as UNIX's inodes. The monoid abstraction $M_{\mathrm{data}}$ for this structure is the same as for the hash table. However, the set $L_{\mathrm{data}} \subseteq M_{\mathrm{data}}$ of valid memory locations in this structure is now

$L_{\mathrm{data}} = (\mathbb{Z}\,n)^*\,\mathbb{Z}.$

The definition of operation · is the same as for the hash-table structure, see (2.1) to (2.4).

Figure 2.5. An inode declaration

    struct inode {
      boolean terminal;  // true means terminal array of integers,
                         // false means intermediate array of pointers
      int length;        // array size
      union {
        int a[];         // array of block numbers
        inode *n[];      // array of inode pointers
      } quad;
    };

In the general case of nested arrays and trees, the monoid abstraction is generated by the union of node names in trees and integer vectors. Its binary operation is defined as word concatenation with additional commutations between vectors of the same dimension. The result is called a free partially commutative monoid [RS97b]:

Definition 2.6 (free partially commutative monoid) A free partially commutative monoid M with binary operation · is defined as follows:
- generators of M are the letters of an alphabet A and all vectors from a finite union of free commutative monoids of the form $\mathbb{Z}^n$;
- operation · coincides with word concatenation on $A^*$: $\forall x, y \in A^*:\ x \cdot y = xy$;
- for a given integer n, operation · coincides with vector addition on $\mathbb{Z}^n$: $\forall x, y \in \mathbb{Z}^n:\ x \cdot y = x + y$.

This framework clearly supports recursively nested trees and arrays. In the following, we abstract any data structure as a subset $L_{\mathrm{data}}$ of the monoid $M_{\mathrm{data}}$ with binary operation ·. Here · denotes word concatenation for trees and the usual sum for arrays.

Finally, in the previous section we required that no run-time insertion or deletion appear in the program. This rule is too conservative, and our framework can handle two exceptions:
1. Insertions at a list's tail or a tree's leaf are supported. For the flow of data, it makes no difference whether the insertion is done before the program or during execution: only the assignment of the value matters.
2. Deletions at a list's tail or a tree's leaf are supported too. The abstraction is still correct, but it may lead to overly conservative results. For example, suppose an insertion follows a deletion at the tail of a list. Considering words in the free monoid abstraction of the list, the memory location of the tail node before deletion will be aliased with the new location of the inserted one.
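The following C sketch illustrates the monoid abstraction shared by Figures 2.4 and 2.5. It assumes a hypothetical encoding of generators (type `gen`, where `is_edge` marks the edge name n) and hypothetical helper names:
- `normalize` applies case (2.4), adding adjacent integers, and treats every other product of generators as concatenation.
- The two predicates test whether an already normalized word belongs to $\mathbb{Z}\,n^*$ (hash table) or to $(\mathbb{Z}\,n)^*\mathbb{Z}$ (inodes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A generator of M_data: an integer (array subscript) or the edge name n. */
typedef struct {
    bool is_edge;  /* true for the edge name n, false for an integer */
    int value;     /* the integer, meaningful only when is_edge is false */
} gen;

/* Normalize a word of generators in place: adjacent integers are added
   (i . j = i + j); every other product is plain concatenation.
   Returns the length of the normalized word. */
size_t normalize(gen *w, size_t len) {
    size_t out = 0;
    for (size_t k = 0; k < len; ++k) {
        if (out > 0 && !w[k].is_edge && !w[out - 1].is_edge)
            w[out - 1].value += w[k].value;   /* i . j = i + j */
        else
            w[out++] = w[k];                  /* concatenation */
    }
    return out;
}

/* Hash table: L_data = Z n*, i.e. one integer followed only by n's. */
bool in_hash_ldata(const gen *w, size_t len) {
    assert(w != NULL || len == 0);
    if (len == 0 || w[0].is_edge)
        return false;
    for (size_t k = 1; k < len; ++k)
        if (!w[k].is_edge)
            return false;
    return true;
}

/* Inodes: L_data = (Z n)* Z, i.e. integers and n's alternate, starting
   and ending with an integer. */
bool in_inode_ldata(const gen *w, size_t len) {
    if (len % 2 == 0)
        return false;
    for (size_t k = 0; k < len; ++k)
        if (w[k].is_edge != (k % 2 == 1))
            return false;
    return true;
}
```

For instance, the product 2·3·n·n normalizes to 5nn (the node reached from `hash[5]` by following n twice). It belongs to the hash-table language but not to the inode one, whereas 1n4 belongs only to the inode language.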
Loop Nests and Arrays

The case of nested loops with scalar and array operations is very important. It applies to a wide range of numerical, signal-processing, scientific and multimedia codes. A large amount of work has been devoted to such programs (or program fragments), and very powerful analysis and transformation techniques have been crafted. The framework above easily captures such programs, but it seems both easier and more natural to use another framework for memory addressing and instance naming. We prefer the natural addressing scheme in arrays, using integers and integer vectors, because $\mathbb{Z}$-modules have a much richer structure than plain commutative monoids.

To ensure consistency of the control word and integer vector frameworks, we show how control words can be embedded into vectors. This embedding is based on the following definition, introduced by Parikh [Par66] to study properties of algebraic subsets of free commutative monoids:

Definition 2.7 A Parikh mapping over alphabet $\Sigma_{\mathrm{ctrl}}$ is a function from words over $\Sigma_{\mathrm{ctrl}}$ to integer vectors in $\mathbb{N}^{\mathrm{card}(\Sigma_{\mathrm{ctrl}})}$, such that each word w is mapped to the vector of occurrence counts of every label in w.

There is no specific order in which labels are mapped to dimensions. We are interested in a particular mapping where dimensions are ordered from the label of the outer loop to the label of the inner one.

The loop nest structure is non-recursive, hence the only cycles in the control automaton are transitions looping on the same state. As a result, the language of control words is in one-to-one correspondence with its set of Parikh vectors. The following mapping is computed for the loop nest in Figure 2.6:

$A\,A'\,(a\,A')^*\,\big(B\,B'\,(b\,B')^*\,s + C\,C'\,(c\,C')^*\,r\big) \longrightarrow \mathbb{N}^{11}$
$w \longmapsto \big(|w|_A, |w|_{A'}, |w|_a, |w|_B, |w|_{B'}, |w|_b, |w|_C, |w|_{C'}, |w|_c, |w|_s, |w|_r\big)$

The Parikh vectors of instances $AA'aA'aA'aA'aA'BB'bB'bB's$ and $AA'aA'aA'CC'cC'cC'cC'r$ are (1, 5, 4, 1, 3, 2, 0, 0, 0, 1, 0) and (1, 3, 2, 0, 0, 0, 1, 4, 3, 0, 1) respectively.

Figure 2.6. Computation of Parikh vectors

        for (i=0; i<100; i++) {      // labels A, A', a
          for (j=0; j<100; j++)      // labels B, B', b
    s       A[i,j] = ...;
          for (k=0; k<100; k++)      // labels C, C', c
    r       ... = A[i,k];
        }
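The Parikh mapping for Figure 2.6 can be computed with a short C sketch. Two choices here are assumptions of this example, not of the text:
- each control word is passed as an array of label strings;
- the second label of each loop is spelled with a prime ("A'", "B'", "C'").

The alphabet order follows the mapping above, from the outer loop to the inner ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NLABELS 11

/* Alphabet of the control automaton of Figure 2.6, in the dimension order
   of the Parikh mapping (outer loop labels first). */
static const char *const labels[NLABELS] = {
    "A", "A'", "a", "B", "B'", "b", "C", "C'", "c", "s", "r"
};

/* Index of label x in the alphabet, or -1 if x is not a label. */
static int label_index(const char *x) {
    for (int k = 0; k < NLABELS; ++k)
        if (strcmp(labels[k], x) == 0)
            return k;
    return -1;
}

/* Parikh vector of a control word: component k counts the occurrences
   of labels[k] in the word. */
void parikh(const char *const *word, size_t len, int vec[NLABELS]) {
    for (int k = 0; k < NLABELS; ++k)
        vec[k] = 0;
    for (size_t p = 0; p < len; ++p) {
        int k = label_index(word[p]);
        assert(k >= 0);  /* every letter must belong to the alphabet */
        vec[k]++;
    }
}
```

Applied to the two control words of the text, this sketch yields (1, 5, 4, 1, 3, 2, 0, 0, 0, 1, 0) and (1, 3, 2, 0, 0, 0, 1, 4, 3, 0, 1).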
From Parikh vectors, we build iteration vectors in two steps. First, we remove all labels of non-iteration statements. Second, we collapse all loops at the same nesting level into the same dimension. After this, there is a one-to-one mapping between Parikh vectors and pairs made of an iteration vector and a statement label. The statement label captures both the last non-zero component of the Parikh vector, i.e. the identity of the statement, and the identity of the surrounding loops, i.e. which dimension corresponds to which loop.

Continuing the example in Figure 2.6, the only remaining labels are a, b and c, i.e. the labels of iteration statements. Labels b and c are collapsed together into the second dimension:
- the iteration vector of instance $AA'aA'aA'aA'aA'BB'bB'bB's$ of statement s is (4, 2);
- the iteration vector of instance $AA'aA'aA'CC'cC'cC'cC'r$ of statement r is (2, 3).

In this process, the lexicographic order $<_{\mathrm{lex}}$ on control words is replaced by the lexicographic order on iteration vectors, where the first dimensions have a higher priority than the last.

In conclusion, Parikh mappings show that iteration vectors, the classical framework for naming instances in loop nests, are a special case of our general control word framework.

Because a statement instance cannot be reduced to an iteration vector, we introduce the following notations, which generalize the intuitive ones at the end of Section 2.1:
- $\langle s, x\rangle$ stands for the instance of statement s whose iteration vector is x;
- $\langle s, x, \mathit{ref}\rangle$ stands for the access built from instance $\langle s, x\rangle$ and reference ref.

This does not imply that control words are overkill when studying loop nests. In particular, they may still be useful when gotos and non-recursive function calls are considered. However, most interesting loop nest transformation techniques are rooted too deeply in the linear algebraic model to be rewritten in terms of control words. Further comparison is largely an open problem, but some ideas and results are pointed out in Section 4.7.
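Building iteration vectors from Parikh vectors and comparing them lexicographically can be sketched in C as follows. The type `instance`, the index constants and both helpers are hypothetical. The indices assume the dimension order A, A', a, B, B', b, C, C', c, s, r of the Parikh mapping for Figure 2.6. The sketch keeps only the increment labels a, b, c and collapses b and c, which sit at the same nesting level, into the second dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of an instance <s, x> of Figure 2.6: a statement
   label and an iteration vector of depth 2. */
typedef struct {
    char stmt;    /* 's' or 'r' */
    int iter[2];  /* (i, j) for s, (i, k) for r */
} instance;

/* Positions of the labels used below in the Parikh vector. */
enum { P_a = 2, P_b = 5, P_c = 8, P_s = 9, P_r = 10 };

/* Drop non-iteration labels; collapse b and c into the second dimension. */
instance from_parikh(const int p[11]) {
    instance in;
    assert(p[P_s] + p[P_r] == 1);  /* the word ends with exactly one of s, r */
    in.stmt = p[P_s] ? 's' : 'r';
    in.iter[0] = p[P_a];
    in.iter[1] = p[P_b] + p[P_c];
    return in;
}

/* Lexicographic order on iteration vectors: earlier dimensions have a
   higher priority. Returns a negative, zero or positive value. */
int lex_compare(const int *x, const int *y, size_t depth) {
    for (size_t k = 0; k < depth; ++k)
        if (x[k] != y[k])
            return x[k] < y[k] ? -1 : 1;
    return 0;
}
```

Comparing iteration vectors alone cannot order two instances of different statements that share the same vector. This is why the statement label is kept alongside the vector.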
2.4 Instancewise Analysis

Because our execution model is based on control words instead of execution traces, the previous Definition 2.2 of a program execution is not very practical. For our purpose, a sequential execution $e \in E$ of a program is seen as a pair $(<_{\mathrm{seq}}, f_e)$:
- $<_{\mathrm{seq}}$ is the sequential order over all possible statement instances (associated with the language of control words);
- $f_e$ maps every access to the memory location it either reads or writes.

Notice that $<_{\mathrm{seq}}$ does not depend on the execution. It is defined as the order between all possible statement instances for all executions, which is legal because sequential execution is deterministic. Order $<_{\mathrm{seq}}$ is thus partial, but its restriction to the set of instances $I_e$ of a given execution $e \in E$ is a total order. However, $f_e$ clearly depends on the execution e, and its domain is exactly the set $A_e$ of accesses.

Function $f_e$ is the storage mapping for execution e of the program [CFH95, Coh99b, CL99]; it is also called the access function [CC98, Fea98]. The storage mapping gathers the effect of every statement instance, for a given execution of the program. It is a function from the exact set $A_e$ of accesses that actually execute (see Definition 2.3) into the set of memory locations.
In practice, the sequential execution order is explicitly defined by the program syntax, but the storage mapping is not. Some analysis has to be performed, either to compute $f_e(a)$ for all executions e and accesses a, or to compute approximations of $f_e$.

Finally, $(<_{\mathrm{seq}}, f_e)$ has been defined as a view of a specific program execution e. It can also be seen as a function mapping each $e \in E$ to the pair $(<_{\mathrm{seq}}, f_e)$. For the sake of simplicity, such a function, which defines all possible executions of a program, will be referred to as "program $(<_{\mathrm{seq}}, f_e)$" in the following.

Conflicting Accesses and Dependences

Many analysis and transformation techniques require some information on "conflicts" between memory accesses.

Definition 2.8 (conflict) Two accesses a and a' are in conflict if they access (either read or write) the same memory location: $f_e(a) = f_e(a')$.

This vocabulary is inherited from the cache analysis framework and its conflict misses [TD95]. Analysis of conflicting accesses is also very similar to alias analysis [Deu94, CBC93]. The conflict relation is the relation between conflicting accesses; it is denoted by $\kappa_e$ for a given execution $e \in E$. Exact knowledge of $f_e$ and $\kappa_e$ is impossible in general, since $f_e$ may depend on the initial state of memory and/or on input data. Analysis of conflicting accesses thus consists in building a conservative approximation κ of the conflict relation, compatible with any execution of the program. That is, $v \,\kappa\, w$ must hold whenever there is an execution e such that $v, w \in A_e$ and $f_e(v) = f_e(w)$:

$\forall e \in E,\ \forall v, w \in A_e:\quad f_e(v) = f_e(w) \implies v \,\kappa\, w.$   (2.5)

This condition is the only requirement on relation κ, but a precise approximation is generally hoped for. For most program analysis purposes, this relation only needs to be computed on writes, or between reads and writes. Other problems, such as cache analysis [TD95], require a full computation.

Consider the example in Figure 2.7, where FirstIndex and SecondIndex are external functions on which no information is available. Because the sign of v is unknown at compile-time, the set of statement instances $I_e$ can be either {S} or {T}, depending on the execution. Here statements coincide with statement instances, since they are not surrounded by any loop or procedure call. Since the results of FirstIndex and SecondIndex are unpredictable too, no exact storage mapping can be computed at compile-time. The only available compile-time information is that S and T may execute, and then they may also yield conflicting accesses, i.e.

$\langle S, \texttt{A[FirstIndex()]}\rangle \;\kappa\; \langle T, \texttt{A[SecondIndex()]}\rangle.$

However, we also know that executions of S and T are mutually exclusive, because of the syntax of the if-then-else construct. Hence S and T can never be conflicting accesses in the same execution:

$\forall e \in E:\quad \neg\,(S \in A_e \wedge T \in A_e).$

This example involves execution-dependent properties such as conflicting accesses, and it also shows how difficult it is to achieve precise results.
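Requirement (2.5) can be made concrete on Figure 2.7 with a small C sketch. The sketch runs a few sample executions and takes the union of their exact conflict relations $\kappa_e$; any sound approximation κ must contain that union. The encoding of an execution (type `execution`, holding v and the values returned by FirstIndex and SecondIndex) and the helper names are hypothetical. Sampling finitely many runs gives only a lower bound on κ, not a static analysis. It does show that S and T never conflict, because they never execute together.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one execution of Figure 2.7: the input v and the
   values returned by FirstIndex() and SecondIndex(). */
typedef struct { int v; int first; int second; } execution;

enum { ACC_S = 0, ACC_T = 1, NACC = 2 };  /* the two write accesses */

/* Which accesses execute in run e, and where they write (f_e). */
static void run(execution e, bool executed[NACC], int loc[NACC]) {
    executed[ACC_S] = e.v > 0;     /* S: A[FirstIndex()] = ...  */
    executed[ACC_T] = !(e.v > 0);  /* T: A[SecondIndex()] = ... */
    loc[ACC_S] = e.first;
    loc[ACC_T] = e.second;
}

/* Union over the given runs of the exact conflict relations kappa_e
   (Definition 2.8): kappa[a][b] is set when a and b both execute in some
   run and touch the same location. */
void conflict_union(const execution *runs, int nruns, bool kappa[NACC][NACC]) {
    for (int a = 0; a < NACC; ++a)
        for (int b = 0; b < NACC; ++b)
            kappa[a][b] = false;
    for (int k = 0; k < nruns; ++k) {
        bool executed[NACC];
        int loc[NACC];
        run(runs[k], executed, loc);
        for (int a = 0; a < NACC; ++a)
            for (int b = 0; b < NACC; ++b)
                if (executed[a] && executed[b] && loc[a] == loc[b])
                    kappa[a][b] = true;
    }
}
```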
Figure 2.7. Execution-dependent storage mappings

        int v, A[10];
        scanf("%d", &v);
        if (v > 0)
    S     A[FirstIndex()] = ...;
        else
    T     A[SecondIndex()] = ...;

For the purpose of parallelization, we need sufficient conditions to allow two accesses to execute in any order. Such conditions can be expressed in terms of dependences:

Definition 2.9 (dependence) An access a depends on another access a' if three conditions hold:
- at least one of them is a write (i.e. $a \in W_e$ or $a' \in W_e$);
- they are in conflict (i.e. $f_e(a) = f_e(a')$);
- a' executes before a (i.e. $a' <_{\mathrm{seq}} a$).

The dependence relation for an execution e is denoted by $\delta_e$: "a depends on a'" is written $a' \,\delta_e\, a$.

$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \,\delta_e\, a \iff (a \in W_e \vee a' \in W_e) \wedge a' <_{\mathrm{seq}} a \wedge f_e(a) = f_e(a').$   (2.6)

Once again, exact knowledge of $\delta_e$ is impossible in general. Dependence analysis thus consists in building a conservative approximation δ, i.e.

$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \,\delta_e\, a \implies a' \,\delta\, a.$   (2.7)

Finally, Bernstein's conditions state that two accesses can be executed in any order (e.g. in parallel) if they are not dependent.

Reaching Definition Analysis

Some techniques require more precision than is available through dependence analysis: given a read access in memory, they need to identify the statement instance that produced the value. The read access is then called the use. The instance that produced the value is called the "definition" that "reaches" the use, or reaching definition. The reaching definition is the last instance, according to the execution order, on which the use depends.

We thus define function $\sigma_e$, mapping every read access to its reaching definition:

$\forall e \in E,\ \forall u \in R_e:\quad \sigma_e(u) = \max_{<_{\mathrm{seq}}} \{\, v \in W_e : v \,\delta_e\, u \,\}$   (2.8)

or, replacing max with its definition:

$\forall e \in E,\ \forall u \in R_e,\ v \in W_e:\quad v = \sigma_e(u) \iff v \,\delta_e\, u \wedge \big(\forall w \in W_e:\ u <_{\mathrm{seq}} w \vee w \leq_{\mathrm{seq}} v \vee \neg(w \,\delta_e\, u)\big)$
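Relation (2.6) for a single execution can be written as a short C sketch. It assumes a hypothetical encoding: the accesses of one sequential execution are stored in execution order, so index order coincides with $<_{\mathrm{seq}}$, and an integer `location` stands for $f_e(a)$. The type name `mem_access` and the function `depends` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One access of a sequential execution e, stored in execution order;
   location stands for f_e(a). Hypothetical encoding. */
typedef struct {
    int location;
    bool is_write;
} mem_access;

/* Relation (2.6): access p (a') delta_e access q (a) iff at least one is a
   write, p executes before q, and both touch the same memory location. */
bool depends(const mem_access *trace, size_t p, size_t q) {
    assert(trace != NULL);
    return (trace[p].is_write || trace[q].is_write)
        && p < q
        && trace[p].location == trace[q].location;
}
```

Two reads of the same location are never dependent. Bernstein's conditions therefore allow them to run in any order.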
or, replacing $\delta_e$ with its definition (2.6):

$\forall e \in E,\ \forall u \in R_e,\ v \in W_e:\quad v = \sigma_e(u) \iff v <_{\mathrm{seq}} u \wedge f_e(v) = f_e(u) \wedge \big(\forall w \in W_e:\ u <_{\mathrm{seq}} w \vee w \leq_{\mathrm{seq}} v \vee f_e(v) \neq f_e(w)\big).$

So definition v reaches use u if it executes before the use, if both refer to the same memory location, and if no intervening write w kills the definition.

When a read instance u has no reaching definition, either u reads an uninitialized value (hinting at a programming error) or the analyzed program is only a part of a larger program. To cope with this problem, we add a virtual statement instance ⊥ which executes before all instances in the program and assigns every memory location. Then, each read instance u has a unique reaching definition, which may be ⊥.

Because exact knowledge of $\sigma_e$ cannot be hoped for in general, reaching definition analysis computes a conservative approximation σ. It is preferably seen as a relation, i.e.

$\forall e \in E,\ \forall u \in R_e,\ v \in W_e:\quad v = \sigma_e(u) \implies v \,\sigma\, u.$   (2.9)

One may also use σ as a function from reads to sets of writes, and we then talk about sets of possible reaching definitions. One must carefully distinguish between a set of effective instances $S \subseteq I_e$ and the set $S \cup \{\bot\}$. If $\bot \notin \sigma(u)$, then u reads a value produced by some instance in S. If $\bot \in \sigma(u)$, then u may read a value produced before executing the program. The presence of ⊥ in a set of possible reaching definitions is the key to program checking techniques, since it may correspond to uninitialized values.

An Example of Instancewise Reaching Definition Analysis

This section is an overview of fuzzy array dataflow analysis (FADA), which was first presented in [CBF95]. The program model is restricted to loop nests with unrestricted conditionals, loop bounds and array subscripts. This short presentation serves two purposes. It allows comparison with our own analysis for recursive programs, and the results of an instancewise reaching definition analysis for loop nests are used extensively in Chapter 5.

Intuitive Flavor

According to (2.8), the exact reaching definition $\sigma_e(u)$ of some read access u is defined as the maximum of the set of writes v such that $v \,\delta_e\, u$, for a given program execution $e \in E$.

As soon as the program model includes conditionals, while loops, and do loops with non-linear bounds, we have to cope with a conservative approximation of the dependence relation. In the case of nested loops, one usually looks for an affine relation, and non-affine constraints in (2.6) are approximated using additional analyses on variables and array subscripts.
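For one execution, the definition (2.8) of $\sigma_e$, including the virtual instance ⊥, amounts to a backward scan of the trace. The C sketch below assumes the same hypothetical trace encoding as before: accesses in execution order, with an integer standing for $f_e$. The constant `BOTTOM` is an illustrative stand-in for ⊥.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One access of a sequential execution e, in execution order (index order
   is <seq); location stands for f_e(a). Hypothetical encoding. */
typedef struct {
    int location;
    bool is_write;
} mem_access;

#define BOTTOM (-1L)  /* stands for the virtual instance (bottom) of the text */

/* sigma_e(u) for a read u: the <seq-maximum of the writes u depends on,
   found by scanning backwards for the last write to the same location.
   Returns that write's index, or BOTTOM if no such write exists. */
long reaching_definition(const mem_access *trace, size_t u) {
    assert(!trace[u].is_write);
    for (size_t k = u; k-- > 0; )
        if (trace[k].is_write && trace[k].location == trace[u].location)
            return (long)k;
    return BOTTOM;
}
```

The backward scan stops at the first matching write, so no intervening write can kill the definition it returns. This is exactly the "no intervening write" condition above.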
But then, except in very special cases, computing the maximum of an approximate set of dependences has no meaning: the very execution of the instances on which u may depend is not guaranteed. One solution is to take this entire set as an approximation of the reaching definition. Can we do better than that? Let us consider an example. For expository reasons, only scalars are considered; the method, however, applies to arrays with any subscript.

        for (i=1; i<=N; i++) {
          if (...)
    S1      x = ...;
          else
    S2      x = ...;
        }
    R   ... = x;

Assuming that $N \ge 1$, what is the reaching definition of reference x in statement R? All instances of S1 and S2 are in dependence with $\langle R\rangle$. It therefore seems that we cannot do better than approximating $\sigma(\langle R\rangle)$ with $\{\langle S_1, 1\rangle, \dots, \langle S_1, N\rangle, \langle S_2, 1\rangle, \dots, \langle S_2, N\rangle\}$.

Let us introduce a new boolean function $b_e(i)$ which represents the outcome of the test at iteration i, for a program execution $e \in E$. This allows us to compute the exact dependence relation $\delta_e$ at compile-time:

$\forall e \in E,\ \forall v \in W_e:\quad v \,\delta_e\, \langle R\rangle \iff \exists i \in \{1, \dots, N\}:\ (v = \langle S_1, i\rangle \wedge b_e(i)) \vee (v = \langle S_2, i\rangle \wedge \neg b_e(i)),$

which can also be written

$\forall e \in E:\quad \{\, v \in W_e : v \,\delta_e\, \langle R\rangle \,\} = \{\langle S_1, i\rangle : 1 \le i \le N \wedge b_e(i)\} \cup \{\langle S_2, i\rangle : 1 \le i \le N \wedge \neg b_e(i)\}.$

Since the above result is not approximate, the exact reaching definition $\sigma_e(\langle R\rangle)$ of $\langle R\rangle$ is the maximum of this set.

Suppose $\sigma_e(\langle R\rangle)$ is an instance $\langle S_1, i_1\rangle$ for some execution $e \in E$. Because $b_e(i) \vee \neg b_e(i)$ is true for all $i \in \{1, \dots, N\}$, any value produced by an instance $\langle S_1, i\rangle$ or $\langle S_2, i\rangle$ with $i < N$ is overwritten either by $\langle S_1, N\rangle$ or by $\langle S_2, N\rangle$. This proves that $i_1$ must be equal to N. Similarly, if $\sigma_e(\langle R\rangle)$ is an instance $\langle S_2, i_2\rangle$, the same reasoning proves that $i_2$ must be equal to N. We thus have the following result for function $\sigma_e$:

$\forall e \in E:\quad \sigma_e(\langle R\rangle) = \{\langle S_1, N\rangle : b_e(N)\} \cup \{\langle S_2, N\rangle : \neg b_e(N)\}.$   (2.10)

We may now replace $b_e$ and $\neg b_e$ by their conservative approximations:

$\sigma(\langle R\rangle) = \{\langle S_1, N\rangle, \langle S_2, N\rangle\}.$   (2.11)

Notice the high precision achieved here.

To summarize these observations, our method gives new names to the results of maxima calculations in the presence of non-linear terms. These names are called parameters, and they are not arbitrary: as shown in the example, some properties of these parameters can be derived. More generally, one can find relations on non-linear constraints like $b_e$, either by a simple examination of the syntactic structure of the program or by more sophisticated techniques. These relations imply relations on the parameters, which are then used to increase the accuracy of the reaching definition. In some cases, these relations may be precise enough to reduce the "fuzzy" reaching definition to a singleton, thus giving an exact result. See [BCF97, Bar98] for a formal definition and handling of these parameters.

The general result computed by FADA is the following: the instancewise reaching definition relation is a quast, i.e. a nested conditional with two kinds of components:
- predicates, which are tests for the positiveness of quasi-affine forms (which include integer division);
- leaves, which are either ⊥ or sets of instances whose iteration vector components are again quasi-affine.

See Section 3.1 for details about quasts.
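The argument leading to (2.10) can be checked by brute force on small inputs. In this C sketch, the array `b` is a hypothetical stand-in for the unknown predicate $b_e$, with one entry per iteration and 1-based indexing. The function `reaching_def_of_R` simply runs the loop and records the last write to x. Enumerating every outcome of the test for a small N confirms that the reaching definition always comes from iteration N: from S1 when $b_e(N)$ holds, from S2 otherwise. This simulates executions; it is not an implementation of FADA.

```c
#include <assert.h>
#include <stdbool.h>

/* A write instance <S_stmt, iter>; stmt 1 is S1, 2 is S2, 0 stands for
   bottom (no write executed). */
typedef struct { int stmt; int iter; } write_instance;

/* Run the loop of the example for one execution: b[i] is the outcome of
   the test at iteration i (1 <= i <= n). The last write to x before R is
   the reaching definition sigma_e(<R>). */
write_instance reaching_def_of_R(const bool *b, int n) {
    write_instance last = {0, 0};
    for (int i = 1; i <= n; ++i) {
        if (b[i])
            last = (write_instance){1, i};  /* S1: x = ...; */
        else
            last = (write_instance){2, i};  /* S2: x = ...; */
    }
    return last;                            /* R: ... = x; */
}
```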
Improving Accuracy

To improve the accuracy of our analysis, properties of the non-affine constraints involved in the description of the dependences can be integrated into the data-flow analysis. As shown in the previous example, these properties imply properties of the parameters introduced in our computation.

Several techniques have been proposed to find properties of the variables of the program or of non-affine functions (see [CH78, Mas93, MP94, TP95] for instance). They use very different formalisms and algorithms, from pattern-matching to abstract interpretation. However, the relations they find can be written as first-order formulas of additive arithmetic (a.k.a. Presburger arithmetics, see Section 3.1) on the variables and non-affine functions of the program. This general type of property makes the data-flow analysis algorithm independent of the practical technique used to find properties.

How the properties are taken into account in the analysis is detailed in [BCF97, Bar98]. The quality of the approximation is defined with respect to the ability of the analysis to integrate these properties, fully or partially. In general, the analysis cannot find the smallest set of possible reaching definitions [Bar98], for decidability reasons. For some kinds of properties, such as properties implied by the program structure, the best approximation can be found.

More About Approximations

Until now, every set of instances or accesses considered was exact and dependent on the execution. However, as hinted before, we will mostly consider approximate sets and relations in the following. For this reason, we need the following conservative approximations:
- I, the set of all possible statement instances for every possible execution of a given program: $\forall e \in E:\ \imath \in I_e \implies \imath \in I$;
- A, the set of all possible accesses: $\forall e \in E:\ a \in A_e \implies a \in A$;
- R, the set of all possible reads: $\forall e \in E:\ a \in R_e \implies a \in R$;
- W, the set of all possible writes: $\forall e \in E:\ a \in W_e \implies a \in W$.

They can be very conservative or be the result of a very precise analysis. In practice, the precision of these sets is not critical, because they are rarely used directly in algorithms (although they are widely used in the theoretical frameworks associated with these algorithms). Most of the time, they are implicitly present as domains or images of relations over instances and accesses, which have their own dedicated analysis and approximation.

Sets I, A, R, W and relations κ, δ and σ are the key to program analysis and transformation techniques. In our framework, no other instancewise information is available at compile-time. In particular, when we present an optimality result for some algorithm, it means optimality with respect to this information: nobody can do a better job if their only information is the sets and relations above.
2.5 Parallelization

With the model defined in Section 2.4, parallelization of some program $(<_{\mathrm{seq}}, f_e)$ means construction of a program $(<_{\mathrm{par}}, f^{\mathrm{exp}}_e)$, where $<_{\mathrm{par}}$ is a parallel execution order: a partial order and a sub-order of $<_{\mathrm{seq}}$. Building a new storage mapping $f^{\mathrm{exp}}_e$ from $f_e$ is called memory expansion.³ Obviously, $<_{\mathrm{par}}$ and $f^{\mathrm{exp}}_e$ must satisfy several properties in order to preserve the sequential program semantics.

Most practical expansion techniques also guarantee some additional properties that are not mandatory for the correctness of the expansion. For example, they effectively "expand" data structures. Intuitively, a storage mapping $f^{\mathrm{exp}}_e$ is finer than $f_e$ when it uses at least as much memory as $f_e$. More precisely:

Definition 2.10 (finer) For a given execution e of a program, a storage mapping $f^{\mathrm{exp}}_e$ is finer than $f_e$ if

$\forall v, w \in W:\quad f^{\mathrm{exp}}_e(v) = f^{\mathrm{exp}}_e(w) \implies f_e(v) = f_e(w).$

Memory Expansion and Parallelism Extraction

Some basic techniques to build a storage mapping $f^{\mathrm{exp}}_e$ have been listed in Section 1.2. They are used implicitly or explicitly in most memory expansion algorithms, such as the ones presented in Chapter 5.

The benefit of memory expansion is to remove spurious dependences due to memory reuse: "the more expansion, the less memory reuse". Removing dependences then extracts more parallelism: "the less memory reuse, the more parallelism". To see this, consider the exact dependence relation $\delta^{\mathrm{exp}}_e$ for the same execution of the expanded program, with sequential execution order $(<_{\mathrm{seq}}, f^{\mathrm{exp}}_e)$:

$\forall e \in E,\ \forall a, a' \in A_e:\quad a' \,\delta^{\mathrm{exp}}_e\, a \iff (a \in W_e \vee a' \in W_e) \wedge a' <_{\mathrm{seq}} a \wedge f^{\mathrm{exp}}_e(a) = f^{\mathrm{exp}}_e(a').$   (2.12)

Any parallel order $<_{\mathrm{par}}$ (over instances) must be consistent with dependence relation $\delta^{\mathrm{exp}}_e$ (over accesses):

$\forall e \in E,\ \forall (\imath_1, r_1), (\imath_2, r_2) \in A_e:\quad (\imath_1, r_1) \,\delta^{\mathrm{exp}}_e\, (\imath_2, r_2) \implies \imath_1 <_{\mathrm{par}} \imath_2$

where $\imath_1$, $\imath_2$ are instances and $r_1$, $r_2$ are references in a statement.

Of course, we want a compile-time description, so we consider a conservative approximation $\delta^{\mathrm{exp}}$ of $\delta^{\mathrm{exp}}_e$. In general, this approximation does not require any specific analysis: its computation is induced by the expansion strategy (see Section 5.4.8 for an example).

Theorem 2.2 (correctness criterion of parallel execution orders) If the following condition holds, the parallel order is correct for the expanded program (it preserves the original program semantics):

$\forall (\imath_1, r_1), (\imath_2, r_2) \in A:\quad (\imath_1, r_1) \,\delta^{\mathrm{exp}}\, (\imath_2, r_2) \implies \imath_1 <_{\mathrm{par}} \imath_2.$   (2.13)

An important remark is that $\delta^{\mathrm{exp}}_e$ is equal to $\sigma_e$ when the program is converted to single-assignment form (but not SSA): every dependence due to memory reuse is removed. We may thus take $\delta^{\mathrm{exp}} = \sigma$ to parallelize such codes.

³ Because most of the time, $f^{\mathrm{exp}}_e$ requires more memory than $f_e$.
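Definition 2.10 can be checked by brute force over the writes of one execution, as in this C sketch. The representation of both storage mappings as integer arrays indexed by write, and the function name `is_finer`, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Definition 2.10 for one execution: f_exp is finer than f iff any two
   writes sharing a location under f_exp also share one under f.
   Both mappings are arrays of abstract locations indexed by write. */
bool is_finer(const int *f_exp, const int *f, size_t nwrites) {
    assert(nwrites == 0 || (f_exp != NULL && f != NULL));
    for (size_t v = 0; v < nwrites; ++v)
        for (size_t w = v + 1; w < nwrites; ++w)
            if (f_exp[v] == f_exp[w] && f[v] != f[w])
                return false;
    return true;
}
```

A single-assignment mapping, with one location per write, is finer than any storage mapping, while a mapping that sends every write to one scalar is finer only than itself (up to renaming).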
Computation of a Parallel Execution Order

In this section, we recall some classical results about loop nest parallelization; recursive programs will be addressed in Section 5.5. We have already presented two main paradigms to generate parallel code in Section 1.2. To compute the parallel execution order $<_{\mathrm{par}}$, data parallelism (the second paradigm) will be assumed.

Instead of presenting a novel algorithm for parallelization, we show how most of the existing ones can be integrated into our framework. Extending parallelization techniques to irregular loop nests has already been studied by several authors; [Col95a, Col94b, GC95] are only the results nearest to our work.

Scheduling

Dependence or reaching definition analyses derive a graph where nodes are operations and edges are constraints on the execution order. The problem is now to traverse the graph in a partial order; this order is the execution order of the parallel program. The more partial the order, the higher the parallelism. In general, this partial order cannot be expressed as the list of its relation pairs: one needs an expression of the partial order that does not grow with problem size, i.e. a closed form. The expression of partial orders must also satisfy additional constraints:
- it has a high expressive power;
- it is easily found and manipulated;
- it allows optimized code generation.
A suitable solution is to use a schedule [Fea92], i.e. a function θ from the set I of all instances to the set $\mathbb{N}$ of non-negative integers. In a more general presentation of schedules, vectors of integers can be used: one may then talk about multidimensional "time" and schedules. This issue is studied by Feautrier in [Fea92]. The following definitions consider one-dimensional schedules only, but this makes no fundamental difference with multidimensional ones. From Theorem 2.2, we already know how the correct parallel execution orders are defined from the dependence relation of the expanded program. Rewriting this result for a schedule function θ, the correctness condition becomes

$\forall (\imath_1, r_1), (\imath_2, r_2) \in A:\quad (\imath_1, r_1) \,\delta^{\mathrm{exp}}\, (\imath_2, r_2) \implies \theta(\imath_1) < \theta(\imath_2),$   (2.14)

where $\delta^{\mathrm{exp}}$ is the dependence relation in the expanded program (for multidimensional schedules, $<_{\mathrm{lex}}$ is used to compare vectors). If no expansion has been performed, $\delta^{\mathrm{exp}}$ is the original dependence relation δ. If the program has been converted to single-assignment form, it is the reaching definition relation σ. Since θ is integer valued, the constraint above is equivalent to:

$\forall (\imath_1, r_1), (\imath_2, r_2) \in A:\quad (\imath_1, r_1) \,\delta^{\mathrm{exp}}\, (\imath_2, r_2) \implies \theta(\imath_1) + 1 \le \theta(\imath_2).$   (2.15)

This system of functional inequalities, called causality constraints, must be solved for the unknown function θ. As is often the case for systems of inequalities, it may have many different solutions. One can minimize various objective functions, e.g. the number of synchronization points or the latency.

Feautrier's Scheduling Algorithm

In the following, notation $\mathrm{Iter}(\imath)$ denotes the iteration vector of instance ı (recall that $\mathrm{Iter}(\langle s, x\rangle) = x$). Considering (2.15), let us introduce ξ, the vector of all variables in the problem: ξ is obtained by concatenating $\mathrm{Iter}(\imath_1)$, $\mathrm{Iter}(\imath_2)$, and the vector of symbolic constants in the problem. In the context of affine dependence relations, $(\imath_1, r_1) \,\delta^{\mathrm{exp}}\, (\imath_2, r_2)$ is a disjunction of conjunctions of affine inequalities. In other words, the set $\{(u, v) : u \,\delta^{\mathrm{exp}}\, v\}$ is a union of convex polyhedra. This result, established for general affine relations, also holds when the dependence relation is approximated in various ways, such as dependence cones, direction vectors and dependence levels; see [PD96, Ban92, DV97].

Since the constraints in the antecedent of (2.15) are affine, let us denote them by $C_i(\xi) \ge 0$, $1 \le i \le M$. Similarly, let $\Delta(\xi) = \theta(\imath_2) - \theta(\imath_1) - 1 \ge 0$ be the consequent of (2.15). Then, we can apply the following lemma:

Lemma 2.2 (Affine Form of Farkas' Lemma) An affine function $\Delta(\xi)$ from integer vectors to integers is non-negative on a polyhedron $\{\xi : C_i(\xi) \ge 0,\ 1 \le i \le M\}$ if there exist non-negative numbers $\lambda_0, \dots, \lambda_M$ (the Farkas multipliers) such that:

$\Delta(\xi) = \lambda_0 + \sum_{i=1}^{M} \lambda_i\, C_i(\xi).$   (2.16)

This relation is valid for all values of ξ. Hence, one can equate the constant term and the coefficient of each variable on both sides of the identity. This yields a set of linear equations whose unknowns are the coefficients of the schedules and the Farkas multipliers $\lambda_i$. Since the multipliers are constrained to be non-negative, the system must be solved by linear programming [Fea88b, Pug92] (see also Section 3.1).

Unfortunately, some loop nests do not have "simple" affine schedules. The reason is that a loop nest with an affine schedule has a large degree of parallelism, whereas some loop nests clearly have little or even no parallelism, hence no affine schedule. The solution in this case is to use a multidimensional affine schedule, whose domain is $\mathbb{N}^d$ with d > 1, ordered according to the lexicographic order. Such a schedule can have as low a degree of parallelism as necessary, and can even represent sequential programs. It can be proved that any loop nest in an imperative program has a multidimensional schedule. The selection of a multidimensional schedule can be automated using the algorithms from [Fea92]. Notice that multidimensional schedules are particularly useful for dynamic control programs. In that case, we have to overestimate the dependences, and hence to underestimate the degree of parallelism.
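Feautrier's algorithm finds θ by linear programming, which is not reproduced here. As a much simpler sanity check, the C sketch below tests whether a candidate affine schedule satisfies the causality constraints (2.15) on a bounded iteration domain. The loop body `A[i][j] = A[i-1][j] + A[i][j-1]`, with uniform dependences (i-1, j) → (i, j) and (i, j-1) → (i, j), is an assumed example and is not taken from the text. The type `schedule` and the helper names are also hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-dimensional affine schedule theta(i, j) = ci*i + cj*j + c0. */
typedef struct { int ci, cj, c0; } schedule;

static int theta(schedule s, int i, int j) {
    return s.ci * i + s.cj * j + s.c0;
}

/* Brute-force check of (2.15) on the n x n domain for the assumed body
   A[i][j] = A[i-1][j] + A[i][j-1]: every source must be scheduled at
   least one time step before its sink. */
bool is_causal(schedule s, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (i > 0 && !(theta(s, i - 1, j) + 1 <= theta(s, i, j)))
                return false;
            if (j > 0 && !(theta(s, i, j - 1) + 1 <= theta(s, i, j)))
                return false;
        }
    return true;
}
```

The wavefront θ(i, j) = i + j passes this check, and all iterations with the same i + j may then run in parallel. θ(i, j) = i fails, because it ignores the dependence carried by j.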
Code generation for parallel scheduled programs is simple in theory, but very complex in practice. Issues such as polyhedron scanning [AI91], communication handling, task placement and low-level optimizations are critical for efficient code generation [PD96] (pages 79 to 103). Dealing with complex loop bounds and conditionals raises new code generation problems, not to mention the allocation of expanded data structures; see [GC95, Col94a, Col95b].

Other Scheduling Techniques

Before the general solution to the scheduling problem proposed by Feautrier, most algorithms were based on classical loop transformation techniques, including loop fission, loop fusion, loop interchange, loop reversal, loop skewing, loop scaling, loop reindexing and statement reordering. Moreover, dependence abstractions were much less expressive than affine relations.

The first algorithm was designed by Allen and Kennedy [AK87], and it inspired many other solutions [Ban92]. Several complexity and optimality results have also been established by Darte and Vivien [DV97]. Extending previous results, they designed a very powerful algorithm, but its abstraction does not support the full expressive power of affine relations.
Moreover, many optimizations of Feautrier's algorithm have been designed, mainly because of the wide range of objective functions to optimize. For example, Lim and Lam propose in [LL97] a technique to reduce the number of synchronizations induced by a schedule, and they compare their technique with other recent improvements.

Speculative execution is a classical technique to improve scheduling of finite dependence graphs, but not of general affine relations. Collard and Feautrier have explored it as a way to extract more parallelism from programs with complex loop bounds and conditionals [Col95a, Col94b].

Finally, all schedule functions computed by these techniques can be captured by affine functions of iteration vectors. The associated parallel execution order is thus an affine relation $<_{\mathrm{par}}$, well suited to our formal framework:

$\forall u, v \in W:\quad u <_{\mathrm{par}} v \iff \theta(u) < \theta(v)$

for one-dimensional schedules, and

$\forall u, v \in W:\quad u <_{\mathrm{par}} v \iff \theta(u) <_{\mathrm{lex}} \theta(v)$

for multidimensional ones.

Tiling

Despite good theoretical results and recent achievements, scheduling techniques can lead to very bad performance, mainly because of communication overhead and cache problems. Fine-grain parallelization is not suitable for most parallel architectures.⁴ Partitioning run-time instances is thus an important issue. The solution is to group elementary computations in order to take advantage of memory hierarchies and to overlap communications and computations.

The tiling technique groups elementary computations into tiles, each tile being executed on a processor in an atomic way. It is well suited to nested loops with regular computation patterns [IT88, CFH95, BDRR94]. An important goal of this research is to find the best tiling strategy with respect to criteria such as the number of communications between tiles. This strategy must be known at compile-time to generate efficient code for a particular machine.

Most tiling techniques are limited to perfect loop nests, and dependences are often assumed to be uniform when evaluating the amount of communications. The most usual tile model was defined by Irigoin and Triolet in [IT88]; it enforces the following constraints:
- tiles are bounded, for local memory requirements;
- tiles are identical by translation, to allow efficient code generation and automatic processing;
- tiles are atomic units of computation, with synchronization steps at their beginning and at their end.

⁴ But it is suitable for instruction-level parallelism.
Many different algorithms have been designed to find an efficient tile shape and then to partition the loop nest. Individual tiles are scheduled using classical scheduling algorithms. Inner-tile sequential execution, however, is open to a larger range of techniques, depending on the context. The simplest inner-tile execution order is the original sequential execution of elementary computations. Other execution orders compatible with the program dependences could be better suited to the local memory hierarchy, or could enable more aggressive storage mapping optimization techniques (see Section 5.3 for details; further study of this idea is left for future work). A more extensive presentation of tiling can be found in [BDRR94].

The tile shape can be any bounded parallelepiped (or part of a parallelepiped on iteration space boundaries), but it is often a rectangle in practice. The result of a tiling technique is then a pair (T, θ), where the tiling function T maps statement instances to individual tiles and the schedule θ maps tiles to integers or vectors of integers.

We make one hypothesis to handle parallel execution orders produced by tiling techniques in our framework: the inner-tile execution order must be affine. It is denoted by $<_{\mathrm{inn}}$. We are not aware of any technique that builds non-affine inner-tile execution orders.

Finally, the result of a tiling technique can be captured by our parallel execution order framework, with an affine relation $<_{\mathrm{par}}$:

$\forall u, v \in W:\quad u <_{\mathrm{par}} v \iff \theta(T(u)) < \theta(T(v)) \vee \big(T(u) = T(v) \wedge u <_{\mathrm{inn}} v\big)$   (2.17)

for a one-dimensional schedule of tiles, and

$\forall u, v \in W:\quad u <_{\mathrm{par}} v \iff \theta(T(u)) <_{\mathrm{lex}} \theta(T(v)) \vee \big(T(u) = T(v) \wedge u <_{\mathrm{inn}} v\big)$   (2.18)

for a multidimensional schedule.

General Efficiency Remarks

When dealing with loop nests, it is well known that complex loop transformations require complex polytope traversals, which slightly increase execution time. Moreover, even when no run-time restoration of the data flow is required, the right-hand sides of statements often grow huge because of nested conditional expressions. The code generated by a straightforward application of parallelization algorithms is then very inefficient. Moving conditionals and splitting loops are very useful, as are polytope scanning techniques [AI91, FB98].

These remarks naturally extend to recursive programs and recursive data structures. The only difference is that most optimization techniques are either limited to non-recursive programs or much less effective with complex recursive structures. Examples include constant propagation, forward substitution, invariant code motion and dead-code elimination [ASU86, Muc97]. Indeed, most of our experiments with recursive programs required manual optimizations. This should encourage the development of more aggressive techniques suitable for recursive programs.

Of course, the shape and alias analyses discussed in Section 2.2.2 are very useful when pointer-based data structures are considered. A single pair of aliased pointers is likely to forbid any further precise analysis or aggressive program transformation, especially when generic types (such as void*) are used.
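The tiled parallel order (2.17) from the tiling discussion can be sketched in C under assumed choices that are not in the text:
- square tiles of side `size` over non-negative 2-D iteration points;
- a wavefront schedule of tiles θ(t) = t_i + t_j;
- the original lexicographic order as $<_{\mathrm{inn}}$.

The names `point`, `tile_of`, `theta_tile`, `inner_before` and `par_before` are hypothetical. The sketch only evaluates the relation; it does not check that the tiling respects the dependences.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int i, j; } point;  /* 2-D iteration point or tile index */

/* Tiling function T: square tiles of side `size` (coordinates >= 0). */
static point tile_of(point u, int size) {
    assert(size > 0 && u.i >= 0 && u.j >= 0);
    point t = { u.i / size, u.j / size };
    return t;
}

/* Assumed one-dimensional schedule of tiles: a wavefront over tile indices. */
static int theta_tile(point t) { return t.i + t.j; }

/* Assumed inner-tile order <inn: the original lexicographic order. */
static bool inner_before(point u, point v) {
    return u.i < v.i || (u.i == v.i && u.j < v.j);
}

/* Relation (2.17): u <par v iff theta(T(u)) < theta(T(v)), or u and v lie
   in the same tile and u <inn v. */
bool par_before(point u, point v, int size) {
    point tu = tile_of(u, size), tv = tile_of(v, size);
    if (theta_tile(tu) < theta_tile(tv))
        return true;
    return tu.i == tv.i && tu.j == tv.j && inner_before(u, v);
}
```

With tiles of side 2, points (0, 2) and (2, 0) fall into tiles (0, 1) and (1, 0), which share the same time step. Neither precedes the other, so they may run in parallel.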
instancewise analyses: computing the value of an integer (or pointer) variable at each instance of a statement is the key information for dependence analysis. We will indeed present a new induction variable detection technique suitable for our recursive program model.

In the following, when no specific contribution has been proposed in this work, we will not address these necessary previous stages and optimizations:
- we will always consider that the required information about data structure shape, aliases or induction variables is available, when this information can be derived by classical techniques;
- we will generate unoptimized transformed programs, supposing that classical optimization techniques can do the job.

We make the hypothesis that our techniques, if implemented in a parallelizing compiler, are preceded and followed by the appropriate analyses and optimizations.
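The tiling order (2.18) given earlier in this section can be made concrete with a small Python sketch. It builds the predicate $u <_{\mathrm{par}} v$ from a tiling function, a tile schedule and an inner-tile order. This is an illustration under stated assumptions, not an implementation from this work. The 4×4 rectangular tiles, the wavefront-style schedule and the names `make_par_order`, `tile`, `wavefront` and `sequential` are all hypothetical. The inner-tile order is taken to be the original sequential (lexicographic) order of iteration vectors. A one-dimensional schedule is the special case where θ returns 1-tuples, which gives (2.17).

```python
from typing import Callable, Tuple

Vec = Tuple[int, ...]

def make_par_order(tile: Callable[[Vec], Vec],
                   theta: Callable[[Vec], Vec],
                   inner: Callable[[Vec, Vec], bool]) -> Callable[[Vec, Vec], bool]:
    """Return the predicate u <_par v of (2.18).

    Tiles are compared through their schedule theta, lexicographically
    (Python tuples compare lexicographically); instances of the same tile
    are ordered by the inner-tile order.
    """
    def par(u: Vec, v: Vec) -> bool:
        tu, tv = tile(u), tile(v)
        if theta(tu) < theta(tv):
            return True
        return tu == tv and inner(u, v)
    return par

# Hypothetical 4x4 rectangular tiling of a 2-D iteration space
# (floor division is a quasi-affine form).
def tile(u: Vec) -> Vec:
    return (u[0] // 4, u[1] // 4)

# Hypothetical multidimensional schedule: wavefront first, then tile row.
def wavefront(t: Vec) -> Vec:
    return (t[0] + t[1], t[0])

# Inner-tile order: the original sequential (lexicographic) execution.
def sequential(u: Vec, v: Vec) -> bool:
    return u < v

par = make_par_order(tile, wavefront, sequential)
# One-dimensional schedule (2.17): tiles on the same wavefront are unordered.
par_1d = make_par_order(tile, lambda t: (t[0] + t[1],), sequential)
```

For instance, `par((0, 5), (4, 0))` holds because tile (0, 1) precedes tile (1, 0) at the second schedule dimension. `par_1d` orders neither of these two instances before the other: both tiles lie on the same wavefront and may run in parallel.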
Chapter 3

Formal Tools

Most technical results on mathematical abstractions are gathered in this chapter. Section 3.1 is a general presentation of Presburger arithmetics and algorithms for systems of affine inequalities. Section 3.2 recalls classical results on formal languages, and Section 3.3 addresses rational relations over monoids. Contributions to an interesting class of rational relations are found in Section 3.4. Section 3.5 addresses algebraic relations, and also presents some new results. The last two sections are mostly devoted to the applicability of formal language theory to our analysis and transformation framework. Section 3.6 discusses the intersection of rational and algebraic relations, and approximation of relations is the purpose of Section 3.7.

The reader whose primary interest is in the analysis and transformation techniques may skip all proofs and technical lemmas, to concentrate on the main theorems. Because this chapter is more a "reference manual" for mathematical objects, it can also be read "on demand" when technical information is required in the following chapters.

3.1 Presburger Arithmetics

When dealing with iteration vectors, we need a mathematical abstraction to capture sets, relations and functions. This abstraction must also support classical algebraic operations. Presburger arithmetics is well suited to this purpose, since most interesting questions are decidable within this theory. It is defined by logical formulas built from $\neg$, $\lor$ and $\land$, equality and inequality of integer affine constraints, and first-order quantifiers $\exists$ and $\forall$. Testing the satisfiability of a Presburger formula is at the core of most symbolic computations involving affine constraints. It is known as integer linear programming and is decidable, but NP-complete; see [Sch86] for details. Indeed, all known algorithms are super-exponential in the worst case. Examples are the Fourier-Motzkin algorithm implemented by Pugh in Omega [Pug92] and the simplex algorithm with Gomory cuts implemented by Feautrier in PIP [Fea88b, Fea91]. In practice, Fourier-Motzkin is very efficient on small problems. The simplex algorithm is more efficient on medium-size problems, because its complexity is polynomial on average. Computing exact solutions to large integer linear programs is currently an open problem. This limits the practical application of Presburger arithmetics to automatic parallelization.
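The following Python sketch gives a rough illustration of the elimination idea behind Fourier-Motzkin: it projects variables out of a system of affine inequalities one at a time. It is a simplified sketch, not Omega's algorithm. It decides feasibility over the rationals only; the integer-exact refinements that Omega adds on top of Fourier-Motzkin are not shown. The constraint encoding and the function names are assumptions made for this example.

```python
from typing import List, Tuple

# A constraint (coeffs, const) stands for: sum(coeffs[k] * x_k) + const >= 0.
Constraint = Tuple[List[int], int]

def eliminate(constraints: List[Constraint], k: int) -> List[Constraint]:
    """Project variable x_k out by Fourier-Motzkin elimination."""
    lower, upper, rest = [], [], []
    for coeffs, const in constraints:
        if coeffs[k] > 0:
            lower.append((coeffs, const))    # gives a lower bound on x_k
        elif coeffs[k] < 0:
            upper.append((coeffs, const))    # gives an upper bound on x_k
        else:
            rest.append((coeffs, const))
    # Every (lower, upper) pair yields one constraint in which x_k cancels out.
    for lc, lk in lower:
        for uc, uk in upper:
            a, b = lc[k], -uc[k]             # both positive
            coeffs = [b * l + a * u for l, u in zip(lc, uc)]
            rest.append((coeffs, b * lk + a * uk))
    return rest

def rational_feasible(constraints: List[Constraint], nvars: int) -> bool:
    """True iff the system has a solution over the rationals."""
    for k in range(nvars):
        constraints = eliminate(constraints, k)
    # All variables are gone: each remaining constraint reads const >= 0.
    return all(const >= 0 for _, const in constraints)
```

For example, `rational_feasible([([2], -1), ([-2], 1)], 1)` returns True for $2x \ge 1 \land 2x \le 1$, although no integer $x$ satisfies it. This is why exact integer tools need more than plain Fourier-Motzkin. Each elimination step may roughly square the number of constraints, which is one source of the worst-case blow-up mentioned above.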
Sets, Relations and Functions

We consider vectors of integers, and sets, functions and relations thereof. Functions are seen as a special case of relations, and relations are also interpreted as functions. A relation on sets $A$ and $B$ can equivalently be described by a function from $A$ to the set $\mathcal{P}(B)$ of subsets of $B$. Notice that the domain and range of a function or relation may not have the same dimension. Sets of integer vectors are ordered by the lexicographic order $<_{\mathrm{lex}}$. The "bottom element" $\bot$ denotes by definition an element which precedes all integer vectors. Strictly speaking, we consider sets, functions and relations described by Presburger formulas on integer vectors extended with $\bot$.

To describe mathematical objects in Presburger arithmetics, we use three types of variables: bound variables, unknowns and parameters. Bound variables are quantified by $\exists$ and $\forall$ in logical formulas, whereas unknown variables and parameters are free variables. Unknown variables appear in input, output or set tuples. Parameters are fully unbound and are interpreted as symbolic constants. Handling parameters is trivial with Fourier-Motzkin. With the simplex algorithm, it required a specific extension by Feautrier, called Parametric Integer Programming (PIP) [Fea88b].

Omega [Pug92] is widely used in our prototype implementations and semi-automatic experiments, and its syntax is very close to the usual mathematical one. Non-intuitive details will be explained when needed in the experimental sections. PIP uses another representation for affine relations, called a quasi-affine selection tree or quast. Quasi-affine forms are an extension of affine forms that includes integer division and modulo operations with integer constants.

Definition 3.1 (quast) A quasi-affine selection tree (quast) representing an affine relation¹ is a many-level conditional in which:
- predicates are tests for the positiveness of quasi-affine forms in the input variables and parameters;
- leaves are sets of vectors described in Presburger arithmetics extended with $\bot$, which precedes any other vector in the lexicographic order.

¹ In fact, this is an extension of Feautrier's definition to capture unrestricted affine relations and not only affine functions, see [GC95].

Bound variables in affine relations appear as parameters in quasts, called wildcard variables. These wildcard variables are not free: they are constrained inside the quast itself. Moreover, quasi-affine forms (with modulo and division operations) in conditionals and leaves can be converted into "pure" affine forms by introducing additional wildcard variables, see [Fea91] for details.

Empty sets are allowed in leaves to describe vectors that are not in the domain of a relation; they differ from the singleton $\{\bot\}$. Let us give a few examples. The function corresponding to integer addition is written
$$\{(i_1, i_2) \to (j) : i_1 + i_2 = j\}$$
and can be represented by the quast $\{i_1 + i_2\}$.
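Definition 3.1 can be mirrored by a small recursive data type. The Python sketch below is only an illustration: the classes `Leaf` and `Cond`, the function `evaluate` and the string used for ⊥ are assumptions, not the PIP interface. A conditional tests whether a quasi-affine form is non-negative, and a leaf returns a set of output vectors, possibly $\{\bot\}$ or the empty set. The sketch encodes the addition quast above and the same function restricted to $i_1 < N$ and $i_2 < N$, where $N$ is a symbolic constant. Each test $i < N$ becomes the positiveness of the affine form $N - i - 1$.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Union

Env = Dict[str, int]           # values of input variables and parameters
BOTTOM = "⊥"                   # stands for the bottom element

@dataclass
class Leaf:
    # Returns the set of output vectors (possibly {BOTTOM}, or empty).
    value: Callable[[Env], FrozenSet[object]]

@dataclass
class Cond:
    form: Callable[[Env], int]  # quasi-affine form; predicate is form(env) >= 0
    then: "Quast"
    other: "Quast"

Quast = Union[Leaf, Cond]

def evaluate(q: Quast, env: Env) -> FrozenSet[object]:
    """Walk the many-level conditional down to a leaf."""
    while isinstance(q, Cond):
        q = q.then if q.form(env) >= 0 else q.other
    return q.value(env)

# {(i1, i2) -> (j) : i1 + i2 = j} as the single-leaf quast {i1 + i2}.
addition = Leaf(lambda e: frozenset({(e["i1"] + e["i2"],)}))

# The same function restricted to i1 < N and i2 < N.
bottom = Leaf(lambda e: frozenset({BOTTOM}))
restricted = Cond(lambda e: e["N"] - e["i1"] - 1,
                  Cond(lambda e: e["N"] - e["i2"] - 1, addition, bottom),
                  bottom)
```

Nested `Cond` nodes play the role of the many-level conditional. The leaf functions receive the whole environment, so parameters such as `N` and input variables are handled uniformly.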
The same function restricted to integers less than a symbolic constant $N$ is written
$$\{(i_1, i_2) \to (j) : i_1 < N \land i_2 < N \land i_1 + i_2 = j\}$$
and as a quast

    if i1 < N
    then if i2 < N
         then {i1 + i2}
         else ⊥
    else ⊥

The relation between even numbers is written
$$\{(i) \to (j) : (\exists \alpha, \beta : i = 2\alpha \land j = 2\beta)\}$$
We keep the functional notation $\to$ for better understanding, and to be compliant with Omega's syntax. A quast representation of this relation is

    if i = 2α
    then {2β : β ∈ Z}
    else ⊥

where α is a wildcard variable. Many other examples of quasts occur in Chapter 5.

A new interface to PIP has been written in Objective Caml, allowing easy and efficient handling of these quasts. The implementation was done by Boulet and Barthou, see [Bar98] for details. The quast representation is neither better nor worse than the classical logical one. It is, however, very useful for code generation algorithms and very close to the parametric integer programming algorithm.

To conclude this presentation of mathematical abstractions for affine relations, we suppose that Make-Quast is an algorithm that computes a quast representation for any affine relation. (The reverse problem is much easier and not useful to our framework.) Its full description is rather technical, but we may sketch its principles:
- the Presburger formula defining the affine relation is first converted to a form with only existential quantifiers, by way of negation operators (a technique also used in the Skolem transformation of first-order formulas);
- every bound variable is then replaced by a new wildcard variable;
- unknown variables are isolated from equalities and inequalities to build sets of integer vectors;
- eventually, the $\land$ and $\lor$ operators are rewritten in terms of conditional expressions.

Subsequent simplifications, size reductions and canonical form computations are not discussed here, see [Fea88b, PD96, Bar98] for details.

For more details on Presburger arithmetics, integer programming, mathematical representations of affine relations, specific algorithms and applications to compiler technology, see [Sch86, PD96, Pug92, Fea88b].

Transitive Closure

Computing the transitive closure of a relation is a classical technique in computer science, but most algorithms target relations whose graph is finite. This hypothesis is obviously
not acceptable in the case of affine relations. The transitive closure of an affine relation may not be an affine relation, and knowing when it is one is not even decidable. Indeed, transitive closure can encode multiplication, which is not definable inside Presburger arithmetics:
$$\{(x, y) \to (x + 1, y + z)\}^* = \{(x, y) \to (x', y + z(x' - x)) : x \le x'\}.$$
Testing whether a relation $R$ is closed by transitivity is, on the other hand, very simple: it is equivalent to $R \circ R \setminus R$ being empty.

We are thus left with approximation techniques. Finding a lower bound is rather easy in theory. The transitive closure $R^*$ of a relation $R$ can be defined as
$$R^* = \bigcup_{k \in \mathbb{N}} R^k,$$
and computing $\bigcup_{k=0}^{n} R^k$ for increasing values of $n$ yields increasingly accurate lower bounds. In some cases, $\bigcup_{k=0}^{n} R^k$ is constant for $n$ greater than some value $n_0$, and this constant gives the exact result for $R^*$. In general, however, the size of the result grows very quickly without reaching the exact transitive closure. This method can still be used with "reasonable" values of $n$ to compute a lower bound.

The previous iterative technique cannot find the exact transitive closure of the relation $R = \{(i) \to (i + 1)\}$, and cannot even give any interesting approximation. The transitive closure of $R$ is nevertheless a very simple affine relation: $R^* = \{(i) \to (i') : i \le i'\}$. More clever techniques are thus needed to approximate transitive closures of affine relations. Kelly et al. designed such a method and implemented it in Omega [KPRS96]. It is based on approximating general affine relations in a subclass where transitive closure can be computed exactly. They coined the term d-form (d for difference) to define this class. Their technique computes both upper bounds (i.e. conservative approximations) and lower bounds, see [KPRS96] for details.

3.2 Monoids and Formal Languages

This section starts with a short review of basic concepts, then recalls the formal language properties relevant to our purpose. For details, see the well-known book by Hopcroft and Ullman [HU79], the first two chapters of the book by Berstel [Ber79], and the Handbook of Formal Languages (volume 1) [RS97a].

Monoids and Morphisms

A semi-group consists of a set $M$ and an associative binary operation on $M$, usually denoted by multiplication. A semi-group which has a neutral element is a monoid. The neutral element of a monoid is unique, and is usually denoted by $1_M$, or $1$ for short. The monoid structure is widely used in this work, with several different binary operations.

Given two subsets $A$ and $B$ of a monoid $M$, the product of $A$ and $B$ is defined by
$$AB = \{c \in M : (\exists a \in A, \exists b \in B : c = ab)\}.$$
This definition turns $\mathcal{P}(M)$ into a monoid with unit $\{1_M\}$. A subset $A$ of $M$ is a sub-semi-group (resp. sub-monoid) of $M$ if $A^2 \subseteq A$ (resp. $A^2 \subseteq A$ and $1_M \in A$). Given
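any subset $A$ of $M$, the set
$$A^+ = \bigcup_{n \ge 1} A^n$$
is a sub-semi-group of $M$, and
$$A^* = \bigcup_{n \ge 0} A^n, \quad \text{with } A^0 = \{1_M\},$$
is a sub-monoid of $M$. In fact, $A^+$ (resp. $A^*$) is the least sub-semi-group (resp. sub-monoid) containing $A$ for the order of set inclusion. It is called the sub-semi-group (resp. sub-monoid) generated by $A$. If $M = A^*$ for some $A \subseteq M$, then $A$ is a system of generators of $M$. A monoid is finitely generated if it has a finite set of generators.

For any set $A$, the free monoid $A^*$ generated by $A$ consists of the tuples $(a_1, \ldots, a_n)$ of elements of $A$, with $n \ge 0$, and has tuple concatenation as binary operation. When $A$ is finite and non-empty, it is called an alphabet. Tuples are then called words and elements of $A$ letters. The neutral element is called the empty word and is denoted by $\varepsilon$. A formal language is a subset of a free monoid $A^*$. The length $|u|$ of a word $u \in A^*$ is the number of letters composing $u$; by definition, the length of the empty word is 0. For a letter $a$ in an alphabet $A$, the number of occurrences of $a$ in a word $u$ is denoted by $|u|_a$. We will also use the classical notions of prefixes, suffixes, word reversal, sub-words and word factors. The product of two languages is also called concatenation.

We also recall the definition of a monoid morphism. If $M$ and $M'$ are monoids, a (monoid) morphism $\varphi : M \to M'$ is a function satisfying
$$\varphi(1_M) = 1_{M'} \quad \text{and} \quad \forall m_1, m_2 \in M : \varphi(m_1 m_2) = \varphi(m_1)\,\varphi(m_2).$$
If $A$ and $B$ are subsets of $M$ and $\varphi : M \to M'$ is a morphism, then
$$\varphi(AB) = \varphi(A)\,\varphi(B), \quad \varphi(A^+) = \varphi(A)^+, \quad \text{and} \quad \varphi(A^*) = \varphi(A)^*.$$

Rational Languages

This section recalls basic definitions and results, to set notations and allow reference in later chapters.

Given an alphabet $A$, a (finite-state) automaton $\mathcal{A} = (A, Q, I, F, E)$ consists of:
- a finite set $Q$ of states;
- a set $I \subseteq Q$ of initial states;
- a set $F \subseteq Q$ of final states;
- a finite set of transitions (a.k.a. edges) $E \subseteq Q \times A^* \times Q$.

The alphabet $A$ is often omitted for convenience, when clear from the context: we write $\mathcal{A} = (Q, I, F, E)$. A transition $(q, x, q') \in E$ is usually written $q \xrightarrow{x} q'$. Here $q$ is the departing state, $q'$ is the arrival state, and $x$ is the label of the transition. A transition whose label is $\varepsilon$ is called an $\varepsilon$-transition.

A path is a word $(p_1, x_1, q_1) \cdots (p_n, x_n, q_n)$ in $E^*$ such that $q_i = p_{i+1}$ for all $i \in \{1, \ldots, n-1\}$; $x_1 \cdots x_n$ is called the label of the path. An accepting path goes from an initial state to a final one. An automaton is trim when all its states are accessible and may be part of an accepting path.

An automaton is deterministic when all of the following hold:
- it has a single initial state;
- every transition label is a single letter or $\varepsilon$;
- at most one transition may share the same departing state and label;
- a state with a departing $\varepsilon$-transition may not have departing labeled transitions.

The language $|\mathcal{A}|$ realized by a finite-state automaton $\mathcal{A}$ is defined by $u \in |\mathcal{A}|$ iff $u$ labels an accepting path of $\mathcal{A}$. A regular language is a language realized by some finite-state automaton.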
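Deciding whether a word belongs to $|\mathcal{A}|$ follows directly from the definition of accepting paths. The Python sketch below explores pairs (state, number of letters read). The function name `accepts` and the tuple encoding of $(Q, I, F, E)$ are assumptions for illustration. Transition labels may be arbitrary words, including $\varepsilon$. At most $|Q| \cdot (|u| + 1)$ pairs exist, so the search terminates even with $\varepsilon$-cycles. For a fixed automaton, it runs in time linear in $|u|$.

```python
from collections import deque

def accepts(automaton, word):
    """True iff `word` labels an accepting path of the automaton.

    automaton = (Q, I, F, E); E is a set of triples (q, x, q2) where the
    label x is a string, possibly empty (an epsilon-transition).
    """
    _states, initial, final, edges = automaton
    start = {(q, 0) for q in initial}       # (state, letters read so far)
    seen, todo = set(start), deque(start)
    while todo:
        q, i = todo.popleft()
        if i == len(word) and q in final:
            return True
        for p, x, r in edges:
            if p == q and word.startswith(x, i):
                nxt = (r, i + len(x))
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return False

# Hypothetical example: words over {a, b} with an even number of b's.
even_b = ({0, 1}, {0}, {0},
          {(0, "a", 0), (0, "b", 1), (1, "a", 1), (1, "b", 0)})
```

With this encoding, `accepts(even_b, "abba")` is True and `accepts(even_b, "ab")` is False.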
Any regular language can be realized by a finite-state automaton without $\varepsilon$-transitions and where all transition labels are single letters. Any regular language can be realized by a deterministic finite-state automaton.

The family of rational languages over an alphabet $A$ is the least family of languages over $A$ that contains the empty set and singletons, and is closed under union, concatenation and the star operation.

The following well-known theorem is at the core of formal language theory.

Theorem 3.1 (Kleene) Let $A$ be an alphabet. The families of rational and regular languages over $A$ coincide.

Beyond the closure properties included in the definition, rational languages are closed under the plus operation, intersection, complementation, reversal, morphism and inverse morphism.

Proposition 3.1 The following problems are decidable for rational languages: membership in linear time, emptiness, finiteness, emptiness of the complement, finiteness of the complement, inclusion, equality.

Algebraic Languages

We recall a few basic facts about algebraic languages and push-down automata. See [HU79, Ber79] for an extensive introduction.

An algebraic grammar, a.k.a. context-free grammar, $G = (A, V, P)$ consists of:
- an alphabet $A$ of terminal letters;
- an alphabet $V$ of variables, also known as non-terminals, distinct from $A$;
- a finite set $P \subseteq V \times (V \cup A)^*$ of productions.

When clear from the context, the alphabet is removed from the grammar definition, and we write $G = (V, P)$. A production $(\alpha, \beta) \in P$ is usually written in the form $\alpha \to \beta$. If $\alpha \to \beta_1, \alpha \to \beta_2, \ldots, \alpha \to \beta_n$ are productions of $G$ having the same left-hand side $\alpha$, they are grouped together using the notation $\alpha \to \beta_1 \mid \beta_2 \mid \cdots \mid \beta_n$.

Let $A$ be an alphabet and let $G = (V, P)$ be an algebraic grammar. We define the derivation relation as an extension of the production notation $\to$:
$$f \to g \iff \exists \alpha \in V, \exists u, \beta, v \in (V \cup A)^* : \alpha \to \beta \in P \land f = u\alpha v \land g = u\beta v.$$
Then, for any $p \in \mathbb{N}$, $\xrightarrow{p}$ is the $p$-th iteration of $\to$, and $\xrightarrow{+}$ and $\xrightarrow{*}$ are defined as usual.

In general, grammars are presented with a distinguished non-terminal $S$ called the axiom. This allows us to define the language $L_G$ generated by a grammar $G = (V, P)$ by
$$L_G = \{u \in A^* : S \xrightarrow{*} u\}.$$
A language $L_G$ generated by some algebraic grammar $G$ is an algebraic language, a.k.a. context-free language.

Most expected closure properties hold for algebraic languages, but not closure under intersection. Algebraic languages are closed under union, concatenation, star and plus operations, reversal, morphism, inverse morphism, and intersection with rational languages.

Although the most natural definition of algebraic languages comes from the grammar model, we prefer another representation in this work.

Given an alphabet $A$, a push-down automaton $\mathcal{A} = (A, \Gamma, \gamma_0, Q, I, F, E)$ consists of a stack alphabet $\Gamma$, a non-empty word $\gamma_0$ in $\Gamma^+$ called the initial stack word, a finite set $Q$
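of states, a set $I \subseteq Q$ of initial states, a set $F \subseteq Q$ of final states, and a finite set of transitions (a.k.a. edges) $E \subseteq Q \times A^* \times \Gamma \times \Gamma^* \times Q$.

The alphabet $A$ is often omitted for convenience, when clear from the context. A transition $(q, x, g, \gamma, q') \in E$ is usually written $q \xrightarrow{x : g \to \gamma} q'$. The finite-state automata vocabulary is inherited, and $g$ is called the top stack symbol. An empty stack word is denoted by $\varepsilon$.

A configuration of a push-down automaton is a triple $(u, q, \gamma)$, where $u$ is the word to be read, $q$ is the current state and $\gamma \in \Gamma^*$ is the word composed of the symbols in the stack. The transition between two configurations $c_1 = (u_1, q_1, \gamma_1)$ and $c_2 = (u_2, q_2, \gamma_2)$ is denoted by the relation $\mapsto$. It is defined by $c_1 \mapsto c_2$ iff there exist $(a, g, \gamma, \gamma') \in A^* \times \Gamma \times \Gamma^* \times \Gamma^*$ such that
$$u_1 = a u_2 \land \gamma_1 = \gamma' g \land \gamma_2 = \gamma' \gamma \land (q_1, a, g, \gamma, q_2) \in E.$$
Then $\overset{p}{\mapsto}$ with $p \in \mathbb{N}$, $\overset{+}{\mapsto}$ and $\overset{*}{\mapsto}$ are defined as usual.

A push-down automaton $\mathcal{A} = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the language $L$ by final state when $u \in L$ iff there exist $(q_i, q_f, \gamma) \in I \times F \times \Gamma^*$ such that
$$(u, q_i, \gamma_0) \overset{*}{\mapsto} (\varepsilon, q_f, \gamma).$$
A push-down automaton $\mathcal{A} = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the language $L$ by empty stack when $u \in L$ iff there exist $(q_i, q_f) \in I \times F$ such that
$$(u, q_i, \gamma_0) \overset{*}{\mapsto} (\varepsilon, q_f, \varepsilon).$$
Notice that realization by empty stack implies realization by final state, since $q_f$ is still required to be in the set of final states.

Theorem 3.2 The family of languages realized by push-down automata, by final state or by empty stack, is the family of algebraic languages.

It is straightforward that any algebraic language can be realized by a push-down automaton whose transition labels are either $\varepsilon$ or a single letter.

A push-down automaton is deterministic when all of the following hold:
- it has a single initial state;
- every transition label is a single letter or $\varepsilon$;
- at most one transition may share the same departing state, label and top stack symbol;
- a state with a departing $\varepsilon$-transition may not have departing labeled transitions.

Unlike for finite-state automata, the deterministic property for push-down automata restricts the expressive power, and it brings an interesting closure property. The family of languages realized by final state by deterministic push-down automata is called the family of deterministic algebraic languages. In the syntactic analysis framework, this family is also known as LR(1), which is equal to LR(k) for $k \ge 1$ [ASU86].

Proposition 3.2 The family of languages realized by empty stack by deterministic push-down automata is the family of deterministic algebraic languages with the prefix property.

Recall that a language $L$ has the prefix property when, for all words $u$ and non-empty words $v$, a word $uv$ belonging to $L$ forbids $u$ to belong to $L$. The interesting closure property is the following:

Proposition 3.3 The family of deterministic algebraic languages is closed under complementation.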
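The configuration relation $\mapsto$ can be explored mechanically. The following Python sketch keeps the top of the stack at the right end of the stack word, as in $\gamma_1 = \gamma' g$. The names `step` and `realizes` and the tuple encoding are assumptions. To guarantee termination, the sketch assumes that every transition reads at least one letter, i.e. that there are no $\varepsilon$-transitions; the general case needs more care. The example is a hypothetical automaton that realizes $\{a^n b^n : n \ge 1\}$ by empty stack. Read by final state, the same automaton accepts a larger language, which shows why the acceptance mode matters.

```python
def step(edges, config):
    """All configurations c2 such that config |-> c2."""
    word, q, stack = config
    if not stack:
        return []                      # no top stack symbol: the automaton is stuck
    g = stack[-1]                      # top stack symbol (right end of the stack word)
    return [(word[len(x):], r, stack[:-1] + push)
            for (p, x, top, push, r) in edges
            if p == q and top == g and word.startswith(x)]

def realizes(pda, word, by_empty_stack=False):
    """Acceptance by final state (default) or by empty stack."""
    gamma0, initial, final, edges = pda
    assert all(x for (_, x, _, _, _) in edges), "sketch assumes no epsilon-transitions"
    todo, seen = [(word, q, gamma0) for q in initial], set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        rest, q, stack = config
        if rest == "" and q in final and (not by_empty_stack or stack == ""):
            return True
        todo.extend(step(edges, config))
    return False

# Hypothetical PDA over {a, b}: push one A per a, pop one A per b.
anbn = ("Z", {"p"}, {"q"},
        [("p", "a", "Z", "A", "p"), ("p", "a", "A", "AA", "p"),
         ("p", "b", "A", "", "q"), ("q", "b", "A", "", "q")])
```

By empty stack, `anbn` accepts exactly $a^n b^n$ with $n \ge 1$. By final state it also accepts words such as `"aab"`, which reach state `q` with symbols left on the stack.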
However, deterministic algebraic languages are not closed under union and intersection. Decidability of deterministic algebraic languages among algebraic ones is unknown, despite numerous attempts and related works [RS97a].

Proposition 3.4 The following problems are decidable for algebraic languages: membership, emptiness, finiteness.
These additional problems are decidable for deterministic algebraic languages: membership in linear time, emptiness of the complement, finiteness of the complement.
The following problems are undecidable for algebraic languages: being a rational language, emptiness of the complement, finiteness of the complement, inclusion (an open problem for deterministic algebraic languages), equality (idem).

We conclude this section with a simple algebraic language whose properties are frequently observed in our analysis framework [Coh99a]. The Łukasiewicz language Ł over an alphabet $\{a, b\}$ is the language generated by the axiom $\xi$ and the grammar with productions
$$\xi \to a\xi\xi \mid b.$$
The Łukasiewicz language is related to Dyck languages [Ber79]. It is the simplest of a family of languages designed to write arithmetic expressions without parentheses (prefix or "Polish" notation): the letter $a$ represents a binary operation and $b$ represents an operand. Indeed, the first words of Ł are
$$b,\ abb,\ aabbb,\ ababb,\ aaabbbb,\ aababbb,\ \ldots$$

Proposition 3.5 Let $w \in \{a, b\}^*$. Then $w \in Ł$ iff $|w|_a - |w|_b = -1$ and $|u|_a - |u|_b \ge 0$ for any proper left factor $u$ of $w$ (i.e. $\exists v \in \{a, b\}^+ : w = uv$). Moreover, if $w, w' \in Ł$, then
$$|ww'|_a - |ww'|_b = |w|_a - |w|_b + |w'|_a - |w'|_b.$$

This implies that Ł has the prefix property, see [Ber79] for details. A graphical representation may help to understand the previous proposition and the properties of Ł intuitively. Figure 3.1.a is obtained by drawing the graph of the function $u \mapsto |u|_a - |u|_b$ as $u$ ranges over the left factors of $w = aabaabbabbabaaabbb$.

Eventually, Figure 3.1.b shows a push-down automaton which realizes the Łukasiewicz language by empty stack. It has a single state, which is both initial and final, and a single stack symbol $I$. The initial stack word is also $I$; it is denoted as $\to I$ on the initial state. The push-down automaton in Figure 3.1.c realizes Ł by final state. It needs two states and two stack symbols $Z$ and $I$, the initial stack word being $Z$.

Important remark. In the following, every push-down automaton will implicitly accept words by final state.

One-Counter Languages

An interesting sub-class of algebraic languages is the class of one-counter languages. It is defined through push-down automata. A classical definition is the following: a push-down automaton is a one-counter automaton if its stack alphabet contains only one letter.
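With a single stack letter, a stack only records a height. Proposition 3.5 shows that this is all Ł needs: membership only depends on the height $|u|_a - |u|_b$ of the left factors. The following Python sketch implements that test and cross-checks it against words generated from the productions $\xi \to a\xi\xi \mid b$. The names `in_lukasiewicz` and `lukasiewicz_words` are illustrative assumptions, and the axiom $\xi$ is written `X` in the code.

```python
from itertools import product

def in_lukasiewicz(w: str) -> bool:
    """Proposition 3.5: |w|_a - |w|_b = -1 and every proper left factor u
    of w satisfies |u|_a - |u|_b >= 0."""
    height = 0                         # |u|_a - |u|_b for the left factor u read so far
    for letter in w:
        if letter not in "ab":
            raise ValueError("alphabet is {a, b}")
        if height < 0:                 # a proper left factor went below zero
            return False
        height += 1 if letter == "a" else -1
    return height == -1

def lukasiewicz_words(max_len: int) -> set:
    """Words of length <= max_len derived from the axiom X with X -> aXX | b."""
    words, todo = set(), ["X"]
    while todo:
        form = todo.pop()
        if len(form) > max_len:        # every X yields at least one letter: prune
            continue
        i = form.find("X")             # leftmost non-terminal
        if i < 0:
            words.add(form)
            continue
        todo.append(form[:i] + "aXX" + form[i + 1:])
        todo.append(form[:i] + "b" + form[i + 1:])
    return words

# The grammar and Proposition 3.5 agree on all words of length <= 7.
by_counter = {"".join(t) for n in range(8) for t in product("ab", repeat=n)
              if in_lukasiewicz("".join(t))}
assert lukasiewicz_words(7) == by_counter
```

`in_lukasiewicz` rejects a word as soon as one of its proper left factors goes below zero. This behavior mirrors the prefix property of Ł.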
An algebraic language is a one-counter language if it is realized by a one-counter automaton (by final state).

[Figure 3.1. Studying the Łukasiewicz language. a. Evolution of occurrence count differences; b. Push-down automaton accepting by empty stack; c. Push-down automaton accepting by final state.]

However, we prefer a definition which is better suited to our practical usage of one-counter languages. This definition is a bit more technical.

Definition 3.2 (one-counter automaton and language) A push-down automaton is a one-counter automaton if both of the following hold:
- its stack alphabet contains three letters, $Z$ (for "zero"), $I$ (for "increment") and $D$ (for "decrement");
- the stack word belongs to the (rational) set $ZI^* + ZD^*$.

An algebraic language is a one-counter language if it is realized by a one-counter automaton (by final state).

It is easy to show that Definition 3.2 describes the same family of languages as the preceding classical definition. The idea is to replace all stack symbols by $I$ and to "remember" the original symbol in the state name. Intuitively, if $n$ is a positive integer, the stack word $ZI^n$ stands for counter value $n$ and the stack word $ZD^n$ for counter value $-n$. The stack word $Z$ stands for counter value 0.

The family of one-counter languages is strictly included in the family of algebraic languages, and it appears as a natural abstraction in our program analysis framework. The Łukasiewicz language is a simple example of a one-counter language; Figure 3.2 shows a one-counter automaton realizing it. This example introduces specific notations to simplify the presentation of one-counter automata:
- $\to n$ stands for initialization of the stack word to $ZI^n$ if $n$ is positive, to $ZD^{-n}$ if $n$ is negative, and to $Z$ if $n$ is equal to zero;
- $+n$ for $n \ge 0$ stands for pushing $I^n$ onto the stack if the stack word is in $ZI^*$. If the stack word is $ZD^k$, it stands for removing $\min(n, k)$ symbols, then, if $n > k$, pushing $I^{n-k}$ onto the stack;
- $+n$ for $n < 0$ stands for $-(-n)$;
- $-n$ for $n \ge 0$ stands for pushing $D^n$ onto the stack if the stack word is in $ZD^*$. If the stack word is $ZI^k$, it stands for removing $\min(n, k)$ symbols, then, if $n > k$, pushing $D^{n-k}$ onto the stack;
- $-n$ for $n < 0$ stands for $+(-n)$;
- $=0$ stands for testing if the top stack symbol is $Z$;
- $>0$ stands for testing if the top stack symbol is $I$;
- $\ne 0$ stands for testing if the top stack symbol is not $Z$;
- $<0$ stands for testing if the top stack symbol is $D$;
- $\ge 0$ stands for testing if the top stack symbol is $Z$ or $I$;
- $\le 0$ stands for testing if the top stack symbol is $Z$ or $D$.

These operations are the only available means to check and update the counter. Moreover, tests against 0 can be applied before additions or subtractions. For example, "$<0; -1$" allows the transition and decrements the counter when the counter is negative, and "$\varepsilon; +1$" increments the counter in all cases. See also the transition labeled by $b$ in Figure 3.2.

The general form of a one-counter automaton is thus $(A, c_0, Q, I, F, E)$, where:
- $A$ is an alphabet (removed when clear from the context);
- $c_0$ is the initial value of the counter;
- $E \subseteq Q \times A^* \times \{\varepsilon, =0, \ne 0, >0, <0, \ge 0, \le 0\} \times \mathbb{Z} \times Q$.

[Figure 3.2. One-counter automaton for the Łukasiewicz language.]

After this short presentation of one-counter languages, one would expect a generalization to multi-counter languages, also called Minsky machines [Min67]. The general form of $n$-counter automata is $(A, c_0^1, \ldots, c_0^n, Q, I, F, E)$, where $c_0^k$ is the initial value of the $k$-th counter and $E$ is defined on the product of all stacks. However, it has been shown that two-counter automata have the same expressive power as Turing machines. This is a stronger result than the well-known equivalence of Turing machines and two-stack automata. Most interesting questions thus become undecidable for multi-counter languages. However, a few additional restrictions on this family of languages have recently
been proven to enable several decidability results, for example for the emptiness problem. Studying the applicability of these new results to our program analysis framework is left for future work. The most interesting applications would probably arise from work by Comon and Jurski [CJ98].

3.3 Rational Relations

We start with the definition and basic properties of recognizable and rational relations, then introduce the machines realizing rational transductions. After studying some examples, we review decision problems and closure properties. This section recalls classical results, see [Eil74, Ber79, AB88] for details.

Recognizable and Rational Relations

We recall the definition and a useful characterization of recognizable sets in finitely generated monoids.

Definition 3.3 (recognizable set) Let $M$ be a monoid. A subset $R$ of $M$ is a recognizable set if there exist a finite monoid $N$, a morphism $\varphi$ from $M$ to $N$ and a subset $P$ of $N$ such that $R = \varphi^{-1}(P)$.

Recognizable sets can be seen as a generalization of rational (a.k.a. regular) languages to non-free monoids that preserves the structure of boolean algebra:

Proposition 3.6 Let $M$ be a monoid. Both $\emptyset$ and $M$ are recognizable sets in $M$. Recognizable sets are closed under union, intersection and complementation.

Although recognizable sets are closed under concatenation, they are not closed under the star operation. Rational sets, which extend recognizable ones, are closed under the star operation. Their definition is borrowed from rational languages:

Definition 3.4 (rational set) Let $M$ be a monoid. The family of rational sets in $M$ is the least family of subsets of $M$ that contains $\emptyset$ and the singletons $\{m\} \subseteq M$, and is closed under union, concatenation and the star operation.

However, rational sets are not closed under complementation and intersection in general. When there are two monoids $M_1$ and $M_2$ such that $M = M_1 \times M_2$, a recognizable subset of $M$ is called a recognizable relation. The following result describes the "structure" of recognizable relations.

Theorem 3.3 (Mezei) A recognizable relation $R$ in $M_1 \times M_2$ is a finite union of sets of the form $K \times L$, where $K$ (resp. $L$) is a recognizable set of $M_1$ (resp. $M_2$).

When there are two monoids $M_1$ and $M_2$ such that $M = M_1 \times M_2$, a rational subset of $M$ is called a rational relation. In the following, we will only consider recognizable or rational sets which are relations between finitely generated monoids.
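Definition 3.3 and Theorem 3.3 can be checked on a small case. The Python sketch below uses a hypothetical relation $R = \{(a^m, b^n) : m + n \text{ even}\}$ in $M = \{a\}^* \times \{b\}^*$. It is recognized through the morphism $\varphi(u, v) = (|u| + |v|) \bmod 2$ onto the finite monoid $\mathbb{Z}/2\mathbb{Z}$, with $P = \{0\}$. As Theorem 3.3 predicts, it can also be written as the finite union (even × even) ∪ (odd × odd) of products. All function names are illustrative.

```python
# Elements of M = {a}* x {b}* are pairs of strings; the monoid operation is
# componentwise concatenation, with neutral element ("", "").
# N = Z/2Z under addition modulo 2 (a finite monoid with neutral element 0).

def concat(p, q):
    """Monoid operation of M: componentwise concatenation."""
    return (p[0] + q[0], p[1] + q[1])

def phi(pair):
    """Morphism M -> Z/2Z: total length of both components, modulo 2."""
    u, v = pair
    return (len(u) + len(v)) % 2

P = {0}

def in_R(pair):
    """R = phi^{-1}(P): pairs (a^m, b^n) with m + n even."""
    return phi(pair) in P

def in_mezei_form(pair):
    """The same set as a finite union of products K x L:
    (even x even) union (odd x odd)."""
    u, v = pair
    return (len(u) % 2 == 0 and len(v) % 2 == 0) or \
           (len(u) % 2 == 1 and len(v) % 2 == 1)
```

The morphism property $\varphi(pq) = \varphi(p) + \varphi(q) \bmod 2$ holds because lengths add under concatenation. $R$ is therefore recognizable in the sense of Definition 3.3.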
The following characterization of rational relations is fundamental: it expresses rational relations by means of rational languages and monoid morphisms. (The formulation is slightly different from the original theorem by Nivat, see [Ber79] for details.)

Theorem 3.4 (Nivat) Let $M$ and $M'$ be two monoids. Then $R$ is a rational relation over $M$ and $M'$ iff there exist an alphabet $A$, two morphisms $\varphi : A^* \to M$ and $\varphi' : A^* \to M'$, and a rational language $K \subseteq A^*$ such that
$$R = \{(\varphi(h), \varphi'(h)) : h \in K\}.$$

Rational Transductions and Transducers

We recall here a "more functional" view of recognizable and rational relations. From a relation $R$ over $M_1$ and $M_2$, we define a transduction $\tau$ from $M_1$ into $M_2$ as a function from $M_1$ into the set $\mathcal{P}(M_2)$ of subsets of $M_2$, such that $v \in \tau(u)$ iff $u \mathrel{R} v$. For convenience, $\tau$ may also be extended to a mapping from $\mathcal{P}(M_1)$ to $\mathcal{P}(M_2)$, and we write $\tau : M_1 \to M_2$.

A transduction $\tau : M_1 \to M_2$ is recognizable (resp. rational) iff its graph is a recognizable (resp. rational) relation over $M_1$ and $M_2$. Both recognizable and rational transductions are closed under inversion (i.e. relational symmetry). In the next sections, we use either relations or transductions, depending on the context.

The following result, due to Elgot and Mezei [EM65, Ber79], is restricted to free monoids.

Theorem 3.5 (Elgot and Mezei) If $A$, $B$ and $C$ are alphabets, and $\tau_1 : A^* \to B^*$ and $\tau_2 : B^* \to C^*$ are rational transductions, then $\tau_2 \circ \tau_1 : A^* \to C^*$ is a rational transduction.

The family we will study lies somewhere between recognizable and rational relations; it retains both the boolean algebra structure and closure under composition.

Nivat's theorem can be rewritten for rational transductions:

Theorem 3.6 (Nivat) Let $M$ and $M'$ be two monoids. Then $\tau : M \to M'$ is a rational transduction iff there exist an alphabet $A$, two morphisms $\varphi : A^* \to M$ and $\varphi' : A^* \to M'$, and a rational language $K \subseteq A^*$ such that
$$\forall m \in M : \tau(m) = \varphi'(\varphi^{-1}(m) \cap K).$$

These two theorems are key results for dependence analysis and dependence testing, see Chapter 4.

The "mechanical" representations of rational relations and transductions are called rational transducers; they extend finite-state automata in a very natural way:

Definition 3.5 (rational transducer) A rational transducer $T = (M_1, M_2, Q, I, F, E)$ consists of:
- an input monoid $M_1$ and an output monoid $M_2$;
- a finite set of states $Q$;
- a set of initial states $I \subseteq Q$;
- a set of final states $F \subseteq Q$;
- a finite set of transitions (a.k.a. edges) $E \subseteq Q \times M_1 \times M_2 \times Q$.

Monoids $M_1$ and $M_2$ are often omitted for convenience, when clear from the context: we write $T = (Q, I, F, E)$. Since we only consider finitely generated monoids, the transitions of a transducer can equivalently be chosen in $Q' \times (G_1 \cup \{1_{M_1}\}) \times (G_2 \cup \{1_{M_2}\}) \times Q'$. Here $G_1$ (resp. $G_2$) is a set of generators for $M_1$ (resp. $M_2$) and $Q'$ is some set of states larger than $Q$.
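Computing $\tau(u)$ amounts to enumerating the accepting paths whose input label is $u$. The Python sketch below does so for transducers between free monoids, encoded as $(Q, I, F, E)$ with string labels. The encoding and the name `image` are assumptions. The sketch assumes that no cycle of the transducer consists only of transitions with empty input. Otherwise $\tau(u)$ may be infinite and the search would not terminate. The example is a hypothetical transducer that erases exactly one occurrence of $a$.

```python
def image(transducer, u):
    """tau(u): the set of outputs v such that (u, v) labels an accepting path.

    transducer = (Q, I, F, E), E a set of (p, x, y, q) with string labels.
    Assumes no cycle made only of transitions whose input x is empty.
    """
    _states, initial, final, edges = transducer
    outputs, seen = set(), set()
    todo = [(q, 0, "") for q in initial]        # (state, letters read, output so far)
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        q, i, out = config
        if i == len(u) and q in final:
            outputs.add(out)
        for (p, x, y, r) in edges:
            if p == q and u.startswith(x, i):
                todo.append((r, i + len(x), out + y))
    return outputs

# Hypothetical example: copy the input, but erase exactly one a.
erase_one_a = ({0, 1}, {0}, {1},
               {(0, "a", "a", 0), (0, "b", "b", 0), (0, "a", "", 1),
                (1, "a", "a", 1), (1, "b", "b", 1)})
```

For instance, `image(erase_one_a, "aba")` is `{"ba", "ab"}`, and the image of a word without any $a$ is empty. The relation is rational but not a function.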
Most of the time, we will be dealing with free monoids, i.e. languages. The empty word is then the neutral element and is denoted by $\varepsilon$.

A path is a word $(p_1, x_1, y_1, q_1) \cdots (p_n, x_n, y_n, q_n)$ in $E^*$ such that $q_i = p_{i+1}$ for all $i \in \{1, \ldots, n-1\}$; $(x_1 \cdots x_n, y_1 \cdots y_n)$ is called the label of the path. A transducer is trim when all its states are accessible and may be part of an accepting path.

The transduction $|T|$ realized by a rational transducer $T$ is defined by $g \in |T|(f)$ iff $(f, g)$ labels an accepting path of $T$. It is a consequence of Kleene's theorem that a subset of $M_1 \times M_2$ is a rational relation iff it is recognized by a rational transducer:

Proposition 3.7 A transduction is rational iff it is realized by a rational transducer.

Let us now present decidability and undecidability results for rational relations.

Theorem 3.7 The following problems are decidable for rational relations: whether two words are in relation (in linear time), emptiness, finiteness.

However, most other usual questions are undecidable for rational relations.

Theorem 3.8 Let $R$ and $R'$ be rational relations over alphabets $A$ and $B$ with at least two letters. It is undecidable whether:
- $R \cap R' = \emptyset$;
- $R \subseteq R'$;
- $R = R'$;
- $R = A^* \times B^*$;
- $(A^* \times B^*) \setminus R$ is finite;
- $R$ is recognizable.

A few of these questions may become decidable when $A^*$ and $B^*$ are replaced by some particular finitely generated monoids, but this is not the case in general.

The following definition will be useful in some technical discussions and proofs below. It formalizes the fact that a rational transducer can be interpreted as a finite-state automaton on a more complex alphabet. Beware, however: the two interpretations have different properties in general.

Definition 3.6 Let $T$ be a rational transducer over alphabets $A$ and $B$. The finite-state automaton interpretation of $T$ is a finite-state automaton $\mathcal{A}$ over the alphabet $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$. It has the same states, initial states, final states and transitions as $T$.

We need a few results about rational transductions that are partial functions.
Rational Functions and Sequential Transducers

Definition 3.7 (rational function) Let $M_1$ and $M_2$ be two monoids. A rational function $\tau : M_1 \to M_2$ is a rational transduction which is a partial function, i.e. such that $\mathrm{Card}(\tau(u)) \le 1$ for all $u \in M_1$.

Most classical results about rational functions suppose that $M_1$ and $M_2$ are free monoids. We will nevertheless see a result about the composition of rational functions over non-free monoids in Section 3.5. In the following, however, $M_1$ and $M_2$ will be free monoids.

Given two alphabets $A$ and $B$, it is decidable whether a rational transduction from $A^*$ into $B^*$ is a partial function. However, the first algorithm, by Schützenberger, was exponential [Ber79]. The following result by Blattner and Head [BH77] shows that this property is decidable in polynomial time.

Theorem 3.9 It is decidable in $O(\mathrm{Card}(Q)^4)$ whether a rational transducer whose set of states is $Q$ implements a rational function.
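Definition 3.7 can be tested naively by enumerating inputs. This is useful for small experiments but is not a decision procedure: it can only refute functionality, and it is unrelated to the polynomial algorithm of Theorem 3.9. The Python sketch below includes a small path enumeration for rational transducers, encoded as $(Q, I, F, E)$ with string labels. It assumes that no cycle reads no input. All names and both example transducers are hypothetical.

```python
from itertools import product

def image(transducer, u):
    """Outputs v such that (u, v) labels an accepting path (string labels;
    assumes no cycle made only of empty-input transitions)."""
    _states, initial, final, edges = transducer
    outputs, seen = set(), set()
    todo = [(q, 0, "") for q in initial]
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        q, i, out = config
        if i == len(u) and q in final:
            outputs.add(out)
        todo.extend((r, i + len(x), out + y)
                    for (p, x, y, r) in edges if p == q and u.startswith(x, i))
    return outputs

def functional_up_to(transducer, alphabet, max_len):
    """Bounded check of Card(tau(u)) <= 1 for every input u with |u| <= max_len."""
    return all(len(image(transducer, "".join(t))) <= 1
               for n in range(max_len + 1)
               for t in product(alphabet, repeat=n))

# Hypothetical examples over {a, b}.
b_to_a = ({0}, {0}, {0}, {(0, "a", "a", 0), (0, "b", "a", 0)})   # a function
erase_one_a = ({0, 1}, {0}, {1},                                  # not a function
               {(0, "a", "a", 0), (0, "b", "b", 0), (0, "a", "", 1),
                (1, "a", "a", 1), (1, "b", "b", 1)})
```

`erase_one_a` is first caught on an input of length 3: `"aba"` has two images. For `"aa"`, both paths happen to produce the same output `"a"`.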
Rational functions have two additional decidable properties:

Theorem 3.10 Given two rational functions $f$ and $f'$ from $A^*$ to $B^*$, it is decidable whether $f \subseteq f'$ and whether $f = f'$.

Among transducers realizing rational functions, we are especially interested in transducers whose output can be "computed online" with their input. Our interpretation of "online computation" is the following. Suppose a path $e$ leading to a state $q$ is labeled by a pair of words $(u, v)$, and a letter $x$ is read. Then there must be only one state $q'$ and one output letter $y$ such that $(ux, vy)$ labels a path prefixed by $e$. This is best understood using the following definitions.

Definition 3.8 (input and output automata) The input automaton (resp. output automaton) of a transducer is obtained by omitting the output label (resp. input label) of each transition.

Definition 3.9 (sequential transducer) Let $A$ and $B$ be two alphabets. A sequential transducer is labeled in $A \times B^*$ and its input automaton is deterministic, which implies that it has a single initial state.

A sequential transducer obviously realizes a rational function, and a function is sequential if it can be realized by a sequential transducer. The transducer in Figure 3.3.a, whose initial state is 1, is sequential. It replaces by $a$ every $b$ that appears after an odd number of $b$'s.

[Figure 3.3. Sequential and sub-sequential transducers. a. Sequential transducer; b. Sub-sequential transducer.]

Note that if $\varphi$ is a sequential function and $\varphi(\varepsilon)$ is defined, then $\varphi(\varepsilon) = \varepsilon$. Moreover, when all the states of a sequential transducer are final, the function it realizes is prefix closed, i.e. if $uv$ belongs to its domain then so does $u$.² One may associate two functions with a sequential transducer $T = (A, B, Q, I, F, E)$: a "next state" function $\delta : Q \times A \to Q$ and a "next output" function $\lambda : Q \times A \to B^*$. Together with the set $F$ of final states, the functions $\delta$ and $\lambda$ are indeed an equivalent characterization of $T$. However, the sequential transducer definition is a bit too restrictive for our "online computation" property, and we prefer the following extension.

² In [Ber79, Eil74], all states of a sequential transducer are final.

Definition 3.10 (sub-sequential transducer) If $A$ and $B$ are two alphabets, a sub-sequential transducer $(T, \rho)$ over $A^* \times B^*$ is a pair composed of a sequential transducer
$T$ over $A^* \times B^*$ with $F$ as set of final states, and of a function $\rho : F \to B^*$. The function $\varphi$ realized by $(T, \rho)$ is defined as follows. Let $u$ be a word in $A^*$. The value $\varphi(u)$ is defined iff there is an accepting path in $T$ labeled by $(u \mid v)$ and leading to a final state $q$. In this case,
$$\varphi(u) = v\,\rho(q).$$
In other words, the function $\rho$ appends a word to the output at the end of the computation. A sub-sequential transducer obviously realizes a rational function, and a function is sub-sequential if it can be realized by a sub-sequential transducer. A sequential function is sub-sequential: take $\rho(q) = \varepsilon$ for all final states $q$.

This definition matches our "online computation" property. The function realized by the sub-sequential transducer in Figure 3.3.b appends to each word its last letter. This function is not sequential because all its states are final and it is not prefix closed.

The following result has been proven by Choffrut [Cho77].

Theorem 3.11 It is decidable whether a function realized by a transducer is sub-sequential, and it is decidable whether a sub-sequential function is sequential.

Béal and Carton [BC99b] give two polynomial-time algorithms to decide whether a rational function is sub-sequential, and whether a sub-sequential function is sequential. They also provide two algorithms to build a sub-sequential realization and a sequential realization. The first of these may generate an exponential number of states. As a result, this does not provide a polynomial-time algorithm to decide whether a rational function is sequential.

Before we conclude this section, notice that a larger class of rational functions still satisfies the "online computation" property of sub-sequential transducers:

Definition 3.11 (online rational transducer) A rational transducer is online if it realizes a rational function and its input automaton is deterministic. A rational transduction is online if it is realized by an online rational transducer.

The only difference with sub-sequential transducers is that $\varepsilon$ is allowed in the input automaton, as long as the deterministic property is kept. This class of rational functions is strictly larger than the class of sub-sequential transductions, and we are not aware of any result about it. If it were decidable among rational functions, it would probably replace every use of sub-sequential functions in the following applications.

In our analysis and transformation framework, we will only use rational and sub-sequential functions, which are decidable in polynomial time among rational transductions.

3.4 Left-Synchronous Relations

We have seen that rational relations are not closed under intersection, yet intersection is critical for dependence analysis. Testing whether the intersection of two rational relations is empty is undecidable. To address this problem, Feautrier designed a "semi-algorithm" for dependence testing which sometimes does not terminate [Fea98]. We would like to effectively compute the intersection, not only test its emptiness, so our approach is different. We are looking for a sub-class of rational relations with a boolean algebra structure (i.e. closed under union, intersection and complementation).

The class of recognizable relations is a boolean algebra, but we have found a more expressive one: the class of left-synchronous relations. We will show that left-synchronous relations are not decidable among rational ones, but we could define a precise
algorithm to conservatively approximate relations into left-synchronous ones. In fact, this point is even more interesting for us than decidability. Many results presented here have already been published by Frougny and Sakarovitch in [FS93]. However, our work has been done independently and is based on a different, more intuitive and versatile, representation of transductions. Proofs are all new, and several unpublished results have also been discovered.

Notice that a larger class with a boolean algebra structure is the class of deterministic relations [PS98] defined by Pelletier and Sakarovitch. But some interesting decidability properties are lost, and we could not define any precise approximation algorithm for this class; see the paragraph on deterministic transducers under Further Extensions below.

This work has been done in collaboration with Olivier Carton (University of Marne-la-Vallée).

Definitions

We recall the definition of synchronous transducers:

Definition 3.12 (synchronism) A rational transducer on alphabets $A$ and $B$ is synchronous if it is labeled on $A \times B$.

A rational relation or transduction is synchronous if it can be realized by a synchronous transducer. A rational transducer is synchronizable if it realizes a synchronous relation.

Obviously, such a transducer is length preserving; Eilenberg and Schützenberger [Eil74] showed that the reciprocal is true: a length preserving rational transduction is realized by a synchronous transducer.

A first extension of the synchronous property is the $\delta$-synchronous one:

Definition 3.13 ($\delta$-synchronism) A rational transducer on alphabets $A$ and $B$ is $\delta$-synchronous if every transition appearing in a cycle of the transducer's graph is labeled on $A \times B$.

A rational relation or transduction is $\delta$-synchronous if it can be realized by a $\delta$-synchronous transducer. A rational transducer is $\delta$-synchronizable if it realizes a $\delta$-synchronous relation.

Such a transducer has a bounded length difference; Frougny and Sakarovitch [FS93] showed that the reciprocal is true: a bounded length difference rational transduction is realized by a $\delta$-synchronous transducer. Obviously, the bound is 0 when the transducer is synchronous. Two examples are shown in Figure 3.4; they respectively realize $\{(u, v) \in \{a,b\}^* \times \{a,b\}^* : u = v\}$ and $\{(u, v) \in \{a,b\}^* \times \{c\}^* : |u|_a = |v|_c \wedge |u|_b = 2\}$.

Then, we define two new extensions:

Definition 3.14 (left-synchronism)³ A rational transducer over alphabets $A$ and $B$ is left-synchronous if it is labeled on $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ and only transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) may follow transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$).

A rational relation or transduction is left-synchronous if it is realized by a left-synchronous transducer. A rational transducer is left-synchronizable if it realizes a left-synchronous relation.

³ It appears to be a special case of left-$(k,l)$-synchronous transducers, where $k = l = 1$; see the paragraph on constant transmission rates under Further Extensions below.
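Definition 3.14 only constrains which kinds of labels may follow each other, so it can be checked on an explicit transducer edge by edge. The following Python sketch illustrates this under our own assumptions: a transducer is a list of edges `(p, x, y, q)` where `x` and `y` are single letters or `""` for $\varepsilon$, and `label_kind`, `is_left_synchronous` and the two example edge lists (our reading of Figures 3.5.a and 3.4.b) are hypothetical names introduced for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical edge format: (source, x, y, target), where x and y are single
# letters or "" for epsilon.
Edge = Tuple[str, str, str, str]


def label_kind(x: str, y: str) -> str:
    """Classify a label as 'sync' (A x B), 'in' (A x {eps}) or 'out' ({eps} x B)."""
    if len(x) == 1 and len(y) == 1:
        return "sync"
    if len(x) == 1 and y == "":
        return "in"
    if x == "" and len(y) == 1:
        return "out"
    raise ValueError(f"label {x!r}|{y!r} is not allowed by Definition 3.14")


def is_left_synchronous(edges: Iterable[Edge]) -> bool:
    """Only 'in' edges may follow an 'in' edge and only 'out' edges may follow
    an 'out' edge: a state entered through such an edge may only be left
    through an edge of the same kind."""
    try:
        kinds = [(p, label_kind(x, y), q) for (p, x, y, q) in edges]
    except ValueError:
        return False
    entered_by = {}
    for p, kind, q in kinds:
        entered_by.setdefault(q, set()).add(kind)
    for p, kind, q in kinds:
        for before in entered_by.get(p, set()):
            if before != "sync" and kind != before:
                return False
    return True


# Our reading of Figure 3.5.a (prefix order over A = {a, b}) and Figure 3.4.b.
prefix_order = ([("1", x, x, "1") for x in "ab"]
                + [("1", "", y, "2") for y in "ab"]
                + [("2", "", y, "2") for y in "ab"])
fig_3_4_b = [("1", "a", "c", "1"), ("1", "b", "", "2"), ("2", "a", "c", "2"),
             ("2", "b", "", "3"), ("3", "a", "c", "3")]
```

The prefix-order transducer passes the check, while the $\delta$-synchronous transducer of Figure 3.4.b fails it, because a $b|\varepsilon$ transition is followed by an $a|c$ loop. The check concerns a given transducer, not the relation it realizes: Theorem 3.18 shows that this relation is nevertheless left-synchronous.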
Figure 3.4. Synchronous and $\delta$-synchronous transducers (a. A synchronous transducer; b. A $\delta$-synchronous transducer)

Definition 3.15 (right-synchronism) A rational transducer over alphabets $A$ and $B$ is right-synchronous if it is labeled on $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ and only transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) may precede transitions labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$).

A rational relation or transduction is right-synchronous if it can be realized by a right-synchronous transducer. A rational transducer is right-synchronizable if it realizes a right-synchronous relation.

Figure 3.5 shows left-synchronous transducers over an alphabet $A$ realizing two orders (a.k.a. orderings), where $<_{txt}$ is some order on $A$: the prefix order, $f <_{pre} g \iff \exists h \in A^+ : g = fh$, and the lexicographic order, $f <_{lex} g \iff f <_{pre} g \vee (\exists u, v, w \in A^*, a, b \in A : f = uav \wedge g = ubw \wedge a <_{txt} b)$.

In the following transducers, labels $x$ and $y$ stand for $\forall x \in A$ and $\forall y \in A$ respectively.

Figure 3.5. Left-synchronous realization of several order relations (a. Prefix order; b. Lexicographic order)

The word-reversal operation converts a left-synchronous transducer into a right-synchronous one and conversely.⁴ The two definitions are not contradictory: some relations are left and right synchronous, such as synchronous ones.

⁴ Recognizable, synchronous and $\delta$-synchronous relations are closed under word-reversal.
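For a left-synchronous transducer, a pair $(u, v)$ can only be read one way: letters are paired while both words last, and the rest of the longer word is read against $\varepsilon$. The Python sketch below, whose helper names are hypothetical, decides the strict lexicographic order on that reading, following the intuition of Figure 3.5.b; it assumes that $<_{txt}$ is the usual character order of Python strings.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # one aligned letter (x, y); "" stands for epsilon


def left_aligned(u: str, v: str) -> List[Pair]:
    """Left-synchronous reading of (u, v): letters are paired while both words
    last, then the rest of the longer word is read against epsilon."""
    k = min(len(u), len(v))
    return ([(u[i], v[i]) for i in range(k)]
            + [(x, "") for x in u[k:]] + [("", y) for y in v[k:]])


def lex_less(u: str, v: str) -> bool:
    """Strict lexicographic order u <_lex v decided on the aligned reading."""
    for x, y in left_aligned(u, v):
        if x and y:
            if x != y:
                return x < y       # first difference: compare with <_txt
        elif y:
            return True            # epsilon|y tail: u is a proper prefix of v
        else:
            return False           # x|epsilon tail: v is a proper prefix of u
    return False                   # u == v, and the order is strict
```

Python's built-in string comparison implements the same strict lexicographic order, which gives an easy cross-check on all short words over $\{a, b\}$.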
Figure 3.6 shows a transducer realizing the relation $\Delta = \{(u, v) \in A^* \times B^* : |u| \equiv |v| \bmod 2\}$. It is neither left-synchronous nor right-synchronous, but the left-synchronous and right-synchronous realizations in the same figure show that $\Delta$ is left and right synchronous.

In the three following transducers, labels $x$ and $y$ stand for $\forall x \in A$ and $\forall y \in B$.

Figure 3.6. A left and right synchronizable example (panels: left-synchronous; left and right synchronizable; right-synchronous)

In the following, we mostly consider left-synchronous transducers, because all results extend to right-synchronous ones through the word-reversal operation, and most interesting transducers are left-synchronous.

Algebraic Properties

It is well known that synchronous and $\delta$-synchronous relations are closed under union, complementation and intersection. We show that it is the same for left-synchronous relations.

Lemma 3.1 (Union) The class of left-synchronous relations is closed under union.

Proof: Let $T = (Q, I, F, E)$ and $T' = (Q', I', F', E')$ be left-synchronous transducers. $Q$ and $Q'$ can be supposed disjoint without loss of generality; then $(Q \cup Q', I \cup I', F \cup F', E \cup E')$ realizes $|T| \cup |T'|$.

The proof is constructive: given two left-synchronous realizations, one may compute a left-synchronous realization of the union.

Here is a direct application:

Theorem 3.12 Recognizable relations are left-synchronous.

Proof: Let $R$ be a recognizable relation in $A^* \times B^*$. From Theorem 3.3, there exist an integer $n$, rational languages $A_1, \ldots, A_n \subseteq A^*$ and $B_1, \ldots, B_n \subseteq B^*$ such that $R = A_1 \times B_1 \cup \cdots \cup A_n \times B_n$. Let $i \in \{1, \ldots, n\}$, let $\mathcal{A}_A = (Q_A, I_A, F_A, E_A)$ be an automaton accepting $A_i$, and let $\mathcal{A}_B = (Q_B, I_B, F_B, E_B)$ be an automaton accepting $B_i$. We suppose $Q_A$ and $Q_B$ are disjoint sets, without loss of generality, and define a transducer $T = (Q, I, F, E)$, where $Q = (Q_A \times Q_B) \cup Q_A \cup Q_B$, $I = I_A \times I_B$, $F = (F_A \times F_B) \cup F_A \cup F_B$, and $E$ is defined as follows:
1. All transitions in $E_A$ and $E_B$ are also in $E$ (read as $x|\varepsilon$ and $\varepsilon|y$ transitions respectively);
2. If $q_A \xrightarrow{x} q'_A \in E_A$ and $q_B \xrightarrow{y} q'_B \in E_B$, then $(q_A, q_B) \xrightarrow{x|y} (q'_A, q'_B) \in E$;
3. If $q_A$ (resp. $q_B$) is a final state and $q_B \xrightarrow{y} q'_B \in E_B$ (resp. $q_A \xrightarrow{x} q'_A \in E_A$), then $(q_A, q_B) \xrightarrow{\varepsilon|y} q'_B \in E$ (resp. $(q_A, q_B) \xrightarrow{x|\varepsilon} q'_A \in E$).

By construction, $T$ is left-synchronous, its input is $A_i$ and its output is $B_i$. Moreover, it accepts any combination of input words in $A_i$ and output words in $B_i$. Lemma 3.1 terminates the proof.

The proof is constructive: given a decomposition of a recognizable relation into products of rational languages, one may build a left-synchronous transducer.

Another application is this useful decomposition result for left-synchronous relations:

Proposition 3.8 Any left-synchronous relation can be decomposed into a union of relations of the form $SR$, where $S$ is synchronous and $R$ has either no input or no output ($R$ is thus recognizable).

Proof: Consider a relation $U \subseteq A^* \times B^*$ realized by a left-synchronous transducer $T$, and consider an accepting path $e$ in $T$. The restriction of $T$ to the states and transitions in $e$ yields a transducer $T_e$ such that $|T_e| \subseteq |T|$. Moreover, $T_e$ can be divided into transducers $T_S$ and $T_R$ such that the (unique) final state of the first is the (unique) initial state of the second, $T_S$ is synchronous and $T_R$ has either no input or no output. Therefore, $T_e$ realizes a left-synchronous relation of the form $SR$, where $S$ is synchronous and $R$ has either no input or no output. Since the number of "restricted" transducers $T_e$ is finite, closure under union terminates the proof.

The proof is constructive if the left-synchronous relation to be decomposed is given by a left-synchronous realization.

To study complementation and intersection, we need two more definitions: unambiguity and completion.

Definition 3.16 (unambiguity) A rational transducer $T$ over $A$ and $B$ is unambiguous if any couple of words over $A$ and $B$ labels at most one path in $T$. A rational relation is unambiguous if it is realized by an unambiguous transducer.

This definition coincides with the one in [Ber79] Section IV.4 for rational functions, but differs for general rational transductions.

Definition 3.17 (completion) A rational transducer $T$ is complete if every pair of words labels at least one path in $T$ (accepting or not).

It is obviously not always possible to complete a transducer into a trim one. From these two definitions, let us recall a very general result.
Theorem 3.13 The class of complete unambiguous rational relations is closed under complementation.

Proof: Let $R$ be a complete unambiguous relation realized by a complete unambiguous transducer $T = (Q, I, F, E)$. We define the transducer $T' = (Q, I, Q \setminus F, E)$, so that an accepting path in $T$ cannot be one of $T'$. The completion of $T$ and the uniqueness of accepting paths in $T$ show that the complementation of $R$ is realized by $T'$.
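On the finite-state automaton interpretation, Theorem 3.13 amounts to swapping final and non-final states. The Python sketch below assumes a deterministic automaton over aligned letters that reads each pair $(u, v)$ through its unique left-synchronous reading; missing transitions go to an implicit sink state, which makes the automaton complete, and determinism makes it unambiguous. The class `AlignedDFA`, the function `intersect` (the classical product construction, anticipating Theorem 3.14) and the two example relations are hypothetical illustrations, not the thesis's own implementation.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

Pair = Tuple[str, str]   # aligned letter (x, y); "" stands for epsilon
DEAD = "dead"            # implicit sink state: makes every automaton complete


def left_aligned(u: str, v: str) -> List[Pair]:
    """The unique left-synchronous reading of the pair (u, v)."""
    k = min(len(u), len(v))
    return ([(u[i], v[i]) for i in range(k)]
            + [(x, "") for x in u[k:]] + [("", y) for y in v[k:]])


def aligned_alphabet(A: str, B: str) -> List[Pair]:
    return ([(x, y) for x in A for y in B]
            + [(x, "") for x in A] + [("", y) for y in B])


class AlignedDFA:
    """Deterministic automaton over aligned letters (hypothetical helper).
    Missing transitions lead to DEAD, so the automaton is complete; being
    deterministic, it is also unambiguous."""

    def __init__(self, start: Hashable,
                 delta: Dict[Tuple[Hashable, Pair], Hashable],
                 finals: FrozenSet[Hashable]):
        self.start, self.delta, self.finals = start, delta, finals

    def step(self, q: Hashable, a: Pair) -> Hashable:
        return self.delta.get((q, a), DEAD)

    def accepts(self, u: str, v: str) -> bool:
        q = self.start
        for a in left_aligned(u, v):
            q = self.step(q, a)
        return q in self.finals

    def states(self) -> set:
        return ({self.start, DEAD} | {q for (q, _) in self.delta}
                | set(self.delta.values()))

    def complement(self) -> "AlignedDFA":
        # Theorem 3.13: swap final and non-final states.
        return AlignedDFA(self.start, self.delta,
                          frozenset(self.states() - self.finals))


def intersect(m1: AlignedDFA, m2: AlignedDFA, letters: List[Pair]) -> AlignedDFA:
    """Product construction, restricted to reachable pairs of states."""
    start = (m1.start, m2.start)
    delta, todo, seen = {}, [start], {start}
    while todo:
        p = todo.pop()
        for a in letters:
            q = (m1.step(p[0], a), m2.step(p[1], a))
            delta[(p, a)] = q
            if q not in seen:
                seen.add(q)
                todo.append(q)
    finals = frozenset(p for p in seen
                       if p[0] in m1.finals and p[1] in m2.finals)
    return AlignedDFA(start, delta, finals)


# Prefix relation {(u, v) : u is a prefix of v} and equal-length relation over {a, b}.
prefix = AlignedDFA("eq",
                    {**{("eq", (x, x)): "eq" for x in "ab"},
                     **{(q, ("", y)): "tail" for q in ("eq", "tail") for y in "ab"}},
                    frozenset({"eq", "tail"}))
same_length = AlignedDFA("s", {("s", (x, y)): "s" for x in "ab" for y in "ab"},
                         frozenset({"s"}))
identity = intersect(prefix, same_length, aligned_alphabet("ab", "ab"))
```

Intersecting the prefix relation with the equal-length relation yields the identity relation, and complementing any of these automata only changes its set of final states, as in the proof of Theorem 3.13.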
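The proof is constructive.

Now, we specialize this result for left-synchronous relations.

Lemma 3.2 A left-synchronous relation is realized by an unambiguous left-synchronous transducer.

Proof: Let $T$ be a left-synchronous transducer over $A$ and $B$ realizing a relation $R$. Let $\mathcal{A}$ be the finite-state automaton interpretation of $T$ over the alphabet $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$, let $\mathcal{A}'$ be a deterministic finite-state automaton accepting the same language as $\mathcal{A}$, and let $T'$ be the rational transducer interpretation of $\mathcal{A}'$. Let $f, g$ be two words such that $g \in |T|(f)$, and let $e$ and $e'$ be two accepting paths for $(f, g)$ in $T'$.

Suppose $e$ differs from $e'$. By the determinism property, the words $w$ and $w'$ they accept in $\mathcal{A}'$ also differ; let $(x, y)$ and $(x', y')$ be the first difference. If $x = \varepsilon$ and $x' \neq \varepsilon$, the definition of left-synchronous transducers imposes that $w$ be labeled in $\{\varepsilon\} \times B$ after $(x, y)$; then $e$ and $e'$ accept different inputs. The same reasoning applies to the three other cases ($y = \varepsilon$ and $y' \neq \varepsilon$; $x' = \varepsilon$ and $x \neq \varepsilon$; $y' = \varepsilon$ and $y \neq \varepsilon$) and yields different inputs or outputs for paths $e$ and $e'$. This contradicts the definition of $e$ and $e'$.

Thus $f$ and $g$ are accepted by a unique path in $T'$. Since $\mathcal{A}'$ is the determinization of $\mathcal{A}$, a transition labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$) may only be followed by another transition labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$). Eventually, $T'$ is unambiguous and left-synchronous, and it realizes $R$.

The proof is constructive.

Proposition 3.9 A left-synchronous relation is realized by a complete unambiguous left-synchronous transducer.

Proof: Let $R$ be a left-synchronous relation. We use Lemma 3.2 to compute an unambiguous left-synchronous transducer $T = (Q, I, F, E)$ which realizes $R$. We define a transducer $T' = (Q', I, F, E')$, where $q_i$, $q_o$ and $q_{io}$ are three new states, $Q' = Q \cup \{q_i, q_o, q_{io}\}$, and $E'$ is defined as follows:
1. All transitions in $E$ are also in $E'$.
2. For all $(x, y) \in A \times B$, $q_{io} \xrightarrow{x|y} q_{io} \in E'$.
3. For all $x \in A$, $q_{io} \xrightarrow{x|\varepsilon} q_i \in E'$ and $q_i \xrightarrow{x|\varepsilon} q_i \in E'$.
4. For all $y \in B$, $q_{io} \xrightarrow{\varepsilon|y} q_o \in E'$ and $q_o \xrightarrow{\varepsilon|y} q_o \in E'$.
5. If $q \in Q$ is such that $\forall (x', q') \in A \times Q : q' \xrightarrow{x'|\varepsilon} q \notin E$, then for every $y'' \in B$ such that $\forall q'' \in Q : q \xrightarrow{\varepsilon|y''} q'' \notin E$, we have $q \xrightarrow{\varepsilon|y''} q_o \in E'$.
6. If $q \in Q$ is such that $\forall (y', q') \in B \times Q : q' \xrightarrow{\varepsilon|y'} q \notin E$, then for every $x'' \in A$ such that $\forall q'' \in Q : q \xrightarrow{x''|\varepsilon} q'' \notin E$, we have $q \xrightarrow{x''|\varepsilon} q_i \in E'$.
7. If $q \in Q$ is such that $\forall (x', q') \in A \times Q : q' \xrightarrow{x'|\varepsilon} q \notin E$ and $\forall (y', q') \in B \times Q : q' \xrightarrow{\varepsilon|y'} q \notin E$, then for every $(x'', y'') \in A \times B$ such that $\forall q'' \in Q : q \xrightarrow{x''|y''} q'' \notin E$, we have $q \xrightarrow{x''|y''} q_{io} \in E'$.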
The resulting transducer is left-synchronous, complete, and realizes relation $R$. Moreover, the three last cases have been carefully designed to preserve the unambiguous property: no transition departing from a state $q$ is added if its label is already the one of an existing transition departing from $q$.

The proof is constructive.

Theorem 3.14 (Complementation and Intersection) The class of left-synchronous relations is closed under complementation and intersection.

Proof: As a corollary of Theorem 3.13 and Proposition 3.9, we have the closure under complementation. Together with closure under union, this proves closure under intersection.

Eventually, we have proven that the class of left-synchronous relations is a boolean algebra, which will be of great help for dependence and reaching definition analysis; see Section 4.3.

Synchronous and $\delta$-synchronous relations are obviously closed under concatenation, but it is not true for left-synchronous ones. However, we have the following result:

Proposition 3.10 Let $S$, $T$ and $R$ be rational relations.
(i) If $S$ is synchronous and $T$ is left-synchronous, then $ST$ is left-synchronous.
(ii) If $T$ is left-synchronous and $R$ is recognizable, then $TR$ is left-synchronous.

Proof: Proof of (i) is a straightforward application of the definition of left-synchronous transducers (see Proposition 3.12 for a generalization). For (ii), we use Proposition 3.8 to partition $T$ into $S_1 R_1, \ldots, S_n R_n$, where $S_i$ is synchronous and $R_i$ is recognizable for all $1 \leq i \leq n$. Now, $R_i R$ is recognizable, hence left-synchronizable from Theorem 3.12. Application of (i) shows that $S_i R_i R$ is left-synchronizable. Closure under union terminates the proof of (ii).

The proof is constructive when a left-synchronous realization of $T$ is provided, thanks to Proposition 3.8. A generalization of (i) is given later in this section (Proposition 3.12).

To close this section about algebraic properties, one should notice that the finite-state automaton interpretation (see Definition 3.6) of a left-synchronous transducer $T$ has exactly the same properties as $T$ itself, regarding computation of the complementation and intersection. Indeed, by definition of left-synchronous relations, applying classical algorithms from automata theory to the finite-state automaton interpretation yields correct results on the transducer. This remark shows that algebraic operations for left-synchronous transducers have the same complexity as for finite-state automata in general.

Functional Properties

Synchronous and $\delta$-synchronous transductions are closed under inversion (i.e. relational symmetry) and composition. Clearly, the class of left-synchronous transductions is also closed under inversion. Combined with the boolean algebra structure, the following result is useful for reaching definition analysis (to solve (4.17) in Section 4.3.3).

Theorem 3.15 The class of left-synchronous transductions is closed under composition.
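Proof: Consider three alphabets $A$, $B$ and $C$, two transductions $\tau_1: A^* \to B^*$ and $\tau_2: B^* \to C^*$, and two left-synchronous transducers $T_1 = (Q_1, I_1, F_1, E_1)$ realizing $\tau_1$ and $T_2 = (Q_2, I_2, F_2, E_2)$ realizing $\tau_2$. We suppose $Q_1$ and $Q_2$ are disjoint sets, without loss of generality, and define $T = ((Q_1 \times Q_2) \cup Q_1 \cup Q_2, I_1 \times I_2, (F_1 \times F_2) \cup F_1 \cup F_2, E)$, where $E$ is defined as follows:
1. All transitions in $E_1$ and $E_2$ are also in $E$;
2. If $q_1 \xrightarrow{x|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|z} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{x|z} (q'_1, q'_2) \in E$;
3. If $q_1 \xrightarrow{x|\varepsilon} q'_1 \in E_1$ and $q_2 \xrightarrow{\varepsilon|z} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{x|z} (q'_1, q'_2) \in E$;
4. If $q_1 \xrightarrow{\varepsilon|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|\varepsilon} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{\varepsilon|\varepsilon} (q'_1, q'_2) \in E$;
5. If $q_1 \xrightarrow{x|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|\varepsilon} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{x|\varepsilon} (q'_1, q'_2) \in E$;
6. If $q_1 \xrightarrow{\varepsilon|y} q'_1 \in E_1$ and $q_2 \xrightarrow{y|z} q'_2 \in E_2$, then $(q_1, q_2) \xrightarrow{\varepsilon|z} (q'_1, q'_2) \in E$;
7. If $q_1 \xrightarrow{x|\varepsilon} q'_1 \in E_1$, then $\forall q_2 \in F_2 : (q_1, q_2) \xrightarrow{x|\varepsilon} q'_1 \in E$;
8. If $q_2 \xrightarrow{\varepsilon|z} q'_2 \in E_2$, then $\forall q_1 \in F_1 : (q_1, q_2) \xrightarrow{\varepsilon|z} q'_2 \in E$.

First, consider an accepting path $e$ in $T$ for a couple of words $(f, h)$. We may write $e = e_{12} e'$, where $e_{12}$ is the $Q_1 \times Q_2$ part of $e$. By construction of $T$, the end state of $e_{12}$ is a final state of $T_1$ and $e'$ is a path of $T_2$, or it is the opposite. Considering the projection of states in $e_{12}$ on $Q_1$, $e_{12}$ accepts a couple of words $(f, g)$ in $T_1$ such that $h \in \tau_2(g)$. Hence $h \in \tau_2 \circ \tau_1(f)$.

Second, consider three words $f, g, h$ such that $g \in \tau_1(f)$ and $h \in \tau_2(g)$. Let $e_1$ be an accepting path for $(f, g)$ in $T_1$ and $e_2$ be one for $(g, h)$ in $T_2$. Suppose $|e_1| > |e_2|$. Build a path $e_{12}$ in $T$ from the product of states and labels of the first $|e_2|$ transitions in $e_1$ and $e_2$; its end state is $(q_1, q_2)$ with $q_1 \in Q_1$ and $q_2 \in F_2$. Now, the last $|e_1| - |e_2|$ transitions in $e_1$ can be written $(q_1, x, \varepsilon, q'_1) . e'_1$, hence $e_{12} . ((q_1, q_2), x, \varepsilon, q'_1) . e'_1$ is an accepting path for $(f, h)$ in $T$.

Eventually, we have shown that $T$ realizes $\tau_2 \circ \tau_1$. Now, using the classical $\varepsilon|\varepsilon$-transition removal algorithm for finite-state automata, we define a transducer $T'$. It is left-synchronous because $T_1$ and $T_2$ are, and transitions involving states of $Q_1$ or $Q_2$ (labeled on $A \times \{\varepsilon\}$ or $\{\varepsilon\} \times C$) are never followed by transitions involving states of $Q_1 \times Q_2$.

The proof is constructive.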
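Before showing an important application of this result, we need an additional definition:

Definition 3.18 ($\preceq$-selection) Let $\tau: A^* \to B^*$ be a rational transduction, and let $\preceq$ be a rational order on $B^*$, i.e. a rational relation which is reflexive, anti-symmetric and transitive. The $\preceq$-selection of $\tau$ is the partial function $\sigma$ defined by
$$\forall (u, v) \in A^* \times B^* : v = \sigma(u) \iff v = \min_{\preceq} \tau(u).$$

Proposition 3.11 Let $\tau: A^* \to B^*$ be a left-synchronous transduction, and let $\preceq$ be a left-synchronous order on $B^*$. The $\preceq$-selection of $\tau$ is a left-synchronous function.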
Proof: Let $\mathrm{Id}$ be the identity rational function on $B^*$. If $\sigma$ is the $\preceq$-selection of $\tau$, the proof comes from the fact that $\sigma = \tau - ((\preceq - \mathrm{Id}) \circ \tau)$, together with closure under composition, complementation and intersection.

The most interesting application of this to our framework appears when choosing the lexicographic order for $\preceq$; see Section 4.3.3. For more details on $\preceq$-selection, also known as uniformization, see [PS98].

An Undecidability Result

It is well known that the recognizability of a transduction is undecidable. This is proved by Berstel in [Ber79] Theorem 8.4, and we use a similar technique to show that it is the same for left-synchronous relations. We start with a preliminary result.

Lemma 3.3 Let $K$ be a positive integer, let $A = \{a, b\}$, let $B$ be any alphabet, and let $u_1, u_2, \ldots, u_p \in B^*$. Define
$$U = \{(ab^K, u_1), (ab^{2K}, u_2), \ldots, (ab^{pK}, u_p)\}.$$
Then, $U$ and $U^+$ are rational relations, and relation $(A^* \times B^*) - U^+$ is also rational.

Proof: Relation $U$ is finite, hence rational, and $U^+$ is rational by closure under concatenation and the star operation. Usually, the class of rational relations is not closed under complementation, so we have to prove something here. This is done the same way as in [Ber79] Lemma 8.3, with the only substitution of $b$ by $b^K$.

Theorem 3.16 Let $A$ and $B$ be alphabets with at least two letters. Given a rational relation $R$ over $A$ and $B$, it is undecidable whether $R$ is left-synchronous.

Proof: We may assume that $A$ contains exactly two letters, and set $A = \{a, b\}$. Consider two sequences $u_1, u_2, \ldots, u_p$ and $v_1, v_2, \ldots, v_p$ of non-empty words over $B$, and let $K$ be their maximum length. Define
$$U = \{(ab^K, u_1), \ldots, (ab^{pK}, u_p)\} \quad\text{and}\quad V = \{(ab^K, v_1), \ldots, (ab^{pK}, v_p)\}.$$
From Lemma 3.3, $U$, $V$, $U^+$, $V^+$, $U^- = (A^* \times B^*) - U^+$ and $V^- = (A^* \times B^*) - V^+$ are rational relations.

Let $R = U^- \cup V^-$. Since left-synchronous transductions are closed under complementation, $R$ is left-synchronous iff $(A^* \times B^*) - R = U^+ \cap V^+$ is left-synchronous.

Assume $U^+ \cap V^+$ is non-empty and realized by a left-synchronous transducer $T$. Consider $(m, u) \in U^+ \cap V^+$. We may write $m = fg$ with $|f| = |u|$ and $|g| > 0$. Left synchronism requires that $(g, \varepsilon)$ labels a path in $T$. Moreover, $((fg)^k, u^k) \in U^+ \cap V^+$ for all $k \geq 1$, hence the path labeled by $(g, \varepsilon)$ must be part of a cycle:
$$\exists g' : \forall k : (fg(g'g)^k, u) \in U^+ \cap V^+.$$
However, because $u_1, \ldots, u_p$ and $v_1, \ldots, v_p$ are non-empty, the ratio between the length of input and output words must be less than or equal to $K + 1$; this is contradictory.
Eventually, $R$ is left-synchronous iff $U^+ \cap V^+$ is empty.⁵ Since deciding this emptiness is exactly solving Post's correspondence problem for $u_1, \ldots, u_p$ and $v_1, \ldots, v_p$, we have proven that left-synchronism is undecidable.

A similar proof shows the following result, which is not a corollary of the previous theorem:

Theorem 3.17 Let $A$ and $B$ be alphabets with at least two letters. Given a rational relation $R$ over $A$ and $B$, it is undecidable whether $R$ is left and right synchronous.

Studying Synchronizability of Transducers

Despite the general undecidability results, we are interested in particular cases where a rational relation can be proved left-synchronous.

Transmission Rate

We recall the following useful notion to give an alternative description of synchronism in transducers. The transmission rate of a path labeled by $(u, v)$ is defined as the ratio $|v| / |u| \in \mathbb{Q}^+ \cup \{+\infty\}$.

Eilenberg and Schützenberger [Eil74] showed that the synchronism property of a transducer is decidable. Frougny and Sakarovitch [FS93] showed a similar result for $\delta$-synchronism, and their algorithm operates directly on the transducer that realizes the transduction. The result is:

Lemma 3.4 A rational transducer is $\delta$-synchronizable iff the transmission rate of all its cycles is 1.

There is no characterization of recognizable transducers through the transmission rate of their cycles, but one may give a sufficient condition:

Lemma 3.5 If the transmission rate of all cycles in a rational transducer is 0 or $+\infty$, then it realizes a recognizable relation.

Proof: Let $T$ be a rational transducer whose cycle transmission rates are only 0 and $+\infty$. Considering a strongly-connected component, all its cycles must be of the same rate. Hence a strongly-connected component has either no input or no output. This proves that strongly-connected components are recognizable. Closure of recognizable relations under concatenation and union terminates the proof.

There is no characterization of left-synchronizable transducers either. However, as a straightforward application of previous definitions, one may give the following result:

Lemma 3.6 If $T$ is a left-synchronous transducer, then cycles of $T$ may only have three different transmission rates: 0, 1 and $+\infty$. All cycles in the same strongly-connected component must have the same transmission rate, only components of rate 0 may follow components of rate 0, and only components of rate $+\infty$ may follow components of rate $+\infty$.

Even if synchronizable transducers may not satisfy these properties, some kind of reciprocal is available; see Theorem 3.19 below.

⁵ We have also proven here that $U^+$ and $V^+$ are not left-synchronous.
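Lemma 3.4 suggests a brute-force test on small transducers: enumerate the cycles and compute their transmission rates. The Python sketch below is only an illustration, not the algorithm of Frougny and Sakarovitch: it enumerates simple cycles by depth-first search, which is exponential in the worst case, and relies on the fact that every cycle decomposes into simple ones, so all cycles have rate 1 iff all simple cycles do. Edges are hypothetical `(p, input word, output word, q)` tuples, and we assume there is no cycle labeled $\varepsilon|\varepsilon$ (this sketch would give it rate $+\infty$).

```python
from fractions import Fraction
from typing import Dict, List, Tuple, Union

# Hypothetical edge format: (source, input word, output word, target).
Edge = Tuple[str, str, str, str]
Rate = Union[Fraction, float]  # float only for +infinity
INF = float("inf")


def transmission_rate(path: List[Edge]) -> Rate:
    """|v| / |u| for a path labeled (u, v); +inf when u is empty."""
    u_len = sum(len(e[1]) for e in path)
    v_len = sum(len(e[2]) for e in path)
    return INF if u_len == 0 else Fraction(v_len, u_len)


def simple_cycles(edges: List[Edge]) -> List[List[Edge]]:
    """Enumerate simple cycles by DFS, each one once, from its smallest state."""
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    found: List[List[Edge]] = []

    def dfs(start: str, q: str, path: List[Edge], seen: set) -> None:
        for e in out.get(q, []):
            if e[3] == start:
                found.append(path + [e])
            elif e[3] > start and e[3] not in seen:
                dfs(start, e[3], path + [e], seen | {e[3]})

    for s in sorted(out):
        dfs(s, s, [], {s})
    return found


def cycle_rates(edges: List[Edge]) -> List[Rate]:
    return [transmission_rate(c) for c in simple_cycles(edges)]


def is_delta_synchronizable(edges: List[Edge]) -> bool:
    """Lemma 3.4: every cycle has transmission rate 1.  Checking simple cycles
    suffices, since every cycle decomposes into simple ones."""
    return all(r == 1 for r in cycle_rates(edges))


fig_3_4_b = [("1", "a", "c", "1"), ("1", "b", "", "2"), ("2", "a", "c", "2"),
             ("2", "b", "", "3"), ("3", "a", "c", "3")]
prefix_order = ([("1", x, x, "1") for x in "ab"]
                + [("1", "", y, "2") for y in "ab"]
                + [("2", "", y, "2") for y in "ab"])
```

The $\delta$-synchronous transducer of Figure 3.4.b only has loops of rate 1, whereas the prefix-order transducer also has $\varepsilon|y$ loops of rate $+\infty$, so it is not $\delta$-synchronizable.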
Classes of Transductions

We have shown that left-synchronous transductions extend algebraic properties of recognizable transductions. The following theorem shows that they also extend real-time properties of $\delta$-synchronous transducers.

Theorem 3.18 $\delta$-synchronous transductions are left-synchronous.

Proof: Consider a $\delta$-synchronous transducer $T$ realizing a relation $R$ over alphabets $A$ and $B$, and call $d$ the upper bound on delays between input and output words accepted by $T$. Taking advantage of closure under intersection, one may partition $R$ into relations $R_i$ of constant delay $i$, for all $-d \leq i \leq d$. Let $T_i$ realize relation $R_i$: by construction, $v \in |T_i|(u) \implies |u| = |v| + i$.

Let $\#$ be a new label; if $i$ is non-negative (resp. negative), define $T'_i$ from $T_i$ by substituting its final state by a transducer accepting $(\varepsilon, \#^i)$ (resp. $(\#^{-i}, \varepsilon)$). Each $T'_i$ is length preserving, hence synchronizable. Transducer $T' = T'_{-d} \cup \cdots \cup T'_{d}$ is thus synchronizable, hence left-synchronizable.

Let $P$ realize relation $\{(u, u\#^a) : u \in A^*, a \geq 0\}$ and $Q$ realize relation $\{(v\#^b, v) : v \in B^*, b \geq 0\}$, which are both left-synchronizable. Transducer $Q \circ T' \circ P$ realizes the same transduction as $T$, and it is left-synchronizable from Theorem 3.15.

One may go a bit further and give a generalization of Theorems 3.12 and 3.18, based on Lemmas 3.5 and 3.4:

Theorem 3.19 If the transmission rate of each cycle in a rational transducer is 0, 1 or $+\infty$, and if no cycle whose rate is 1 follows a cycle whose rate is not 1, then the transducer is left-synchronizable.

Proof: Consider a rational transducer $T$ satisfying the above hypotheses, and consider an accepting path $e$ in $T$. The restriction of $T$ to the states and transitions in $e$ yields a transducer $T_e$ such that $|T_e| \subseteq |T|$. Moreover, $T_e$ can be divided into transducers $T_S$ and $T_R$ such that the (unique) final state of the first is the (unique) initial state of the second, and the transmission rate of all cycles is 1 in $T_S$ and either 0 or $+\infty$ in $T_R$. From Lemma 3.5, $T_R$ is recognizable. From Lemma 3.4, $T_S$ is $\delta$-synchronizable, hence left-synchronizable from Theorem 3.18. Eventually, Proposition 3.10 shows that $T_e$ is left-synchronizable. Since the number of "restricted" transducers $T_e$ is finite, closure under union terminates the proof.

The proof is constructive.
As an application of this theorem, one may give a generalization of Proposition 3.10 (i):

Proposition 3.12 If $\tau_1$ is $\delta$-synchronous and $\tau_2$ is left-synchronous, then their concatenation $\tau_1 . \tau_2$ is left-synchronous.

Notice that the left and right synchronizable transducer example in Figure 3.6, which is even recognizable, does not satisfy the conditions of Theorem 3.19, since the transmission rate of some cycles is 2.
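The hypothesis of Theorem 3.19 can be tested in the same brute-force way on small transducers. In the Python sketch below we read "cycle" as any closed path, as the proof of Lemma 3.5 does; since a closed path going around two simple cycles of one strongly-connected component mixes their rates, the sketch also requires each component to be uniform. The edge format, function names and examples are hypothetical, and cycles labeled $\varepsilon|\varepsilon$ are again assumed absent.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple

# Hypothetical edge format: (source, input word, output word, target).
Edge = Tuple[str, str, str, str]
INF = float("inf")


def rate(path: List[Edge]):
    u_len = sum(len(e[1]) for e in path)
    v_len = sum(len(e[2]) for e in path)
    return INF if u_len == 0 else Fraction(v_len, u_len)


def simple_cycles(edges: List[Edge]) -> List[List[Edge]]:
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    found: List[List[Edge]] = []

    def dfs(start: str, q: str, path: List[Edge], seen: Set[str]) -> None:
        for e in out.get(q, []):
            if e[3] == start:
                found.append(path + [e])
            elif e[3] > start and e[3] not in seen:
                dfs(start, e[3], path + [e], seen | {e[3]})

    for s in sorted(out):
        dfs(s, s, [], {s})
    return found


def reachability(edges: List[Edge]) -> Dict[str, Set[str]]:
    """reach[p] = states reachable from p in zero or more steps."""
    succ: Dict[str, Set[str]] = {}
    for p, _, _, q in edges:
        succ.setdefault(p, set()).add(q)
        succ.setdefault(q, set())
    reach = {}
    for p in succ:
        seen, todo = {p}, [p]
        while todo:
            for q in succ[todo.pop()]:
                if q not in seen:
                    seen.add(q)
                    todo.append(q)
        reach[p] = seen
    return reach


def satisfies_theorem_3_19(edges: List[Edge]) -> bool:
    reach = reachability(edges)
    cycles = [(c[0][0], rate(c)) for c in simple_cycles(edges)]
    if any(r not in (0, 1, INF) for _, r in cycles):
        return False
    for p, r in cycles:
        for q, s in cycles:
            same_component = q in reach[p] and p in reach[q]
            if same_component and r != s:
                return False        # a combined cycle would have another rate
            if r != 1 and s == 1 and q in reach[p]:
                return False        # a rate-1 cycle follows a rate-0 or +inf one
    return True


prefix_order = ([("1", x, x, "1") for x in "ab"]
                + [("1", "", y, "2") for y in "ab"]
                + [("2", "", y, "2") for y in "ab"])
rate_two = [("1", "a", "bb", "1")]
mixed_component = [("1", "a", "a", "1"), ("1", "", "b", "1")]
late_synchronous = [("1", "", "b", "1"), ("1", "a", "", "2"), ("2", "a", "a", "2")]
```

The prefix-order transducer satisfies the hypothesis; `rate_two` has a cycle of rate 2, `mixed_component` puts loops of rates 1 and $+\infty$ in one component, and `late_synchronous` has a rate-1 cycle following a rate-$+\infty$ one, so all three are rejected.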
Resynchronization Algorithm

Although left-synchronism is not decidable, one may be interested in a synchronization algorithm that works on a subset of left-synchronizable transducers: the class of transducers satisfying the hypothesis of Theorem 3.19. Extending an implementation by Béal and Carton [BC99a] of the algorithm in [FS93], it is possible to "resynchronize" our larger class along the lines of the proof of Theorem 3.19. This technique will be used extensively in Sections 3.6 and 3.7, to compute (possibly approximative) intersections of rational relations. Presentation of the full algorithm and further investigations about its complexity are left for future work.

Decidability Results

We first present an extension of the minimality concept for finite-state automata to left-synchronous transducers. Let $T = (Q, I, F, E)$ be a transducer over alphabets $A$ and $B$. We define the following predicate, for $q \in Q$ and $(u, v) \in A^* \times B^*$:
$$\mathrm{Accept}(q, u, v) \iff (u, v) \text{ labels an accepting path starting at } q.$$
Nerode's equivalence, noted $\equiv$, is defined by
$$q \equiv q' \iff \text{for all } (u, v) \in A^* \times B^* : \mathrm{Accept}(q, u, v) \iff \mathrm{Accept}(q', u, v).$$
The equivalence class of $q \in Q$ is denoted by $\hat{q}$. Let
$$T/\!\equiv\; = (Q/\!\equiv, I/\!\equiv, F/\!\equiv, \hat{E}),$$
where $\hat{E}$ is naturally defined by
$$(\hat{q}_1, x, y, \hat{q}_2) \in \hat{E} \iff \exists (q'_1, q'_2) \in \hat{q}_1 \times \hat{q}_2 : (q'_1, x, y, q'_2) \in E.$$
Using Nerode's equivalence, we extend the concept of minimal automaton to left-synchronous transducers.

Theorem 3.20 Any left-synchronous transduction is realized by a unique minimal left-synchronous transducer (up to a renaming of states).

Proof: Let $\tau$ be a transduction over alphabets $A$ and $B$, realized by a left-synchronous transducer $T = (Q, I, F, E)$. We suppose without loss of generality that $T$ is trim. By definition of $\equiv$, it is clear that $T/\!\equiv$ realizes $\tau$.

Every transition of $T/\!\equiv$ is labeled on $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$. Consider two states $q, q' \in Q$ such that $q \equiv q'$ and $q$ has an incoming transition labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$), and consider $(u, v) \in A^* \times B^*$ such that $\mathrm{Accept}(q, u, v)$ and $\mathrm{Accept}(q', u, v)$. Any outgoing transition from $q$ must be labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$), hence $v$ (resp. $u$) must be empty. Since this is true for all accepted $(u, v)$, and since $T$ is trim, any outgoing transition from $q'$ must also be labeled on $A \times \{\varepsilon\}$ (resp. $\{\varepsilon\} \times B$); this proves that $T/\!\equiv$ is left-synchronous.

Finally, let $\mathcal{A}$ be the finite-state automaton interpretation of $T$ (see Definition 3.6). It is well known that $\mathcal{A}/\!\equiv$ is the unique minimal automaton realizing the same rational language as $\mathcal{A}$ (up to a renaming of states). Thus, if $T'$ is a realization of $\tau$ with as
many states as $T/\!\equiv$, its finite-state automaton interpretation must be $\mathcal{A}/\!\equiv$ (up to a renaming of states), which is the interpretation of $T/\!\equiv$. This proves the unicity of the minimal left-synchronous transducer.

As a corollary of closure under complementation and intersection, usual questions become decidable for left-synchronous transductions:

Lemma 3.7 Let $R$, $R'$ be left-synchronous relations over alphabets $A$ and $B$. It is decidable whether $R \cap R' = \emptyset$, $R \subseteq R'$, $R = R'$, $R = A^* \times B^*$, and whether $(A^* \times B^*) - R$ is finite.

These properties are essential for formal reasoning about dependence and reaching definition abstractions in the following chapter.

Eventually, we are still working on decidability of recognizable relations among left-synchronous ones. We have strong arguments to expect a positive result, but no proof at the moment.

Further Extensions

We now consider possible extensions of left-synchronizable relations.

Constant Transmission Rates

An elementary variation on synchronous transducers consists in enforcing a single transmission rate in all cycles, which is not necessarily 1: if $k$ and $l$ are positive integers, a $(k,l)$-synchronous relation over $A^* \times B^*$ is realized by a transducer whose transitions are labeled in $A^k \times B^l$. Similarly, one may define $\delta$-$(k,l)$-synchronous and left-$(k,l)$-synchronous transducers.

When noticing that a change of the alphabet converts a $(k,l)$-synchronous transducer into a classical synchronous one, it obviously appears that the same properties are satisfied for any $k$ and $l$, including $k = l = 1$. The only difference is that the transmission rates of cycles are now 0, $+\infty$ and $l/k$. Mixing relations in $(k,l)$-synchronous classes for different $(k,l)$ is not allowed, of course.

However, most rational transductions useful to our framework, including orders, are left-$(1,1)$-synchronous, that is left-synchronous. This strongly reduces the usefulness of general left-$(k,l)$-synchronous transductions.

Deterministic Transducers

Much more interesting is the class of deterministic relations introduced by Pelletier and Sakarovitch in [PS98]:

Definition 3.19 (deterministic transducer and relation) Let $A$ and $B$ be two alphabets. A transducer $T = (A, B, Q, I, F, E)$ is deterministic if the following conditions hold:
(i) there exists a partition of the set of states $Q = Q_A \cup Q_B$ such that the label of an edge departing from a state in $Q_A$ is in $A \times \{\varepsilon\}$ and the label of an edge departing from a state in $Q_B$ is in $\{\varepsilon\} \times B$;
(ii) for every $p \in Q$ and every $(x, y) \in (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$, there exists at most one $q \in Q$ such that $(p, x, y, q)$ is in $E$ (i.e. the finite-state automaton interpretation is deterministic);
(iii) there is a single initial state in $I$.

A deterministic relation is realized by a deterministic transducer.

This class is strictly larger than left-synchronous relations, and keeps most of their good properties: the greatest loss is closure under composition. Moreover, because relation $U^+$ in the proof of Theorem 3.16 is deterministic, it is undecidable whether a deterministic relation is recognizable, left-synchronous, or both left and right synchronous.

But the most important reason for us to use left-synchronous relations instead of deterministic ones is that there is no result such as Theorem 3.19 to find a deterministic realization of a relation, or to help approximate a rational relation by a deterministic one.

3.5 Beyond Rational Relations

For the purpose of our program analysis framework, we sometimes require more expressiveness than rational relations: "finite automata cannot count", and we need counting to handle arrays! We thus present an extension of the algebraic (also known as context-free) property to relations between finitely generated monoids. As one would expect, the class of algebraic relations includes rational relations, and retains several decidable properties. This section ends with a few contributions: Theorems 3.27 and 3.28, and Proposition 3.13.

Algebraic Relations

We define algebraic relations through push-down transducers, defined similarly to push-down automata (see Section 3.2.3):

Definition 3.20 (push-down transducer) Given alphabets $A$ and $B$, a push-down transducer $T = (A, B, \Gamma, \gamma_0, Q, I, F, E)$, a.k.a. algebraic transducer, consists of a stack alphabet $\Gamma$, a non-empty word $\gamma_0$ in $\Gamma^+$ called the initial stack word, a finite set $Q$ of states, a set $I \subseteq Q$ of initial states, a set $F \subseteq Q$ of final states, and a finite set of transitions (a.k.a. edges) $E \subseteq Q \times A^* \times B^* \times \Gamma \times \Gamma^* \times Q$.

Free monoids $A^*$ and $B^*$ are often omitted for convenience, when clear from the context. A transition $(q, x, y, g, \gamma, q') \in E$ is usually written $q \xrightarrow{x|y : g \to \gamma} q'$. The push-down automata and rational transducer vocabularies are inherited.

A configuration of a push-down transducer is a quadruple $(u, v, q, \gamma)$, where $(u, v)$ is the pair of words to be accepted or rejected, $q$ is the current state and $\gamma \in \Gamma^*$ is the word composed of symbols in the stack. The transition between two configurations $c_1 = (u_1, v_1, q_1, \gamma_1)$ and $c_2 = (u_2, v_2, q_2, \gamma_2)$ is denoted by relation $\mapsto$ and defined by $c_1 \mapsto c_2$ iff there exist $(x, y, g, \gamma, \gamma') \in A^* \times B^* \times \Gamma \times \Gamma^* \times \Gamma^*$ such that
$$u_1 = x u_2 \wedge v_1 = y v_2 \wedge \gamma_1 = \gamma' g \wedge \gamma_2 = \gamma' \gamma \wedge (q_1, x, y, g, \gamma, q_2) \in E.$$
Then $\overset{p}{\mapsto}$ with $p \in \mathbb{N}$, $\overset{+}{\mapsto}$ and $\overset{*}{\mapsto}$ are defined as usual.

A push-down transducer $T = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the relation $R$ by final state when $(u, v) \in R$ iff there exist $(q_i, q_f, \gamma) \in I \times F \times \Gamma^*$ such that
$$(u, v, q_i, \gamma_0) \overset{*}{\mapsto} (\varepsilon, \varepsilon, q_f, \gamma).$$
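The relation $\mapsto$ and realization by final state translate directly into a search over configurations. The Python sketch below is a breadth-first exploration written under our own assumptions: transitions are tuples `(q, x, y, g, gamma, q2)` with the top of the stack on the right, as in the definition of $\mapsto$, and `max_stack` is a hypothetical safety bound, since $\varepsilon$-transitions that push symbols could otherwise make the search infinite. The example transducer `ANBN_CN`, also hypothetical, realizes $\{(a^n b^n, c^n) : n \geq 0\}$, a relation whose domain is not a rational language.

```python
from collections import deque
from typing import FrozenSet, List, Tuple

# (q, x, y, g, gamma, q2): read x on the input and y on the output, pop the
# top symbol g and push the word gamma (top of the stack on the right).
Transition = Tuple[str, str, str, str, str, str]


def realizes_by_final_state(u: str, v: str, transitions: List[Transition],
                            gamma0: str, initial: FrozenSet[str],
                            final: FrozenSet[str], max_stack: int = 64) -> bool:
    """Is (u, v, q_i, gamma0) |->* (eps, eps, q_f, gamma) for some q_f in F?
    Positions i and j index the unread suffixes of u and v."""
    start = [(0, 0, q, gamma0) for q in initial]
    seen = set(start)
    todo = deque(start)
    while todo:
        i, j, q, stack = todo.popleft()
        if i == len(u) and j == len(v) and q in final:
            return True
        for (p, x, y, g, gamma, p2) in transitions:
            if p != q or not stack or stack[-1] != g:
                continue
            if not (u.startswith(x, i) and v.startswith(y, j)):
                continue
            nxt = (i + len(x), j + len(y), p2, stack[:-1] + gamma)
            if len(nxt[3]) <= max_stack and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False


# Hypothetical push-down transducer for {(a^n b^n, c^n) : n >= 0}.
ANBN_CN = [
    ("p", "a", "", "Z", "ZX", "p"),   # push one X per input a
    ("p", "a", "", "X", "XX", "p"),
    ("p", "", "", "Z", "Z", "q"),     # switch phase without touching the stack
    ("p", "", "", "X", "X", "q"),
    ("q", "b", "c", "X", "", "q"),    # pop one X per b, writing one c
    ("q", "", "", "Z", "Z", "f"),     # bottom symbol reached: go final
]
```

The last transition is what makes acceptance by final state exact here: without the move to `f` on the bottom symbol `Z`, pairs such as $(aab, c)$ would also be accepted.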
A push-down transducer $T = (\Gamma, \gamma_0, Q, I, F, E)$ is said to realize the relation $R$ by empty stack when $(u, v) \in R$ iff there exist $(q_i, q_f) \in I \times F$ such that
$$(u, v, q_i, \gamma_0) \overset{*}{\mapsto} (\varepsilon, \varepsilon, q_f, \varepsilon).$$
Notice that realization by empty stack implies realization by final state: $q_f$ is still required to be in the set of final states.

Definition 3.21 (algebraic relation) The class of relations realized by final state or by empty stack by push-down transducers is called the class of algebraic relations.

As for rational relations, the following characterization of algebraic relations is fundamental: it allows one to express algebraic relations by means of algebraic languages and monoid morphisms. A proof in a much more general case can be found in [Kar92]. (Berstel uses this theorem as a definition for algebraic relations in [Ber79].)

Theorem 3.21 (Nivat) Let $A$ and $B$ be two alphabets. Then $R$ is an algebraic relation over $A$ and $B$ iff there exist an alphabet $C$, two morphisms $\varphi: C^* \to A^*$ and $\psi: C^* \to B^*$, and an algebraic language $L \subseteq C^*$ such that
$$R = \{(\varphi(h), \psi(h)) : h \in L\}.$$

To generalize Section 3.3.2, algebraic transductions are the functional counterpart of algebraic relations. Nivat's theorem can be formulated as follows for algebraic transductions:

Theorem 3.22 (Nivat) Let $A$ and $B$ be two alphabets. Then $\tau: A^* \to B^*$ is an algebraic transduction iff there exist an alphabet $C$, two morphisms $\varphi: C^* \to A^*$ and $\psi: C^* \to B^*$, and an algebraic language $L \subseteq C^*$ such that
$$\forall w \in A^* : \tau(w) = \psi(\varphi^{-1}(w) \cap L).$$

Let us recall some useful properties of algebraic relations and transductions.

Theorem 3.23 Algebraic relations are closed under union, concatenation, and the star operation. They are also closed under composition with rational transductions (similar to the Elgot and Mezei theorem). The image of a rational language by an algebraic transduction is an algebraic language (thanks to Nivat's theorem).

The image of an algebraic language by an algebraic transduction may not be algebraic, but there are some interesting exceptions:

Theorem 3.24 (Evey) Given a push-down transducer $T$, if $L$ is the algebraic language realized by the input automaton of $T$ (see Definition 3.8), the image $T(L)$ is an algebraic language.

The following definition will be useful in some technical discussions and proofs in the following. It formalizes the fact that a push-down transducer can be interpreted as a push-down automaton on a more complex alphabet. But beware: both interpretations have different properties in general.

Definition 3.22 Let $T$ be a push-down transducer over alphabets $A$ and $B$. The push-down automaton interpretation of $T$ is a push-down automaton $\mathcal{A}$ over the alphabet $(A \times B) \cup (A \times \{\varepsilon\}) \cup (\{\varepsilon\} \times B)$ defined by the same stack alphabet, initial stack word, states, initial states, final states and transitions.
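Nivat's characterization (Theorems 3.21 and 3.22) can be made concrete on a small example. In the Python sketch below, everything is an assumption chosen for illustration: $C = \{x, y\}$, the algebraic language $L = \{x^n y^n : n \geq 0\}$ truncated to $n < 5$ so that it can be enumerated, $\varphi(x) = a$, $\varphi(y) = b$, $\psi(x) = \varepsilon$ and $\psi(y) = c$. The resulting relation is the slice $n < 5$ of $\{(a^n b^n, c^n) : n \geq 0\}$.

```python
from typing import Dict, Set, Tuple


def morphism(images: Dict[str, str]):
    """Extend a map from letters of C to words into a monoid morphism on C*."""
    return lambda h: "".join(images[letter] for letter in h)


# Finite slice (n < 5) of the algebraic language L = {x^n y^n : n >= 0}.
L = {"x" * n + "y" * n for n in range(5)}
phi = morphism({"x": "a", "y": "b"})
psi = morphism({"x": "", "y": "c"})

# Theorem 3.21: R = {(phi(h), psi(h)) : h in L}.
R: Set[Tuple[str, str]] = {(phi(h), psi(h)) for h in L}


def tau(w: str) -> Set[str]:
    """Theorem 3.22: tau(w) = psi(phi^{-1}(w) intersected with L), on the slice."""
    return {psi(h) for h in L if phi(h) == w}
```

Both morphisms simply erase or rename letters; all the "counting" is carried by the algebraic language $L$, which is the point of Nivat's theorem.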
Among the usual decision problems, only the following are available for algebraic relations:

Theorem 3.25 The following problems are decidable for algebraic relations: whether two words are in relation (in linear time), emptiness, finiteness.

Important remarks. In the following, every push-down transducer will implicitly accept words by final state. Recognizable and rational relations were defined for any finitely generated monoids, but algebraic relations are defined for free monoids only.

Algebraic Functions

There are very few results about algebraic transductions that are partial functions. Here is the definition:

Definition 3.23 (algebraic function) Let $A$ and $B$ be two alphabets. An algebraic function $f: A^* \to B^*$ is an algebraic transduction which is a partial function, i.e. such that $\mathrm{card}(f(u)) \leq 1$ for all $u \in A^*$.

However, we are not aware of any decidability result for an algebraic transduction to be a partial function, and we believe that the most likely answer is negative.

Among transducers realizing algebraic functions, we are especially interested in transducers whose output can be "computed online" with their input. As for rational transducers, our interpretation for "online computation" is based on the determinism of the input automaton:

Definition 3.24 (online algebraic transducer) An algebraic transducer is online if it realizes a partial function and if its input automaton is deterministic. An algebraic transduction is online if it is realized by an online algebraic transducer.

Nevertheless, we are not aware of any results for this class of algebraic functions; even decidability of deterministic algebraic languages among algebraic ones is unknown.

One-Counter Relations

An interesting sub-class of algebraic relations is called the class of one-counter relations. It is defined through push-down transducers. A classical definition is the following:

Definition 3.25 A push-down transducer is a one-counter transducer if its stack alphabet contains only one letter. An algebraic relation is a one-counter relation if it is realized by a one-counter transducer (by final state).

As for one-counter languages, we prefer a definition which is more suitable to our practical usage of one-counter relations.
Definition 3.26 (one-counter transducer and relation) A push-down transducer is a one-counter transducer if its stack alphabet contains three letters, Z (for "zero"), I (for "increment") and D (for "decrement"), and if the stack word belongs to the (rational) set $ZI^*+ZD^*$. An algebraic relation is a one-counter relation if it is realized by a one-counter transducer (by final state).
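As an illustration of Definition 3.26, the sketch below decodes a stack word of $ZI^*+ZD^*$ into the integer it stands for ($ZI^n$ for $+n$, $ZD^n$ for $-n$, Z alone for 0) and encodes an integer back. It is a minimal sketch under our own assumptions: stack words are plain character strings written bottom symbol first, and the function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Decodes a stack word of Z I* + Z D* (Definition 3.26) into the counter
   value it represents: Z I^n is +n, Z D^n is -n, Z alone is 0.
   Returns false if the word is not in Z I* + Z D*. */
bool decode_counter(const char *stack, long *value)
{
    size_t len = strlen(stack);
    if (len == 0 || stack[0] != 'Z')
        return false;                   /* bottom symbol Z is mandatory */
    if (len == 1) {
        *value = 0;
        return true;
    }
    char sym = stack[1];
    if (sym != 'I' && sym != 'D')
        return false;
    for (size_t i = 2; i < len; i++)
        if (stack[i] != sym)
            return false;               /* mixing I and D is not allowed */
    long n = (long)(len - 1);
    *value = (sym == 'I') ? n : -n;
    return true;
}

/* Writes the stack word encoding value v into buf (capacity cap bytes,
   including the terminating '\0').  Returns false if buf is too small. */
bool encode_counter(long v, char *buf, size_t cap)
{
    size_t n = (size_t)(v < 0 ? -v : v);
    if (cap < n + 2)
        return false;
    buf[0] = 'Z';
    for (size_t i = 1; i <= n; i++)
        buf[i] = (v < 0) ? 'D' : 'I';
    buf[n + 1] = '\0';
    return true;
}
```

With this encoding, an increment on a word of $ZD^*$ pops a D instead of pushing an I, which is why a single stack suffices for a counter that may become negative.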
It is easy to show that Definition 3.26 describes the same family of relations as the preceding classical definition. We use the same notations as for one-counter languages, see Section 3.2.4. The family of one-counter relations is strictly included in the family of algebraic relations.

Notice that using more than one counter gives the same expressive power as Turing machines, as for multi-counter automata; see the last paragraph of Section 3.2.4 for further discussion of this topic.

Now, why are we interested in such a class of relations? We will see in our program analysis framework that we need to compose rational transductions over non-free monoids. Indeed, the well-known theorem by Elgot and Mezei (Theorem 3.5 in Section 3.3) can be "partly" extended to any finitely generated monoids:

Theorem 3.26 (Elgot and Mezei) If $M_1$ and $M_2$ are finitely generated monoids, A is an alphabet, and $\tau_1: M_1\to A^*$ and $\tau_2: A^*\to M_2$ are rational transductions, then $\tau_2\circ\tau_1: M_1\to M_2$ is a rational transduction.

But this extension is not interesting in our case, since the "middle" monoid in our transduction composition is not free. More precisely, we would like to compute the composition of two rational transductions $\tau_2\circ\tau_1$, when $\tau_1: A^*\to\mathbb{Z}^n$ and $\tau_2:\mathbb{Z}^n\to B^*$, for some alphabets A and B and some positive integer n. Sadly, because of the commutative group nature of $\mathbb{Z}$, the composition of $\tau_2$ and $\tau_1$ is not a rational transduction in general. An intuitive view of this comes from the fact that all "words" on $\mathbb{Z}$ of the form

$$\underbrace{1+1+\cdots+1}_{k}+\underbrace{(-1)+(-1)+\cdots+(-1)}_{k}$$

are equal to 0, but do not build a rational language in $\{1,-1\}^*$ (they build a context-free one). We have proven that such a composition yields an n-counter transduction in general, and the proof gives a constructive way to build a transducer realizing the composition:

Theorem 3.27 Let A and B be two alphabets and let n be a positive integer. If $\tau_1: A^*\to\mathbb{Z}^n$ and $\tau_2:\mathbb{Z}^n\to B^*$ are rational transductions, then $\tau_2\circ\tau_1: A^*\to B^*$ is an n-counter transduction.

Proof: We first suppose that n is equal to 1. Let $T_1=(A^*,\mathbb{Z},Q_1,I_1,F_1,E_1)$ realize $\tau_1$ and $T_2=(\mathbb{Z},B^*,Q_2,I_2,F_2,E_2)$ realize $\tau_2$. We define a one-counter
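transducer $T'_1=(A^*,B^*,0,Q_1,I_1,F_1,E'_1)$ with no output on $B^*$ from $T_1$ (here 0 is the initial counter value, and a transition $(q,x,y,c,\delta,q')$ reads x, writes y, performs the counter check c, $\varepsilon$ meaning no check, and adds δ to the counter): if $(q,u,v,q')\in E_1$ then $(q,u,\varepsilon,\varepsilon,+v,q')\in E'_1$ (no counter check). Similarly, we define a one-counter transducer $T'_2=(A^*,B^*,0,Q_2,I_2,F_2,E'_2)$ with no input from $A^*$ from $T_2$: if $(q,u,v,q')\in E_2$ then $(q,\varepsilon,v,\varepsilon,-u,q')\in E'_2$ (no counter check). Intuitively, the output of $T_1$ and the input of $T_2$ are replaced by counter updates in $T'_1$ and opposite counter updates in $T'_2$.

Then we define a one-counter transducer $T=(A^*,B^*,0,Q_1\cup Q_2\cup\{q_F\},I_1,\{q_F\},E)$ as a kind of concatenation of $T'_1$ and $T'_2$:
- if $e\in E'_1$ then $e\in E$;
- if $e\in E'_2$ then $e\in E$;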
- if $q_1\in F_1$ and $q_2\in I_2$ then $(q_1,\varepsilon,\varepsilon,\varepsilon,\varepsilon,q_2)\in E$ (neither counter check nor counter update);
- if $q_2\in F_2$ then $(q_2,\varepsilon,\varepsilon,{=}0,\varepsilon,q_F)\in E$ (no counter update);
- no other transition is in E.

Intuitively, T accepts pairs of words (u, v) when $(u,\varepsilon)$ would be accepted by $T'_1$, $(\varepsilon,v)$ would be accepted by $T'_2$, and the counter is zero when reaching state $q_F$. Then T is a one-counter transducer and realizes $\tau_2\circ\tau_1$.

Finally, if n is greater than 1, the same construction can be applied to each dimension of $\mathbb{Z}^n$, and the associated counter checks and updates can be combined to build an n-counter transducer realizing $\tau_2\circ\tau_1$.

In practice, we will restrict ourselves to n = 1, applying the conservative approximations described in Section 3.7 either on $\tau_1$ and $\tau_2$ or on the multi-counter composition. Theorem 3.27 will be used in Section 4.3 to prove properties of the dependence analysis.

We now require an additional formalization of the rational transducer "skeleton" of a push-down transducer.

Definition 3.27 (underlying rational transducer) Let $T=(\Gamma,\gamma_0,Q,I,F,E)$ be a push-down transducer. We can build a rational transducer $T'=(Q,I,F,E')$ from T by setting

$$(q,x,y,q')\in E' \iff \exists g\in\Gamma,\ \exists\gamma\in\Gamma^*:\ (q,x,y,g,\gamma,q')\in E.$$

The underlying rational transducer of T is the rational transducer obtained by trimming $T'$ and removing all transitions labeled $\varepsilon|\varepsilon$.

Looking at the proof of Theorem 3.27, there is a very interesting property of the transducer T realizing $\tau_2\circ\tau_1$: the transmission rate of every cycle in T is either 0 or $+\infty$. Thanks to Lemma 3.5 in Section 3.4, we have proven the following result:

Proposition 3.13 Let A and B be two alphabets and let n be a positive integer. Let $\tau_1: A^*\to\mathbb{Z}^n$ and $\tau_2:\mathbb{Z}^n\to B^*$ be rational transductions and let T be an n-counter transducer realizing $\tau_2\circ\tau_1: A^*\to B^*$ (computed from Theorem 3.27). Then the underlying rational transducer of T is recognizable.

Applications of this result include closure under intersection with any rational transduction, thanks to the technique presented in Section …
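The construction in the proof of Theorem 3.27 can be sketched on a small hypothetical instance with n = 1: $T_1$ has a single initial and final state with transitions $a|{+1}$ and $b|{-1}$, and $T_2$ has a single initial and final state with one transition ${+1}|c$, so that $\tau_2\circ\tau_1$ maps u to $c^{|u|_a-|u|_b}$ when this exponent is non-negative. Because both transducers are deterministic here, the one-counter transducer T can be simulated directly; a general implementation would have to explore nondeterministic choices, which this sketch does not attempt.

```c
#include <stdbool.h>

/* Simulates the one-counter transducer T of the proof of Theorem 3.27 on a
   hypothetical instance (n = 1):
     T1 : {a,b}* -> Z, one state, transitions a|+1 and b|-1;
     T2 : Z -> {c}*,  one state, transition +1|c.
   Returns true when the pair (u, v) is accepted, i.e. v = tau2(tau1(u)). */
bool composed_accepts(const char *u, const char *v)
{
    long counter = 0;                 /* initial counter value 0 */

    /* T'1: read u, write nothing, add the output of T1 to the counter. */
    for (; *u != '\0'; u++) {
        if (*u == 'a')
            counter += 1;
        else if (*u == 'b')
            counter -= 1;
        else
            return false;             /* letter outside A */
    }

    /* epsilon|epsilon transition from F1 to I2: no check, no update. */

    /* T'2: read nothing, write v, subtract the input of T2 (+1 per c). */
    for (; *v != '\0'; v++) {
        if (*v != 'c')
            return false;             /* letter outside B */
        counter -= 1;
    }

    /* Transition from F2 to qF guarded by the zero test "=0". */
    return counter == 0;
}
```

For instance, ("aab", "c") is accepted while ("b", "") is rejected: $\tau_1(b)=-1$ has no image under $\tau_2$, and the final zero test catches it.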
Eventually, when studying abstract models for data structures, we have seen that nested trees and arrays are modeled neither by free monoids nor by free commutative monoids. Their general structure is called a free partially commutative monoid, see Section 2.3.3. Let A and B be two alphabets, and let M be such a monoid with binary operation $\cdot$. We still want to compute the composition of rational transductions $\tau_2\circ\tau_1$, when $\tau_1: A^*\to M$ and $\tau_2: M\to B^*$. The following result is an extension of Theorem 3.27, and its proof is still constructive:

Theorem 3.28 Let A and B be two alphabets and let M be a free partially commutative monoid. If $\tau_1: A^*\to M$ and $\tau_2: M\to B^*$ are rational transductions, then $\tau_2\circ\tau_1: A^*\to B^*$ is a multi-counter transduction. The number of counters is equal to the maximum dimension of vectors in M (see Definition 2.6).
Proof: Because the full proof is rather technical while its intuition is very natural, we only sketch the main ideas. Considering two rational transducers $T_1$ and $T_2$ realizing $\tau_1$ and $\tau_2$ respectively, we start by applying the classical composition algorithm for free monoids to build a transducer T realizing $\tau_2\circ\tau_1$. But this time, T will be multi-counter: every counter is initialized to 0, and transitions generated by the classical composition algorithm simply ignore the counters.

Now, every time a transition of $T_1$ writes a vector v (resp. $T_2$ reads a vector v), the "normal execution" of the classical composition algorithm is "suspended": only transitions reading (resp. writing) vectors of the same dimension as v are considered in $T_2$ (resp. $T_1$), and v is added to the counters using the technique of Theorem 3.27. When a letter is read or written during the "suspended mode", each counter is checked for zero before "resuming" the "normal execution" of the classical composition algorithm. The result is a transducer with rational and multi-counter parts, separated by checks for zero.

Theorem 3.28 will also be used in Section …

3.6 More about Intersection

Intersecting relations is a major issue in our analysis and transformation framework. We have seen that this operation preserves neither the rational property nor the algebraic property of a relation; but we have also found sub-classes of relations that are closed under intersection. The purpose of this section is to extend these sub-classes in order to support special cases of intersections.

Intersection with Lexicographic Order

For the purpose of dependence analysis, we have already mentioned the need for intersections with the lexicographic order. Indeed, the class of left-synchronous relations includes the lexicographic order and is closed under intersection. In this section, we restrict ourselves to the case of relations over $A^*\times A^*$ for some alphabet A. We will describe a class larger than synchronous relations over $A^*\times A^*$ which is closed under intersection with the lexicographic order only.⁶

Definition 3.28 (pseudo-left-synchronism) Let A be an alphabet. A rational transducer $T=(A^*,A^*,Q,I,F,E)$ (same alphabet A) is pseudo-left-synchronous if there exists a partition of the set of states $Q=Q_i\cup Q_s\cup Q_t$ satisfying the following conditions:
(i) any transition between states of $Q_i$ is labeled $x|x$ for some x in A;
(ii) any transition between a state of $Q_i$ and a state of $Q_t$ is labeled $x|y$ for some $x\neq y$ in A;
(iii) the restriction of T to states in $Q_i\cup Q_s$ is left-synchronous.

A rational relation or transduction is pseudo-left-synchronous if it is realized by a pseudo-left-synchronous transducer. A rational transducer is pseudo-left-synchronizable if it realizes a pseudo-left-synchronous relation.

⁶ This class is not comparable with the class of deterministic relations proposed in Definition 3.19 of Section 3.4.7.
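Conditions (i) and (ii) of Definition 3.28, and the cut that the proof of Proposition 3.14 below performs on transitions from $Q_i$ to $Q_t$, can be sketched as follows. This is a minimal sketch under assumptions of our own: labels are single letters, the order on A is the order of C characters, the partition is given as an array indexed by state, and condition (iii), left-synchronism of the restriction to $Q_i\cup Q_s$, is not checked. Passing `identity = true` mimics the cut of Proposition 3.15 instead; in both cases the left-synchronous part still has to be intersected separately.

```c
#include <stdbool.h>
#include <stddef.h>

enum part { QI, QS, QT };                  /* Q = Qi u Qs u Qt */

struct trans { int src, dst; char x, y; }; /* transition labeled x|y */

/* Checks conditions (i) and (ii) of Definition 3.28 for the partition
   cls (cls[q] is the part of state q).  Condition (iii) is not checked. */
bool check_conditions(const struct trans *e, size_t n, const enum part *cls)
{
    for (size_t k = 0; k < n; k++) {
        enum part a = cls[e[k].src], b = cls[e[k].dst];
        if (a == QI && b == QI && e[k].x != e[k].y)
            return false;                  /* (i): x|x inside Qi */
        if (((a == QI && b == QT) || (a == QT && b == QI))
            && e[k].x == e[k].y)
            return false;                  /* (ii): x|y with x != y */
    }
    return true;
}

/* Cut on transitions from Qi to Qt: keep x|y when x < y (lexicographic
   order, Proposition 3.14), or drop all of them when identity is true
   (Proposition 3.15).  Other transitions are kept.  Compacts e in place
   and returns the new number of transitions. */
size_t cut_transitions(struct trans *e, size_t n, const enum part *cls,
                       bool identity)
{
    size_t kept = 0;
    for (size_t k = 0; k < n; k++) {
        bool cut = cls[e[k].src] == QI && cls[e[k].dst] == QT;
        if (cut && (identity || !(e[k].x < e[k].y)))
            continue;                      /* removed by the intersection */
        e[kept++] = e[k];
    }
    return kept;
}
```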
An intuitive view of this definition would be that a pseudo-left-synchronous transducer satisfies the left-synchronism property everywhere but after transitions labeled $x|y$ with $x\neq y$. The motivation for such a definition comes from the following result:

Proposition 3.14 The class of pseudo-left-synchronous relations is closed under intersection with the lexicographic order.

Proof: Because the non-left-synchronous part is preceded by transitions labeled $x|y$ with $x\neq y$, which are themselves preceded by transitions labeled $x|x$, intersection with the lexicographic order becomes straightforward on this part: if $x<y$ the transition is kept in the intersection, otherwise it is removed. Intersecting the left-synchronous part is done thanks to Theorem 3.14.

Another intersection result is the following:

Proposition 3.15 Intersecting a pseudo-left-synchronous relation with the identity relation yields a left-synchronous relation.

Proof: Same idea as the preceding proof, but transitions $x|y$ with $x\neq y$ are now removed every time.

Of course, pseudo-left-synchronous relations are closed under union, but not under intersection, complementation and composition.

Eventually, the constructive proof of Theorem 3.19 can be modified to look for pseudo-left-synchronous relations: when a transition labeled $x|y$ is found after a path of transitions labeled $x|x$, the following transitions are left unchanged.

The Case of Algebraic Relations

What about intersection of algebraic relations? The well-known result about closure of algebraic languages under intersection with rational languages has no extension to algebraic relations. Still, it is easy to see that there is a property similar to left-synchronism which brings partial intersection results for algebraic relations.

Proposition 3.16 Let $R_1$ be an algebraic relation realized by a push-down transducer whose underlying rational transducer is left-synchronous, and let $R_2$ be a left-synchronous relation. Then $R_1\cap R_2$ is an algebraic relation, and one may compute a push-down transducer realizing the intersection whose underlying rational transducer is left-synchronous.
Proof: Let $T_1$ be a push-down transducer realizing $R_1$ whose underlying rational transducer $T'_1$ is left-synchronous, and let $T_2$ be a left-synchronous realization of $R_2$. The proof comes from the fact that intersecting $T'_1$ and $T_2$ can be done without "forgetting" the original stack operation associated with each transition of $T_1$. This is due to the cross-product nature of the intersection algorithm for finite-state automata (which also applies to left-synchronous transducers).
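The cross-product argument of this proof can be sketched in C: each transition of the push-down transducer $T_1$ is paired with each transition of $T_2$ carrying the same label, and the product transition simply copies the stack operation (popped letter and pushed word) of its $T_1$ component. The encoding is a hypothetical one of our own: single-letter labels on both tapes, product state $(p,q)$ numbered $p\cdot n_q+q$, and initial and final states as well as trimming left out.

```c
#include <stddef.h>

/* Push-down transducer transition: label x|y, pop letter pop, push word push. */
struct pd_trans { int src, dst; char x, y; char pop; const char *push; };
/* Left-synchronous (finite-state) transducer transition: label x|y. */
struct fs_trans { int src, dst; char x, y; };

/* Builds the transitions of the cross product of t1 (n1 transitions) and
   t2 (n2 transitions, states 0..nq-1).  Product state (p, q) is numbered
   p * nq + q.  Each product transition keeps the stack operation of its t1
   component.  Writes at most cap transitions to out; returns their number. */
size_t product(const struct pd_trans *t1, size_t n1,
               const struct fs_trans *t2, size_t n2,
               int nq, struct pd_trans *out, size_t cap)
{
    size_t m = 0;
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++) {
            if (t1[i].x != t2[j].x || t1[i].y != t2[j].y)
                continue;              /* labels must coincide */
            if (m == cap)
                return m;
            out[m] = t1[i];            /* label and stack operation copied */
            out[m].src = t1[i].src * nq + t2[j].src;
            out[m].dst = t1[i].dst * nq + t2[j].dst;
            m++;
        }
    return m;
}
```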
Of course, the pseudo-left-synchronism property can be used instead of the left-synchronous one, yielding the following result:

Proposition 3.17 Let A be an alphabet and let R be an algebraic relation over $A^*\times A^*$ realized by a push-down transducer whose underlying rational transducer is pseudo-left-synchronous. Then intersecting R with the lexicographic order (resp. the identity relation) yields an algebraic relation, and one may compute a push-down transducer realizing the intersection whose underlying rational transducer is pseudo-left-synchronous (resp. left-synchronous).

3.7 Approximating Relations on Words

This section is a transition between the long study of mathematical tools presented in this chapter and the application of these tools to our analysis and transformation framework.

Remember that we have seen in Section 2.4 that exact results were not required for data-flow information, and that our program transformations were based on conservative approximations of sets and relations. Studying approximations is rather unusual when dealing with words and relations between words, but we will show its practical interest in the next chapters.

Of course, such conservative approximations must be as precise as possible, and exact results should be looked for every time it is possible. Indeed, approximations are needed only when a question or an operation on rational or algebraic relations is not decidable. Our general approximation scheme for rational and algebraic relations is thus to find a conservative approximation in a smaller class which supports the required operation, or for which the required question is decidable.

Approximation of Rational Relations by Recognizable Relations

Sometimes a recognizable approximation of a rational relation may be needed. If R is a rational relation realized by a rational transducer $T=(Q,I,F,E)$, the simplest way to build a recognizable relation K which is larger than R is to define K as the product of the input and output languages of R.

A smarter approximation is to consider each pair $(q_i,q_f)$ of initial and final states in T, and to define $K_{q_i,q_f}$ as the product of the input and output languages of the relation realized by $(Q,\{q_i\},\{q_f\},E)$. Then K is defined as the union of all $K_{q_i,q_f}$ for all $(q_i,q_f)\in I\times F$. This builds a recognizable relation thanks to Mezei's Theorem 3.3.

The next level of precision is achieved by considering each strongly-connected component in T and approximating it with the preceding technique. The resulting relation K is still recognizable, thanks to Mezei's theorem. This technique will be considered in the following when looking for a recognizable approximation of a rational relation.

Approximation of Rational Relations by Left-Synchronous Relations

Because recognizable approximations are not precise enough in general, and because the class of left-synchronous relations retains most interesting properties of recognizable relations, we will rather approximate rational relations by left-synchronous ones.
The key algorithm in this context is based on the constructive proof of Theorem 3.19 presented in Section 3.4.5. In practical cases, it often returns a left-synchronous transducer and no approximation is necessary. When it fails, it means that some strongly-connected component could not be resynchronized. The idea is then to approximate this strongly-connected component by a recognizable relation, and then to restart the resynchronization algorithm.

For better efficiency, all strongly-connected components whose transmission rate is not 0, 1 or $+\infty$ should be approximated this way in a first stage. In the same stage, if a strongly-connected component c whose transmission rate is 1 follows some strongly-connected components $c_1,\ldots,c_n$ whose transmission rates are 0 or $+\infty$, then a recognizable approximation $K_c$ of c should be added to the transducer with the same outgoing transitions as c, and all paths from $c_1,\ldots,c_n$ to c should now lead to $K_c$. Applying such a first stage guarantees that the resynchronization algorithm will return a left-synchronous approximation of R, thanks to Theorem 3.19.

Eventually, when trying to intersect a rational transducer with the lexicographic order, we are looking for a pseudo-left-synchronous approximation. The same technique as before can then be applied, using the extended version of Theorem 3.19 proposed in Section 3.6.

Approximation of Algebraic and Multi-Counter Relations

There are two very different techniques for approximating algebraic relations. The simplest one is used to give conservative answers to a few questions that are undecidable for algebraic transducers but decidable for rational ones. It consists in taking the underlying rational transducer as a conservative approximation. Precision can be slightly improved when the stack size is bounded: the finite number of possible stack words can then be encoded in state names, which may induce a large increase of the number of states. The second technique is used when looking for an intersection with a left-synchronous relation: it consists in approximating the underlying rational transducer with a left-synchronous (or pseudo-left-synchronous) one without modifying the stack operations. In fact, stack operations can be preserved in the resynchronization algorithm (associated with Theorem 3.19), but they are obviously lost when approximating a strongly-connected component with a recognizable relation. Which technique is applied will be stated every time an approximation of an algebraic relation is required.

Eventually, we have seen that composing two rational transductions over $\mathbb{Z}^n$ yields an n-counter transduction by Theorem 3.27. Approximation by a one-counter transduction then consists in saving the value of bounded counters into new state names, then removing all unbounded counters but one. Smart choices of the remaining counter, and attempts to combine two counters into one, have not been studied yet and are left for future work.
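The classification used in the first stage of the left-synchronous approximation can be sketched as follows. We take the transmission rate of a cycle to be the total length of its outputs divided by the total length of its inputs, $+\infty$ when the cycle reads nothing, and we assume that all cycles of a strongly-connected component share one rate; computing the components and the recognizable approximations themselves is outside this sketch, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum rate { RATE_ZERO, RATE_ONE, RATE_INF, RATE_OTHER, RATE_NONE };

/* Transmission rate of a cycle whose k-th transition reads in[k] and
   writes out[k]: total output length over total input length. */
enum rate cycle_rate(const char *const *in, const char *const *out, size_t n)
{
    size_t li = 0, lo = 0;
    for (size_t k = 0; k < n; k++) {
        li += strlen(in[k]);
        lo += strlen(out[k]);
    }
    if (li == 0)
        return lo == 0 ? RATE_NONE : RATE_INF;  /* RATE_NONE: eps|eps cycle */
    if (lo == 0)
        return RATE_ZERO;
    return lo == li ? RATE_ONE : RATE_OTHER;
}

/* First stage: should a component with rate r be replaced by a recognizable
   approximation?  follows_zero_or_inf tells whether it follows components
   of rate 0 or +infinity. */
bool approximate_first(enum rate r, bool follows_zero_or_inf)
{
    if (r == RATE_OTHER)
        return true;
    return r == RATE_ONE && follows_zero_or_inf;
}
```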
Chapter 4

Instancewise Analysis for Recursive Programs

Even though dependence information is at the core of virtually all modern optimizing compilers, recursive programs have not received much attention. When considering instancewise dependence analysis for recursive data structures, fewer than three papers have been published. Even worse is the state of the art in reaching definition analysis: before our recent results for arrays [CC98], no instancewise reaching definition analysis for recursive programs had been proposed.

Considering the program model proposed in Chapter 2, we now focus on dependence and reaching definition analysis at the run-time instance level. The following presentation is built on our previous work on the subject [CCG96, Coh97, Coh99a, Fea98, CC98], but has gone through several major evolutions. It results in a much more general and mathematically sound framework, with algorithms for automating the whole analysis process, but also in a more complex presentation. The primary goal of this work is rather theoretical: we look for the highest possible precision. Beyond this important target, we will show in a later chapter (see Section 5.5) how this precise information can be used to outperform current results in the parallelization of recursive programs, and also to enable new program transformation techniques.

We start our presentation with a few motivating examples, then discuss induction variable and storage mapping function computation in Section 4.2; the general analysis technique is presented in Section 4.3, with questions specific to particular data structures deferred to the following sections. Eventually, Section 4.7 compares our results with static analyses and with recent works on instancewise analysis for loop nests.

4.1 Motivating Examples

Studying three examples, we present an intuitive flavor of the instancewise dependence and reaching definition analyses for recursive control and data structures.

First Example: Procedure Queens

Our first example is still the procedure Queens, presented in Section 2.3. It is reproduced here in Figure 4.1.a, with a partial control tree in Figure 4.1.b.

            int A[n];

    P       void Queens(int n, int k) {
    I         if (k < n) {
    A=A=a       for (int i = 0; i < n; i++) {
    B=B=b         for (int j = 0; j < k; j++)
    r               … = … A[j] …;
    J             if (…) {
    s               A[k] = …;
    Q               Queens(n, k+1);
                  }
                }
              }
            }

            int main() {
    F         Queens(n, 0);
            }

Figure 4.1.a. Procedure Queens
Figure 4.1.b. Compressed control tree
Figure 4.1. Procedure Queens and control tree

Studying accesses to array A, our purpose is to find dependences between run-time instances of program statements. Let us study instance FPIAAaAaAJQPIAABBr of statement r, depicted as a star in Figure 4.1.b. In order to find some dependences, we would like to know which memory location is accessed. Since j is initialized to 0 in statement B and incremented by 1 in statement b, we know that the value of variable j at FPIAAaAaAJQPIAABBr is 0, so FPIAAaAaAJQPIAABBr reads A[0].

We now consider instances of s, depicted as squares: since statement s writes into A[k], we are interested in the value of variable k. It is initialized to 0 in main (by the first call Queens(n, 0)), and incremented at each recursive call to procedure Queens in statement Q. Thus, instances such as FPIAAJs, FPIAAaAJs or FPIAAaAaAJs write into A[0], and are therefore in dependence with FPIAAaAaAJQPIAABBr.

Let us now derive which of these definitions reaches FPIAAaAaAJQPIAABBr. Looking again at Figure 4.1.b, we notice that instance FPIAAaAaAJs, denoted by a black square, is, among the three possible reaching definitions that are shown, the last to execute. And it does execute: since we assume that FPIAAaAaAJQPIAABBr executes, FPIAAaAaAJ (hence FPIAAaAaAJs) has to execute. Therefore, other instances writing into the same array element, such as FPIAAJs and FPIAAaAJs, cannot reach the read instance, since their value is always overwritten by FPIAAaAaAJs.¹ Noticing that no other instance of s could execute after FPIAAaAaAJs, we can ensure that FPIAAaAaAJs is the reaching definition of FPIAAaAaAJQPIAABBr. We will show later how this simple approach to computing reaching definitions can be generalized.

¹ FPIAAaAaAJs is then called an ancestor of FPIAAaAaAJQPIAABBr, to be formally defined later.
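The informal reasoning above can be replayed mechanically on control words written as strings of statement labels. In this minimal sketch, k counts the letters Q, and j counts the letters b after the last letter A, since in these words the inner loop B=B=b is entered right after that letter; the string encoding and the helper names are our own assumptions, not the analysis of Section 4.3.

```c
#include <stddef.h>
#include <string.h>

/* k: 0 after F, +1 at each recursive call Q. */
static int value_k(const char *w)
{
    int k = 0;
    for (; *w; w++)
        if (*w == 'Q')
            k++;
    return k;
}

/* j: 0 at the entry of loop B=B=b, +1 at each iteration b.  The inner loop
   is entered after the last letter A of the outer loop, so j counts the
   letters b that follow the last 'A' of the word. */
static int value_j(const char *w)
{
    const char *c = strrchr(w, 'A');
    int j = 0;
    for (c = c ? c : w; *c; c++)
        if (*c == 'b')
            j++;
    return j;
}

/* Element of array A accessed by an instance of r (reads A[j]) or of s
   (writes A[k]); -1 for instances of other statements. */
int accessed_element(const char *w)
{
    size_t n = strlen(w);
    if (n == 0)
        return -1;
    switch (w[n - 1]) {
    case 'r': return value_j(w);
    case 's': return value_k(w);
    default:  return -1;
    }
}
```

On the words of the example, the read FPIAAaAaAJQPIAABBr and the writes FPIAAJs, FPIAAaAJs and FPIAAaAaAJs all yield element 0, which is the dependence found above.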
Second Example: Procedure BST

Let us now look at procedure BST, as shown in Figure 4.2. This procedure swaps node values to convert a binary tree into a binary search tree (BST). Nodes of the tree structure are referenced by pointers; p->l (resp. p->r) denotes the pointer to the left (resp. right) child of the node pointed to by p; p->value denotes the integer value of the node.

    P       void BST(tree *p) {
    I1        if (p->l != NULL) {
    L           BST(p->l);
    I2          if (p->value < p->l->value) {
    a             t = p->value;
    b             p->value = p->l->value;
    c             p->l->value = t;
                }
              }
    J1        if (p->r != NULL) {
    R           BST(p->r);
    J2          if (p->value > p->r->value) {
    d             t = p->value;
    e             p->value = p->r->value;
    f             p->r->value = t;
                }
              }
            }

            int main() {
    F         if (root != NULL) BST(root);
            }

Figure 4.2. Procedure BST and compressed control automaton

There are few dependences in program BST. If u is an instance of block I2, then there are anti-dependences between the first read access in u and instance ub, between the second read access in u and uc, between the read access in instance ua and instance ub, and between the read access in ub and instance uc. It is the same for an instance v of block J2: there are anti-dependences between the first read access in v and ve, between the second read access in v and vf, between the read access in vd and ve, and between the read access in ve and vf. No other dependences are found. We will show in the following how to compute this result automatically. Eventually, a reaching definition analysis tells that ⊥ is the unique reaching definition of each read access.

Third Example: Function Count

Our last motivating example is function Count, as shown in Figure 4.3. It operates on the inode structure presented in Section 2.3.3. This function computes the size of a file in blocks, by counting terminal inodes.

    P       int Count(inode *p) {
    I         if (p->terminal)
    a           return p->length;
    E         else {
    b           c = 0;
    L=L=l       for (int i = 0; i < p->length; i++)
    c             c += Count(p->n[i]);
    d           return c;
              }
            }

            int main() {
    F         Count(file);
            }

Figure 4.3. Procedure Count and compressed control automaton

Since there is no write access to the inode structure, there are no dependences in the Count program (not considering the other data structures, such as scalar c). However, an interesting result for cache optimization techniques [TD95] would be that each memory location is read only once. We will show that this information can be computed automatically by our analysis techniques.

What Next?

In the rest of this chapter, we formalize the concepts introduced above. In Section 4.2, we compute maps from instance names to data-element names. Then, the dependence and reaching definition relations are computed in Section 4.3.

4.2 Mapping Instances to Memory Locations

In Section 2.4, we defined storage mappings from accesses, i.e. pairs of a run-time instance and a reference in the statement, to memory locations. To abstract the effect of every statement instance, we need to make these functions explicit. This is done through the use of induction variables.

After a few definitions and additional restrictions of the program model, we show that induction variables are described by systems of recurrence equations, we prove a fundamental resolution theorem for such systems, and finally we apply this theorem in an algorithm to compute storage mappings.

To simplify the notations of variables and values, we write "v" for the name of an existing program variable, and $v$ as an abbreviation for "the value of variable v".

4.2.1 Induction Variables

We now extend the classical concept of induction variable, strongly connected with nested loops, to recursive programs. To simplify the exposition, we suppose that every integer or pointer variable that is local to a procedure or global to the program has a unique distinctive name. This allows quick and non-misleading wordings such as "variable i",
and has no effect on the generality of the approach. Compared to classical works on nests of loops [Wol92], we have a rather original definition of induction variables:
- integer arguments of a function that are initialized to a constant or to an integer induction variable plus a constant (e.g. incremented or decremented by a constant), at each procedure call;
- integer loop counters that are incremented (or decremented) by a constant at each loop iteration;
- pointer arguments that are initialized to a constant or to a possibly dereferenced pointer induction variable, at each procedure call;
- pointer loop variables that are dereferenced at each loop iteration.

For example, suppose i, j and k are integer variables, p and q are pointer variables to a list structure with a member next of type list*, and Compute is some procedure with two arguments. In the code of Figure 4.4, reference 2*i+j appears in a non-recursive function call, hence i, j, p and q are considered induction variables. On the contrary, k is not an induction variable because it retains its last value at the entry of the inner loop.

    void Compute(int i, list *p) {
      int j, k;
      list *q;
      for (q = p, k = 0; q != NULL; q = q->next)
        for (j = 0; j < 100; j += 2, k++) {
          printf("%d", 2*i + j);
          // recursive call
          Compute(j + 1, q);
        }
    }

Figure 4.4. First example of induction variables

As a kind of syntactic sugar to increase the versatility of induction variables, some cases of direct assignments to induction variables are allowed, i.e. induction variable updates outside of loop iterations and procedure calls. Regarding initialization and increment/decrement/dereference, the rules are the same as for a procedure call, but there are two additional restrictions. These restrictions are those of the code motion [KRS94, Gup98] and symbolic execution techniques [Muc97] used to move each direct assignment to some loop/procedure block surrounding it. After such a transformation, direct assignments can be interpreted as "executed at the entry of that block", the name of the statement being replaced by the actual name of the block.

Of course, symbolic execution techniques cannot convert all cases of direct assignments into legal induction variable updates, as shown by the following examples. In the program of Figure 4.5.a, i is an induction variable because the while loop can be converted into a for loop on i, but j is not an induction variable since it is not initialized at the entry of the inner for loop. In the other program, in Figure 4.5.b, variable i is not an induction variable because s is guarded by a conditional.
            int i = 0, j = 0, k, A[200];
            while (i < 10) {
              for (k = 0; k < 10; k++) {
    r           A[i] = A[i] + A[j];
                j = j + 2;
              }
              i = i + 1;
            }

Figure 4.5.a. Second example

            int i, A[10, 10];
            for (i = 0, j = 0; i < 10; i++) {
              if (…)
    s           i = i + 2;
    r         A[i, j] = …;
            }

Figure 4.5.b. Third example
Figure 4.5. More examples of induction variables

Additional restrictions to the program model. In comparison with the general program model presented in Section 2.2, our analysis requires a few additional hypotheses:
- every data structure subject to dependence or reaching definition analysis must be declared global (notice that local variables can be made global using explicit memory allocations and stacks);
- every array subscript must be an affine function of integer induction variables (not of any integer variable) and of symbolic constants;
- every tree access must dereference a pointer induction variable (not any pointer variable) or a constant.

4.2.2 Building Recurrence Equations on Induction Variables

Describing conflicts between memory accesses is at the core of dependence analysis. We must be able to associate memory locations with memory references in statement instances (i.e. A[i], *p, etc.) by means of storage mappings. This analysis is done independently on each data structure. For each induction variable, we thus need a function mapping a control word to the associated value of the induction variable. In addition, the next definition introduces a notation for the relation between control words and induction variable values.

Definition 4.1 (value of induction variables) Let σ be a program statement or block, and w be an instance of σ. The value of variable i at instance w is defined as the value of i immediately after executing (resp. entering) instance w of statement (resp. block) σ. This value is denoted by $[\![i]\!](w)$.

We consider pairs of elements in monoids, and to be consistent with the usual notation for rational sets and relations, a pair (x, y) will be denoted by $(x|y)$.

For a program statement σ and an induction variable i, we call $[\![i,\sigma]\!]$ the set of all pairs $(u|i)$ such that $[\![i]\!](u)=i$, for all instances u of σ.

In general, the value of a variable at a given control word depends on the execution. Indeed, an execution trace keeps all the information about variable updates, but not a
control word. However, due to our program model restrictions, induction variables are completely defined by control words:

Lemma 4.1 Let i be an induction variable and u a statement instance. If the value $[\![i]\!](u)$ depends on the effect of an instance v, i.e. the value depends on whether v executes or not, then v is a prefix of u.

Proof: Simply observe that only loop entries, loop iterations and procedure calls may modify an induction variable, and that loop entries are associated with initializations which "kill" the effect of all preceding iterations (associated with non-prefix control words).

For two program executions $e,e'\in E$, the consequence of Lemma 4.1 is that the storage mappings $f_e$ and $f_{e'}$ coincide on $A_e\cap A_{e'}$. This strong property allows us to extend the computation of a storage mapping $f_e$ to the whole set A of possible accesses. With this extension, all storage mappings for different executions of a program coincide. In the following, we will thus consider a storage mapping f independent of the execution.

The following result states that induction variables are described by recurrence equations:

Lemma 4.2 Let $(M_{data},\cdot)$ be the monoid abstraction of the considered data structure. Consider a statement σ and an induction variable i. The effect of statement σ on the value of i is captured by one of the following equations:

$$\text{either}\quad \exists\alpha\in M_{data},\ \exists j\in \mathit{Induc}:\ \forall u\in L_{ctrl}:\ [\![i]\!](u\sigma)=[\![j]\!](u)\cdot\alpha \tag{4.1}$$

$$\text{or}\quad \exists\alpha\in M_{data}:\ \forall u\in L_{ctrl}:\ [\![i]\!](u\sigma)=\alpha \tag{4.2}$$

where $\mathit{Induc}$ is the set of all induction variables in the program, including i.

Proof: Consider an edge σ in the control automaton. Due to our syntactical restrictions, edge σ corresponds to a statement in the program text that can modify i in only two ways:
- either there exist an induction variable j whose value is $j\in M_{data}$ just before executing instance uσ of statement σ, and a constant $\alpha\in M_{data}$ such that the value of i after executing instance uσ is $j\cdot\alpha$: translation from a possibly identical variable;
- or there exists a constant $\alpha\in M_{data}$ such that the value of i after executing instance uσ is α: initialization.
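Because each equation of Lemma 4.2 expresses the value after uσ from values after u, the value of an induction variable along one given control word can be obtained by a left-to-right walk. The sketch below hard-codes, for variable k of procedure Queens, the equations derived later in this section: F sets Arg(Queens,2) to 0, Q sets it to $[\![k]\!](u)+1$, P copies it into k, and every other statement leaves both values unchanged (for Arg(Queens,2), an assumption of this sketch). It only evaluates one word at a time, whereas the analysis solves the whole system symbolically; the encoding and all names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

enum { VAR_K, VAR_ARG2, NVARS };        /* k and Arg(Queens,2) */
enum kind { EQ_CONST, EQ_TRANSLATE };   /* forms (4.2) and (4.1) */

struct equation {
    char stmt;        /* statement label sigma                   */
    int var;          /* variable i defined at u.sigma           */
    enum kind kind;
    int from;         /* variable j read at u (EQ_TRANSLATE only) */
    int alpha;        /* constant alpha of Mdata = (Z, +)        */
};

static const struct equation queens_eqs[] = {
    { 'F', VAR_ARG2, EQ_CONST,     0,        0 }, /* [[Arg]](F)  = 0          */
    { 'Q', VAR_ARG2, EQ_TRANSLATE, VAR_K,    1 }, /* [[Arg]](uQ) = [[k]](u)+1 */
    { 'P', VAR_K,    EQ_TRANSLATE, VAR_ARG2, 0 }, /* [[k]](uP)   = [[Arg]](u) */
};

/* Walks control word w left to right.  Returns false if the value of var
   after w is Undefined, otherwise stores it in *out. */
bool eval_var(const char *w, int var, int *out)
{
    int val[NVARS] = { 0, 0 };
    bool def[NVARS] = { false, false };          /* Undefined before F */
    size_t neqs = sizeof queens_eqs / sizeof queens_eqs[0];

    for (; *w; w++) {
        int nval[NVARS];
        bool ndef[NVARS];
        for (int v = 0; v < NVARS; v++) {        /* default: unchanged */
            nval[v] = val[v];
            ndef[v] = def[v];
        }
        for (size_t e = 0; e < neqs; e++) {
            const struct equation *q = &queens_eqs[e];
            if (q->stmt != *w)
                continue;
            if (q->kind == EQ_CONST) {
                nval[q->var] = q->alpha;
                ndef[q->var] = true;
            } else {                             /* reads values at u */
                nval[q->var] = val[q->from] + q->alpha;
                ndef[q->var] = def[q->from];
            }
        }
        for (int v = 0; v < NVARS; v++) {
            val[v] = nval[v];
            def[v] = ndef[v];
        }
    }
    if (!def[var])
        return false;
    *out = val[var];
    return true;
}
```

Updating from the values at u rather than in place mirrors the shape $[\![i]\!](u\sigma)=[\![j]\!](u)\cdot\alpha$ of equation (4.1).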
Notice that, when accessing arrays, we allow general affine subscripts and not only induction variables. Therefore we also build equations on affine functions a(i, j, …) of the induction variables. For example, if a(i, j, k) = 2*i+j-k, then we have to build equations on $[\![2i+j-k]\!](u)$, knowing that $[\![2i+j-k]\!](u)=2[\![i]\!](u)+[\![j]\!](u)-[\![k]\!](u)$.²

² We indeed have to generate new equations, since computing $[\![2i+j-k]\!](u)$ from $[\![i]\!](u)$, $[\![j]\!](u)$ and $[\![k]\!](u)$ is not possible in general: variables i, j and k may have different scopes.
To build systems of recurrence equations automatically, we need two additional notations:
- Undefined is a polymorphic value for induction variables: $[\![i]\!](w)=\mathrm{Undefined}$ means that variable i has an undefined value at instance w; it may also be the case that i is not visible at instance w;
- Arg(proc, num) stands for the num-th actual argument of procedure proc.

Algorithm Recurrence-Build applies Lemma 4.2 in turn to each statement in the program.

    Recurrence-Build(program)
      program: an intermediate representation of the program
      returns a list of recurrence equations
      sys ← ∅
      for each statement σ in program
        do for each induction variable i in Induc
             do switch
                  case σ = for(i = init; …; …):          // loop entry
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦i⟧(uσ) = init}
                  case σ = for(…; …; i = i + inc):       // loop iteration
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦i⟧(uσ) = ⟦i⟧(u) + inc}
                  case σ = for(…; …; i = i->inc):        // loop iteration
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦i⟧(uσ) = ⟦i⟧(u) · inc}
                  case σ = proc(…, var, …), var being the m-th argument:
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦Arg(proc, m)⟧(uσ) = ⟦var⟧(u)}
                  case σ = proc(…, var + cst, …), m-th argument:
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦Arg(proc, m)⟧(uσ) = ⟦var⟧(u) + cst}
                  case σ = proc(…, var->cst, …), m-th argument:
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦Arg(proc, m)⟧(uσ) = ⟦var⟧(u) · cst}
                  case σ = proc(…, cst, …), m-th argument:
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦Arg(proc, m)⟧(uσ) = cst}
                  case default:
                    sys ← sys ∪ {∀u ∈ Lctrl: ⟦i⟧(uσ) = ⟦i⟧(u)}
      for each procedure p declared proc(type1 arg1, …, typen argn) in program
        do for m ← 1 to n
             do sys ← sys ∪ {∀up ∈ Lctrl: ⟦argm⟧(up) = ⟦Arg(proc, m)⟧(u)}
      return sys

Now, suppose that there exist a statement σ, two induction variables i and j, and a constant $\alpha\in M_{data}$ such that $[\![i]\!](u\sigma)=[\![j]\!](u)\cdot\alpha$ is an equation generated by Lemma 4.2. Transposed to $[\![i,\sigma]\!]$, the set of all pairs $(u|[\![i]\!](u))$ for instances u of σ, it says that

$$(u|j)\in[\![j,\sigma']\!]\implies(u\sigma|j\cdot\alpha)\in[\![i,\sigma]\!]$$

for all statements σ' that may precede σ in a valid control word uσ. Second, suppose that there exist a statement σ, an induction variable i, and a constant $\alpha\in M_{data}$ such that $[\![i]\!](u\sigma)=\alpha$ is an equation generated by Lemma 4.2. Transposed to $[\![i,\sigma]\!]$, it says that

$$(u|i)\in[\![i,\sigma']\!]\implies(u\sigma|\alpha)\in[\![i,\sigma]\!]$$

for all statements σ' that may precede σ in a valid control word uσ. These two observations allow us to build a new system, involving equations on the sets $[\![i,\sigma]\!]$, from the result of Recurrence-Build. The algorithm achieving this is called Recurrence-Rewrite: the two conditionals in Recurrence-Rewrite are associated with $u=\varepsilon$, i.e. with recurrence equations of the form $[\![i]\!](\sigma)=[\![j]\!](\varepsilon)\cdot\alpha$ (where $[\![j]\!](\varepsilon)$ is an undefined value) or $[\![i]\!](\sigma)=\alpha$, and the two loops on σ' consider the predecessors of σ.

    Recurrence-Rewrite(program, system)
      program: an intermediate representation of the program
      system: a system of recurrence equations produced by Recurrence-Build
      returns a rewritten system of recurrence equations
      Lctrl ← language of control words of program
      new ← ∅
      for each equation ∀u ∈ Lctrl: ⟦i⟧(uσ) = ⟦j⟧(u) · α in system
        do if σ ∈ Lctrl
             then new ← new ∪ {(σ|j · α) ∈ ⟦i, σ⟧}
           for each σ' such that (Σctrl* σ'σ ∩ Lctrl) ≠ ∅
             do new ← new ∪ {∀u ∈ Lctrl: (u|j) ∈ ⟦j, σ'⟧ ⟹ (uσ|j · α) ∈ ⟦i, σ⟧}
      for each equation ∀u ∈ Lctrl: ⟦i⟧(uσ) = α in system
        do if σ ∈ Lctrl
             then new ← new ∪ {(σ|α) ∈ ⟦i, σ⟧}
           for each σ' such that (Σctrl* σ'σ ∩ Lctrl) ≠ ∅
             do new ← new ∪ {∀u ∈ Lctrl: (u|i) ∈ ⟦i, σ'⟧ ⟹ (uσ|α) ∈ ⟦i, σ⟧}
      return new

Algorithms Recurrence-Build and Recurrence-Rewrite are now applied to procedure Queens. There are three induction variables, i, j and k; but variable i is not useful for computing storage mapping functions. We get the following equations:

From procedure P: $\forall uP\in L_{ctrl}:\ [\![k]\!](uP)=[\![\mathrm{Arg}(Queens,2)]\!](u)$
From recursive call Q: $\forall uQ\in L_{ctrl}:\ [\![\mathrm{Arg}(Queens,2)]\!](uQ)=[\![k]\!](u)+1$
From main call F: $[\![\mathrm{Arg}(Queens,2)]\!](F)=0$
From entry B of loop B=B=b: $\forall uB\in L_{ctrl}:\ [\![j]\!](uB)=0$
From iteration b of loop B=B=b: $\forall ub\in L_{ctrl}:\ [\![j]\!](ub)=[\![j]\!](u)+1$

All other statements leave induction variables unchanged or undefined:

$$
\begin{aligned}
&[\![j]\!](F)=\mathrm{Undefined}\\
&\forall uP\in L_{ctrl}:\ [\![j]\!](uP)=\mathrm{Undefined}\\
&\forall uI\in L_{ctrl}:\ [\![j]\!](uI)=\mathrm{Undefined}\\
&\forall uA\in L_{ctrl}:\ [\![j]\!](uA)=\mathrm{Undefined}\\
&\forall ua\in L_{ctrl}:\ [\![j]\!](ua)=\mathrm{Undefined}\\
&\forall uQ\in L_{ctrl}:\ [\![j]\!](uQ)=\mathrm{Undefined}\\
&\forall us\in L_{ctrl}:\ [\![j]\!](us)=\mathrm{Undefined}\\
&\forall uJ\in L_{ctrl}:\ [\![j]\!](uJ)=[\![j]\!](u)\\
&\forall uB\in L_{ctrl}:\ [\![j]\!](uB)=[\![j]\!](u)\\
&\forall ur\in L_{ctrl}:\ [\![j]\!](ur)=[\![j]\!](u)
\end{aligned}
$$
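$$\begin{aligned}
&[\![k]\!](F) = \mathrm{Undefined}\\
&\forall uI \in L_{\mathrm{ctrl}} : [\![k]\!](uI) = [\![k]\!](u)\\
&\forall uA \in L_{\mathrm{ctrl}} : [\![k]\!](uA) = [\![k]\!](u)\\
&\forall ua \in L_{\mathrm{ctrl}} : [\![k]\!](ua) = [\![k]\!](u)\\
&\forall uB \in L_{\mathrm{ctrl}} : [\![k]\!](uB) = [\![k]\!](u)\\
&\forall uJ \in L_{\mathrm{ctrl}} : [\![k]\!](uJ) = [\![k]\!](u)\\
&\forall uQ \in L_{\mathrm{ctrl}} : [\![k]\!](uQ) = [\![k]\!](u)\\
&\forall ur \in L_{\mathrm{ctrl}} : [\![k]\!](ur) = [\![k]\!](u)\\
&\forall ub \in L_{\mathrm{ctrl}} : [\![k]\!](ub) = [\![k]\!](u)\\
&\forall us \in L_{\mathrm{ctrl}} : [\![k]\!](us) = [\![k]\!](u)
\end{aligned}$$

Now, recall that $[\![j,\sigma]\!]$ (resp. $[\![k,\sigma]\!]$) is the set of all pairs $(u\sigma \mid j)$ (resp. $(u\sigma \mid k)$) such that $[\![j]\!](u\sigma) = j$ (resp. $[\![k]\!](u\sigma) = k$), for all instances $u\sigma$ of a statement $\sigma$. From the equations above, Recurrence-Rewrite yields:
$$\left\{\begin{aligned}
&(F \mid \mathrm{Undefined}) \in [\![j,F]\!]\\
&\forall uP \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,F]\!] \implies (uP \mid \mathrm{Undefined}) \in [\![j,P]\!]\\
&\forall uP \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,Q]\!] \implies (uP \mid \mathrm{Undefined}) \in [\![j,P]\!]\\
&\forall uI \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,P]\!] \implies (uI \mid \mathrm{Undefined}) \in [\![j,I]\!]\\
&\forall uA \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,I]\!] \implies (uA \mid \mathrm{Undefined}) \in [\![j,A]\!]\\
&\forall u\bar A \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,A]\!] \implies (u\bar A \mid \mathrm{Undefined}) \in [\![j,\bar A]\!]\\
&\forall u\bar A \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,a]\!] \implies (u\bar A \mid \mathrm{Undefined}) \in [\![j,\bar A]\!]\\
&\forall ua \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,\bar A]\!] \implies (ua \mid \mathrm{Undefined}) \in [\![j,a]\!]\\
&\forall uB \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,\bar A]\!] \implies (uB \mid 0) \in [\![j,B]\!]\\
&\forall u\bar B \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,B]\!] \implies (u\bar B \mid j) \in [\![j,\bar B]\!]\\
&\forall u\bar B \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,b]\!] \implies (u\bar B \mid j) \in [\![j,\bar B]\!]\\
&\forall ub \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,\bar B]\!] \implies (ub \mid j+1) \in [\![j,b]\!]\\
&\forall ur \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,\bar B]\!] \implies (ur \mid j) \in [\![j,r]\!]\\
&\forall uJ \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,\bar A]\!] \implies (uJ \mid \mathrm{Undefined}) \in [\![j,J]\!]\\
&\forall uQ \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,J]\!] \implies (uQ \mid \mathrm{Undefined}) \in [\![j,Q]\!]\\
&\forall us \in L_{\mathrm{ctrl}} : (u \mid j) \in [\![j,J]\!] \implies (us \mid \mathrm{Undefined}) \in [\![j,s]\!]
\end{aligned}\right.$$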
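$$\left\{\begin{aligned}
&(F \mid \mathrm{Undefined}) \in [\![k,F]\!]\\
&\forall uP \in L_{\mathrm{ctrl}} : (u \mid x) \in [\![\mathrm{Arg}(\mathrm{Queens},2),F]\!] \implies (uP \mid x) \in [\![k,P]\!]\\
&\forall uP \in L_{\mathrm{ctrl}} : (u \mid x) \in [\![\mathrm{Arg}(\mathrm{Queens},2),Q]\!] \implies (uP \mid x) \in [\![k,P]\!]\\
&\forall uI \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,P]\!] \implies (uI \mid k) \in [\![k,I]\!]\\
&\forall uA \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,I]\!] \implies (uA \mid k) \in [\![k,A]\!]\\
&\forall u\bar A \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,A]\!] \implies (u\bar A \mid k) \in [\![k,\bar A]\!]\\
&\forall u\bar A \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,a]\!] \implies (u\bar A \mid k) \in [\![k,\bar A]\!]\\
&\forall ua \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,\bar A]\!] \implies (ua \mid k) \in [\![k,a]\!]\\
&\forall uB \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,\bar A]\!] \implies (uB \mid k) \in [\![k,B]\!]\\
&\forall u\bar B \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,B]\!] \implies (u\bar B \mid k) \in [\![k,\bar B]\!]\\
&\forall u\bar B \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,b]\!] \implies (u\bar B \mid k) \in [\![k,\bar B]\!]\\
&\forall ub \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,\bar B]\!] \implies (ub \mid k) \in [\![k,b]\!]\\
&\forall ur \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,\bar B]\!] \implies (ur \mid k) \in [\![k,r]\!]\\
&\forall uJ \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,\bar A]\!] \implies (uJ \mid k) \in [\![k,J]\!]\\
&\forall uQ \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,J]\!] \implies (uQ \mid k) \in [\![k,Q]\!]\\
&\forall us \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,J]\!] \implies (us \mid k) \in [\![k,s]\!]\\
&\forall uQ \in L_{\mathrm{ctrl}} : (u \mid k) \in [\![k,J]\!] \implies (uQ \mid k+1) \in [\![\mathrm{Arg}(\mathrm{Queens},2),Q]\!]\\
&(F \mid 0) \in [\![\mathrm{Arg}(\mathrm{Queens},2),F]\!]
\end{aligned}\right.$$

Solving Recurrence Equations on Induction Variables

The following result is at the core of our analysis technique, but it is not limited to this purpose. It will be applied in the next section to the system of equations returned by Recurrence-Rewrite.

Lemma 4.3 Consider two monoids $L$ and $M$ with respective binary operations $\cdot$ and $\star$. Let $R$ be a subset of $L \times M$ defined by a system of equations of the form
$$\begin{aligned}
(E_1)\quad & \forall l \in L,\ m_1 \in M:\ (l \mid m_1) \in R_1 \implies (l\cdot\alpha_1 \mid m_1 \star \beta_1) \in R,\\
(E_2)\quad & \forall l \in L,\ m_2 \in M:\ (l \mid m_2) \in R_2 \implies (l\cdot\alpha_2 \mid \beta_2) \in R,
\end{aligned}$$
where
- $R_1 \subseteq L \times M$ and $R_2 \subseteq L \times M$ are some set variables constrained in the system (possibly equal to $R$);
- $\alpha_1, \alpha_2$ are constants in $L$;
- $\beta_1, \beta_2$ are constants in $M$.

Then $R$ is a rational set.

Proof: Our first task is to convert these expressions on unstructured elements of $L$ and $M$ into expressions in the monoid $L \times M$. Our second task is then to derive set expressions in $L \times M$ of the form set·constant or constant·set (the induced operation is denoted by "·"). Indeed, the right-hand side of $(E_1)$ can be written
$$(l \mid m_1)\cdot(\alpha_1 \mid \beta_1) \in R.$$
Thus, $(E_1)$ gives
$$R_1\cdot(\alpha_1 \mid \beta_1) \subseteq R.$$
The right-hand side of $(E_2)$ can also be written
$$(l \mid \varepsilon)\cdot(\alpha_2 \mid \beta_2) \in R,$$
but $(l \mid \varepsilon)$ is neither a variable nor a constant of $L \times M$.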
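To overcome this difficulty, we call $R_\varepsilon$ the set of all pairs $(l \mid \varepsilon)$ such that $\exists m \in M : (l \mid m) \in R$. It is clear that $R_\varepsilon$ satisfies the same equations as $R$, with all right pair members replaced by $\varepsilon$. Now, $(E_2)$ yields two equations:
$$R_{2,\varepsilon}\cdot(\alpha_2 \mid \varepsilon) \subseteq R_\varepsilon \quad\text{and}\quad R_\varepsilon\cdot(\varepsilon \mid \beta_2) \subseteq R.$$
At last, if the only equations on $R$ are $(E_1)$ and $(E_2)$, we have
$$R = R_1\cdot(\alpha_1 \mid \beta_1) + R_\varepsilon\cdot(\varepsilon \mid \beta_2) \quad\text{and}\quad R_\varepsilon = R_{1,\varepsilon}\cdot(\alpha_1 \mid \varepsilon) + R_{2,\varepsilon}\cdot(\alpha_2 \mid \varepsilon).$$
More generally, applying this process to $R_1$, $R_2$ and to every subset of $L \times M$ described in the system, we get a new system of regular equations defining $R$. It is well known that such equations define a rational subset of $L \times M$. □

Algorithm Recurrence-Solve gives an automatic way to solve systems of equations of the form $(E_1)$ or $(E_2)$. It relies on:
- the classical list operations Insert, Delete and Member (systems are encoded as lists of equations);
- the string operation Concat (equations are encoded as strings).

Recurrence-Solve(system)
  system: a list of recurrence equations of the form (E1) and (E2)
  returns a list of regular expressions
  sets ← ∅
  for each implication "$(l \mid m) \in A \implies (l\cdot\alpha \mid m\star\beta) \in B$" in system
    do Insert(sets, $\{A\cdot(\alpha \mid \beta) \subseteq B\}$)
       Insert(sets, $\{A_\varepsilon\cdot(\alpha \mid \varepsilon) \subseteq B_\varepsilon\}$)
  for each implication "$(l \mid m) \in A \implies (l\cdot\alpha \mid \beta) \in B$" in system
    do Insert(sets, $\{B_\varepsilon\cdot(\varepsilon \mid \beta) \subseteq B\}$)
       Insert(sets, $\{A_\varepsilon\cdot(\alpha \mid \varepsilon) \subseteq B_\varepsilon\}$)
  variables ← ∅
  for each inclusion "$A\cdot(x \mid y) \subseteq B$" in sets
    do if Member(variables, B)
      then equation ← the equation of B in variables
           Delete(variables, B)
           Insert(variables, Concat(equation, "$+\,A\cdot(x \mid y)$"))
      else Insert(variables, "$B = A\cdot(x \mid y)$")
  variables ← Compute-Regular-Expressions(variables)
  return variables

Algorithm Compute-Regular-Expressions solves a system of regular equations between rational sets, then returns a list of regular expressions defining these sets. The system is seen as a regular grammar. Resolution is done through variable substitution when the variable in the left-hand side does not appear in the right-hand side, and through Kleene star insertion when it does. Well-known heuristics are used to reduce the size of the result; see [HU79] for details.

The main result of this section follows: we can solve the recurrence equations of Lemma 4.2 to compute the value of induction variables at control words.

Computing Storage Mappings

Theorem 4.1 The storage mapping $f$ that maps every possible access in $A$ to the memory location it accesses is a rational function from $\mathrm{Ctrl}^*$ to $M_{\mathrm{data}}$.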
Proof: Array subscripts are affine functions of integer induction variables, and tree accesses are given by dereferenced induction pointers. One may therefore generate a system of equations according to Lemma 4.2 (or Recurrence-Build) for any read or write access.

The result is a system of equations on induction variables. Thanks to Recurrence-Rewrite, this system is rewritten in terms of equations on sets of pairs $(u\sigma \mid [\![i]\!](u\sigma))$, where $u\sigma$ is a control word and $i$ is an induction variable; each such pair describes the value of $i$ for an instance of statement $\sigma$. We thus get a new system which inductively describes the subset $[\![i,\sigma]\!]$ of $\mathrm{Ctrl}^* \times M_{\mathrm{data}}$. Because this system satisfies the hypotheses of Lemma 4.3, $[\![i,\sigma]\!]$ is a rational set of $\mathrm{Ctrl}^* \times M_{\mathrm{data}}$.

Now, for a given memory reference in $\sigma$, the pairs $(w \mid f(w))$, where $w$ is an instance of $\sigma$, build a rational set. Hence $f$ is a rational transduction from $\mathrm{Ctrl}^*$ to $M_{\mathrm{data}}$. Because $f$ is also a partial function, it is a rational function from $\mathrm{Ctrl}^*$ to $M_{\mathrm{data}}$. □

The proof is constructive, thanks to Recurrence-Build and Recurrence-Solve. Compute-Storage-Mappings is the algorithm that automatically computes storage mappings for a recursive program satisfying the hypotheses of Section 4.2.1. Its result is a list of rational transducers realizing the rational storage mapping of each memory reference; they are converted by Compute-Rational-Transducer from regular expressions.

Compute-Storage-Mappings(program)
  program: an intermediate representation of the program
  returns a list of rational transducers realizing storage mappings
  system ← Recurrence-Build(program)
  new ← Recurrence-Rewrite(program, system)
  list ← Recurrence-Solve(new)
  newlist ← ∅
  for each regular expression reg in list
    do newlist ← newlist ∪ {Compute-Rational-Transducer(reg)}
  return newlist

Let us now apply Compute-Storage-Mappings to program Queens. Starting from the result of Recurrence-Rewrite, we apply Recurrence-Solve. Just before calling Compute-Regular-Expressions, we get the following system of regular equations:
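$$\left\{\begin{aligned}
[\![j,F]\!] &= (F \mid \mathrm{Undefined})\\
[\![j,P]\!] &= [\![j,F]\!]\,(P \mid \mathrm{Undefined}) + [\![j,Q]\!]\,(P \mid \mathrm{Undefined})\\
[\![j,I]\!] &= [\![j,P]\!]\,(I \mid \mathrm{Undefined})\\
[\![j,A]\!] &= [\![j,I]\!]\,(A \mid \mathrm{Undefined})\\
[\![j,\bar A]\!] &= [\![j,A]\!]\,(\bar A \mid \mathrm{Undefined}) + [\![j,a]\!]\,(\bar A \mid \mathrm{Undefined})\\
[\![j,a]\!] &= [\![j,\bar A]\!]\,(a \mid \mathrm{Undefined})\\
[\![j,B]\!] &= [\![j,B]\!]_\varepsilon\,(\varepsilon \mid 0)\\
[\![j,\bar B]\!] &= [\![j,B]\!]\,(\bar B \mid 0) + [\![j,b]\!]\,(\bar B \mid 0)\\
[\![j,b]\!] &= [\![j,\bar B]\!]\,(b \mid 1)\\
[\![j,r]\!] &= [\![j,\bar B]\!]\,(r \mid 0)\\
[\![j,J]\!] &= [\![j,\bar A]\!]\,(J \mid \mathrm{Undefined})\\
[\![j,Q]\!] &= [\![j,J]\!]\,(Q \mid \mathrm{Undefined})\\
[\![j,s]\!] &= [\![j,J]\!]\,(s \mid \mathrm{Undefined})\\
[\![j,F]\!]_\varepsilon &= (F \mid \varepsilon)\\
[\![j,P]\!]_\varepsilon &= [\![j,F]\!]_\varepsilon\,(P \mid 0) + [\![j,Q]\!]_\varepsilon\,(P \mid 0)\\
[\![j,I]\!]_\varepsilon &= [\![j,P]\!]_\varepsilon\,(I \mid 0)\\
[\![j,A]\!]_\varepsilon &= [\![j,I]\!]_\varepsilon\,(A \mid 0)\\
[\![j,\bar A]\!]_\varepsilon &= [\![j,A]\!]_\varepsilon\,(\bar A \mid 0) + [\![j,a]\!]_\varepsilon\,(\bar A \mid 0)\\
[\![j,a]\!]_\varepsilon &= [\![j,\bar A]\!]_\varepsilon\,(a \mid 0)\\
[\![j,B]\!]_\varepsilon &= [\![j,\bar A]\!]_\varepsilon\,(B \mid 0)\\
[\![j,\bar B]\!]_\varepsilon &= [\![j,B]\!]_\varepsilon\,(\bar B \mid 0) + [\![j,b]\!]_\varepsilon\,(\bar B \mid 0)\\
[\![j,b]\!]_\varepsilon &= [\![j,\bar B]\!]_\varepsilon\,(b \mid 0)\\
[\![j,J]\!]_\varepsilon &= [\![j,\bar A]\!]_\varepsilon\,(J \mid 0)\\
[\![j,Q]\!]_\varepsilon &= [\![j,J]\!]_\varepsilon\,(Q \mid 0)
\end{aligned}\right.$$

$$\left\{\begin{aligned}
[\![k,F]\!] &= (F \mid \mathrm{Undefined})\\
[\![k,P]\!] &= [\![\mathrm{Arg}(\mathrm{Queens},2),F]\!]\,(P \mid 0) + [\![\mathrm{Arg}(\mathrm{Queens},2),Q]\!]\,(P \mid 0)\\
[\![k,I]\!] &= [\![k,P]\!]\,(I \mid 0)\\
[\![k,A]\!] &= [\![k,I]\!]\,(A \mid 0)\\
[\![k,\bar A]\!] &= [\![k,A]\!]\,(\bar A \mid 0) + [\![k,a]\!]\,(\bar A \mid 0)\\
[\![k,a]\!] &= [\![k,\bar A]\!]\,(a \mid 0)\\
[\![k,B]\!] &= [\![k,\bar A]\!]\,(B \mid 0)\\
[\![k,\bar B]\!] &= [\![k,B]\!]\,(\bar B \mid 0) + [\![k,b]\!]\,(\bar B \mid 0)\\
[\![k,b]\!] &= [\![k,\bar B]\!]\,(b \mid 0)\\
[\![k,r]\!] &= [\![k,\bar B]\!]\,(r \mid 0)\\
[\![k,J]\!] &= [\![k,\bar A]\!]\,(J \mid 0)\\
[\![k,Q]\!] &= [\![k,J]\!]\,(Q \mid 0)\\
[\![k,s]\!] &= [\![k,J]\!]\,(s \mid 0)\\
[\![\mathrm{Arg}(\mathrm{Queens},2),F]\!] &= (F \mid 0)\\
[\![\mathrm{Arg}(\mathrm{Queens},2),Q]\!] &= [\![k,J]\!]\,(Q \mid 1)
\end{aligned}\right.$$

These systems, seen as regular grammars, can be solved with Compute-Regular-Expressions, yielding regular expressions. These expressions describe rational functions from $\mathrm{Ctrl}^*$ to $\mathbb{Z}$. We are only interested in $[\![j,r]\!]$ and $[\![k,s]\!]$, the accesses to array A:
$$[\![j,r]\!] = (FPIA\bar A \mid 0)\big((JQPIA\bar A \mid 0) + (a\bar A \mid 0)\big)^*(B\bar B \mid 0)(b\bar B \mid 1)^*(r \mid 0) \tag{4.3}$$
$$[\![k,s]\!] = (FPIA\bar A \mid 0)\big((JQPIA\bar A \mid 1) + (a\bar A \mid 0)\big)^*(Js \mid 0) \tag{4.4}$$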
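The closed forms (4.3) and (4.4) can be cross-checked against the recurrence equations themselves. The idea is to read those equations as a left-to-right interpreter over one control word. The C++ sketch below does this for Queens. It makes the following illustrative assumptions, none of which is part of the analysis:
- labels are encoded as strings, with "Abar" and "Bbar" standing for the loop bodies $\bar A$ and $\bar B$;
- `std::nullopt` plays the role of Undefined;
- the names `evalQueens` and `queensSubscript` are our own.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Values of the Queens induction variables after a control word.
// std::nullopt plays the role of Undefined.
struct QueensVars {
  std::optional<long> j;     // [[j]]
  std::optional<long> k;     // [[k]]
  std::optional<long> arg2;  // [[Arg(Queens, 2)]]
};

// Left-to-right reading of the recurrence equations listed for Queens.
// Labels are strings; "Abar" and "Bbar" are an assumed spelling of the loop bodies.
QueensVars evalQueens(const std::vector<std::string>& word) {
  QueensVars v;
  for (const std::string& s : word) {
    if (s == "F") {          // [[Arg(Queens,2)]](F) = 0, [[j]](F) = Undefined
      v.arg2 = 0;
      v.j.reset();
    } else if (s == "P") {   // [[k]](uP) = [[Arg(Queens,2)]](u)
      v.k = v.arg2;
      v.j.reset();
    } else if (s == "Q") {   // [[Arg(Queens,2)]](uQ) = [[k]](u) + 1
      if (v.k) v.arg2 = *v.k + 1; else v.arg2.reset();
      v.j.reset();
    } else if (s == "B") {   // loop entry: [[j]](uB) = 0
      v.j = 0;
    } else if (s == "b") {   // loop iteration: [[j]](ub) = [[j]](u) + 1
      if (v.j) v.j = *v.j + 1;
    } else if (s == "I" || s == "A" || s == "Abar" || s == "a" || s == "s") {
      v.j.reset();           // j is Undefined at these statements
    }
    // J, Bbar and r leave j and k unchanged.
  }
  return v;
}

// Storage mapping of the two references: r reads A[j], s writes A[k].
std::optional<long> queensSubscript(const std::vector<std::string>& word) {
  if (word.empty()) return std::nullopt;
  const QueensVars v = evalQueens(word);
  if (word.back() == "r") return v.j;
  if (word.back() == "s") return v.k;
  return std::nullopt;
}
```

Two sample words agree with the closed forms:
- for the word $FPIA\bar AB\bar Bb\bar Br$, `queensSubscript` returns 1, as (4.3) predicts;
- for $FPIA\bar AJQPIA\bar Aa\bar AJs$, it returns 1, as (4.4) predicts.

The interpreter only handles one word at a time. Compute-Storage-Mappings instead describes all words at once as a rational set.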
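Eventually, we have found the storage mapping function for every reference to the array:
$$(ur \mid f(ur, \mathtt{A[j]})) = (FPIA\bar A \mid 0)\big((JQPIA\bar A \mid 0) + (a\bar A \mid 0)\big)^*(B\bar B \mid 0)(b\bar B \mid 1)^*(r \mid 0) \tag{4.5}$$
$$(us \mid f(us, \mathtt{A[k]})) = (FPIA\bar A \mid 0)\big((JQPIA\bar A \mid 1) + (a\bar A \mid 0)\big)^*(Js \mid 0) \tag{4.6}$$

Application to Motivating Examples

We have already applied Compute-Storage-Mappings to program Queens, and we now repeat the process for the two other motivating examples.

Procedure BST

Algorithm Compute-Storage-Mappings is now applied to procedure BST in Figure 4.2. The only induction variable is $p$:
- From main call F: $[\![\mathrm{Arg}(\mathrm{BST},1)]\!](F) = \varepsilon$
- From procedure BST: $\forall uP \in L_{\mathrm{ctrl}} : [\![p]\!](uP) = [\![\mathrm{Arg}(\mathrm{BST},1)]\!](u)$
- From first recursive call L: $\forall uL \in L_{\mathrm{ctrl}} : [\![\mathrm{Arg}(\mathrm{BST},1)]\!](uL) = [\![p]\!](u)\cdot l$
- From second recursive call R: $\forall uR \in L_{\mathrm{ctrl}} : [\![\mathrm{Arg}(\mathrm{BST},1)]\!](uR) = [\![p]\!](u)\cdot r$

All other statements leave the induction variable unchanged. Recall that $[\![p,\sigma]\!]$ is the set of all pairs $(u\sigma \mid p)$ such that $[\![p]\!](u\sigma) = p$, for all instances $u\sigma$ of a statement $\sigma$. From the equations above, this set satisfies the following regular equations:
$$\left\{\begin{aligned}
[\![p,P]\!] &= (FP \mid \varepsilon) + [\![p,I_1]\!]\,(LP \mid l) + [\![p,J_1]\!]\,(RP \mid r)\\
[\![p,I_1]\!] &= [\![p,P]\!]\,(I_1 \mid \varepsilon)\\
[\![p,J_1]\!] &= [\![p,P]\!]\,(J_1 \mid \varepsilon)\\
[\![p,I_2]\!] &= [\![p,I_1]\!]\,(I_2 \mid \varepsilon)\\
[\![p,J_2]\!] &= [\![p,J_1]\!]\,(J_2 \mid \varepsilon)\\
[\![p,a]\!] &= [\![p,I_2]\!]\,(a \mid \varepsilon)\\
[\![p,b]\!] &= [\![p,I_2]\!]\,(b \mid \varepsilon)\\
[\![p,c]\!] &= [\![p,I_2]\!]\,(c \mid \varepsilon)\\
[\![p,d]\!] &= [\![p,J_2]\!]\,(d \mid \varepsilon)\\
[\![p,e]\!] &= [\![p,J_2]\!]\,(e \mid \varepsilon)\\
[\![p,f]\!] &= [\![p,J_2]\!]\,(f \mid \varepsilon)
\end{aligned}\right.$$
This system describes rational functions from $\mathrm{Ctrl}^*$ to $\{l, r\}^*$. We are only interested in $[\![p,\sigma]\!]$ for $\sigma \in \{I_2, a, b, c, J_2, d, e, f\}$, the accesses to node values:
$$\forall \sigma \in \{I_2, a, b, c\}:\ [\![p,\sigma]\!] = (FP \mid \varepsilon)\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*(I_1I_2 \mid \varepsilon) \tag{4.7}$$
$$\forall \sigma \in \{J_2, d, e, f\}:\ [\![p,\sigma]\!] = (FP \mid \varepsilon)\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*(J_1J_2 \mid \varepsilon) \tag{4.8}$$
Eventually, we can compute the storage mapping function for every reference to the tree:
$$\forall \sigma \in \{I_2, a, b\}:\ (u\sigma \mid f(u\sigma, \texttt{p->value})) = (FP \mid \varepsilon)\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*(I_1I_2 \mid \varepsilon) \tag{4.9}$$
$$\forall \sigma \in \{I_2, b, c\}:\ (u\sigma \mid f(u\sigma, \texttt{p->l->value})) = (FP \mid \varepsilon)\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*(I_1I_2 \mid l) \tag{4.10}$$
$$\forall \sigma \in \{J_2, d, e\}:\ (u\sigma \mid f(u\sigma, \texttt{p->value})) = (FP \mid \varepsilon)\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*(J_1J_2 \mid \varepsilon) \tag{4.11}$$
$$\forall \sigma \in \{J_2, e, f\}:\ (u\sigma \mid f(u\sigma, \texttt{p->r->value})) = (FP \mid \varepsilon)\big((I_1LP \mid l) + (J_1RP \mid r)\big)^*(J_1J_2 \mid r) \tag{4.12}$$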
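The same left-to-right reading applies to BST. There, the induction pointer lives in the free monoid $\{l, r\}^*$, and each recursive call appends a letter to it. The sketch below computes the node touched by each of the three reference shapes, in the spirit of (4.9) to (4.12). The helper names `bstPointer` and `bstNode` and the string encoding of labels are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Value of the induction pointer p of BST after a control word, written as a
// word over {l, r}; the empty string is epsilon, i.e. the root of the tree.
std::string bstPointer(const std::vector<std::string>& word) {
  std::string arg;  // [[Arg(BST,1)]]: actual argument of the call just taken
  std::string p;    // [[p]]
  for (const std::string& s : word) {
    if (s == "F") arg = "";            // [[Arg(BST,1)]](F) = epsilon
    else if (s == "P") p = arg;        // [[p]](uP) = [[Arg(BST,1)]](u)
    else if (s == "L") arg = p + "l";  // [[Arg(BST,1)]](uL) = [[p]](u).l
    else if (s == "R") arg = p + "r";  // [[Arg(BST,1)]](uR) = [[p]](u).r
    // all other statements leave p unchanged
  }
  return p;
}

// Node accessed by a reference executed at the end of the control word.
std::string bstNode(const std::vector<std::string>& word, const std::string& ref) {
  const std::string p = bstPointer(word);
  if (ref == "p->l->value") return p + "l";
  if (ref == "p->r->value") return p + "r";
  return p;  // "p->value"
}
```

Consider the word $FPI_1LPJ_1RPI_1I_2b$:
- the reference `p->l->value` yields the node `lrl`, which is the output that (4.10) associates with the path $FP\cdot I_1LP\cdot J_1RP\cdot I_1I_2$;
- the reference `p->value` yields `lr`.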
Function Count

Algorithm Compute-Storage-Mappings is now applied to procedure Count in Figure 4.3. Variable $p$ is a tree index and variable $i$ is an integer index. Indeed, the inode structure is neither a tree nor an array: nodes are named in the language $L_{\mathrm{data}} = (\mathbb{Z}n)^*\mathbb{Z}$, with the binary operation $\cdot$ defined in Section 2.3.3. Thus, the effective induction variable should combine both $p$ and $i$ and be interpreted in $L_{\mathrm{data}}$. But no such variable appears in the program. The reason is that the code is written in C, in which the inode structure cannot be referenced through a uniform "cursor" like a tree pointer or an array subscript.

This would become possible in a higher-level language: we have rewritten the program in a C++-like syntax in Figure 4.6. Now, p is a C++ reference and not a pointer, and operation -> has been redefined to emulate array accesses.³

    P       int Count(inode &p) {
    I         if (p->terminal)
    a           return p->length;
    E         else {
    b           c = 0;
    L/L̄/l       for (int i=0, inode &q=p->n; i<p->length; i++, q=q->1)
    c             c += Count(q);
    d           return c;
              }
            }
            main() {
    F         Count(file);
            }

Figure 4.6. Procedure Count and control automaton

References p and q are the two induction variables:
- From main call F: $[\![\mathrm{Arg}(\mathrm{Count},1)]\!](F) = \varepsilon$
- From procedure P: $\forall uP \in L_{\mathrm{ctrl}} : [\![p]\!](uP) = [\![\mathrm{Arg}(\mathrm{Count},1)]\!](u)$
- From recursive call c: $\forall uc \in L_{\mathrm{ctrl}} : [\![\mathrm{Arg}(\mathrm{Count},1)]\!](uc) = [\![q]\!](u)$
- From entry $L$ of loop $L/\bar L/l$: $\forall uL \in L_{\mathrm{ctrl}} : [\![q]\!](uL) = [\![p]\!](u)\cdot n$
- From iteration $l$ of loop $L/\bar L/l$: $\forall ul \in L_{\mathrm{ctrl}} : [\![q]\!](ul) = [\![q]\!](u)\cdot 1$

All other statements leave induction variables unchanged or undefined. Recall that $[\![p,\sigma]\!]$ (resp. $[\![q,\sigma]\!]$) is the set of all pairs $(u\sigma \mid p)$ (resp. $(u\sigma \mid q)$) such that $[\![p]\!](u\sigma) = p$ (resp. $[\![q]\!](u\sigma) = q$), for all instances $u\sigma$ of a statement $\sigma$.

³ Yes, C++ is both high-level and dirty!
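For Count, the induction variables take values in $L_{\mathrm{data}} = (\mathbb{Z}n)^*\mathbb{Z}$. The sketch below replays the five recurrence equations above along one control word. It rests on the following assumptions, all specific to this sketch:
- each element $z_0 n z_1 n \cdots n z_k$ is encoded as the vector of its integer components;
- $\varepsilon$ is encoded as the single component 0;
- the label "Lbar" spells the loop body $\bar L$;
- the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// An element z0 n z1 n ... n zk of L_data = (Zn)*Z, stored as its integer
// components; epsilon is encoded as {0} in this sketch.
using InodeName = std::vector<long>;

InodeName appendN(InodeName x) { x.push_back(0); return x; }  // x . n
InodeName addOne(InodeName x) { x.back() += 1; return x; }    // x . 1

// Prints an element as "z0nz1n...nzk".
std::string show(const InodeName& x) {
  std::string out;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (i > 0) out += "n";
    out += std::to_string(x[i]);
  }
  return out;
}

// [[p]] at the end of a control word of Count, following the recurrence equations.
InodeName countPointer(const std::vector<std::string>& word) {
  InodeName arg{0}, p{0}, q{0};
  for (const std::string& s : word) {
    if (s == "F") arg = InodeName{0};   // [[Arg(Count,1)]](F) = epsilon
    else if (s == "P") p = arg;         // [[p]](uP) = [[Arg(Count,1)]](u)
    else if (s == "L") q = appendN(p);  // loop entry: [[q]](uL) = [[p]](u).n
    else if (s == "l") q = addOne(q);   // loop iteration: [[q]](ul) = [[q]](u).1
    else if (s == "c") arg = q;         // [[Arg(Count,1)]](uc) = [[q]](u)
    // other statements leave p and q unchanged
  }
  return p;
}
```

For the word $FPEL\bar Ll\bar LcPI$ the sketch returns `0n1`. Dropping the leading zero, this is the value $n\,1$ that (4.13) associates with this word.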
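From the equations above, these sets satisfy the following regular equations:
$$\left\{\begin{aligned}
[\![p,P]\!] &= (FP \mid \varepsilon) + [\![q,\bar L]\!]\,(cP \mid \varepsilon)\\
[\![p,E]\!] &= [\![p,P]\!]\,(E \mid \varepsilon)\\
[\![p,I]\!] &= [\![p,P]\!]\,(I \mid \varepsilon)\\
[\![p,a]\!] &= [\![p,I]\!]\,(a \mid \varepsilon)\\
[\![p,b]\!] &= [\![p,E]\!]\,(b \mid \varepsilon)\\
[\![p,L]\!] &= [\![p,E]\!]\,(L \mid \varepsilon)\\
[\![p,\bar L]\!] &= [\![p,L]\!]\,(\bar L \mid \varepsilon) + [\![p,\bar L]\!]\,(l\bar L \mid \varepsilon)\\
[\![p,d]\!] &= [\![p,E]\!]\,(d \mid \varepsilon)
\end{aligned}\right.$$
$$\left\{\begin{aligned}
[\![q,P]\!] &= (FP \mid \mathrm{Undefined}) + [\![q,\bar L]\!]\,(cP \mid \mathrm{Undefined})\\
[\![q,E]\!] &= [\![q,P]\!]\,(E \mid \mathrm{Undefined})\\
[\![q,I]\!] &= [\![q,P]\!]\,(I \mid \mathrm{Undefined})\\
[\![q,a]\!] &= [\![q,I]\!]\,(a \mid \mathrm{Undefined})\\
[\![q,b]\!] &= [\![q,E]\!]\,(b \mid \mathrm{Undefined})\\
[\![q,L]\!] &= [\![p,E]\!]\,(L \mid n)\\
[\![q,\bar L]\!] &= [\![q,L]\!]\,(\bar L \mid 0) + [\![q,\bar L]\!]\,(l\bar L \mid 1)\\
[\![q,d]\!] &= [\![q,E]\!]\,(d \mid \mathrm{Undefined})
\end{aligned}\right.$$
These systems describe rational functions from $\mathrm{Ctrl}^*$ to $(\mathbb{Z}n)^*\mathbb{Z}$. We are only interested in $[\![p,I]\!]$, $[\![p,a]\!]$ and $[\![p,L]\!]$, the accesses to inode values:
$$[\![p,I]\!] = (uI \mid f(uI, \texttt{p->terminal})) = (FP \mid \varepsilon)\big((EL\bar L \mid n)(l\bar L \mid 1)^*(cP \mid \varepsilon)\big)^*(I \mid \varepsilon) \tag{4.13}$$
$$[\![p,a]\!] = (ua \mid f(ua, \texttt{p->length})) = (FP \mid \varepsilon)\big((EL\bar L \mid n)(l\bar L \mid 1)^*(cP \mid \varepsilon)\big)^*(Ia \mid \varepsilon) \tag{4.14}$$
$$[\![p,L]\!] = (uL \mid f(uL, \texttt{p->length})) = (FP \mid \varepsilon)\big((EL\bar L \mid n)(l\bar L \mid 1)^*(cP \mid \varepsilon)\big)^*(EL \mid \varepsilon) \tag{4.15}$$

4.3 Dependence and Reaching Definition Analysis

When all program model restrictions are satisfied, we have shown in the previous section that storage mappings are rational transductions. Based on this result, we will now present a general dependence and reaching definition analysis scheme for recursive programs. Both classical results and recent contributions to formal language theory will be useful; definitions and details can be found in Chapter 3.

This section tackles the general dependence and reaching definition analysis problem in our program model. See Sections 4.4 (trees), 4.5 (arrays) and 4.6 (nested trees and arrays) for technical questions that depend on the data structure.

Building the Conflict Transducer

In Section 2.4.1, we have seen that analysis of conflicting accesses is one of the first problems arising when computing dependence relations. We thus present a general computation scheme for the conflict relation; technical issues and a precise study are left for the next sections.

We consider a program whose set of statement labels is $\mathrm{Ctrl}$. Let $L_{\mathrm{ctrl}} \subseteq \mathrm{Ctrl}^*$ be the rational language of control words. Let $M_{\mathrm{data}}$ be the monoid abstraction for a given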
data structure used in the program, and let $L_{\mathrm{data}} \subseteq M_{\mathrm{data}}$ be the rational language of valid data structure elements.

Now, because $f$ is used instead of $f_e$ (it is independent of the execution), the exact conflict relation $\kappa_e$ is defined by
$$\forall e \in E,\ \forall u, v \in L_{\mathrm{ctrl}}:\ u\,\kappa_e\,v \iff (u, v \in A_e) \wedge f(u) = f(v),$$
which is equivalent to
$$\forall e \in E,\ \forall u, v \in L_{\mathrm{ctrl}}:\ u\,\kappa_e\,v \iff (u, v \in A_e) \wedge v \in f^{-1}(f(u)).$$
We cannot compute an exact relation $\kappa_e$, since $A_e$ depends on the execution $e$. Moreover, guards of conditionals and loop bounds are not taken into account for the moment, and the only approximation of $A_e$ we can use is the full language $A = L_{\mathrm{ctrl}}$ of control words. Eventually, the approximate conflict relation we compute is the following:
$$\forall u, v \in L_{\mathrm{ctrl}}:\ u\,\kappa\,v \overset{\mathrm{def}}{\iff} v \in f^{-1}(f(u)). \tag{4.16}$$
Because $f$ is a rational transduction from $\mathrm{Ctrl}^*$ to $M_{\mathrm{data}}$, $f^{-1}$ is a rational transduction from $M_{\mathrm{data}}$ to $\mathrm{Ctrl}^*$. $M_{\mathrm{data}}$ is either a free monoid, a free commutative monoid or a free partially commutative monoid, so we know from Theorems 3.5, 3.27 and 3.28 that $f^{-1} \circ f$ is either a rational or a multi-counter transduction. The result will thus be exact in almost all cases: only multi-counter transductions must be approximated by one-counter transductions. In all cases, we get a transducer realization (rational or one-counter) of transduction $\kappa$. This realization is often exact on pairs of control words which are effectively executed.

One may immediately notice that testing $\kappa$ for emptiness is equivalent to testing whether two pointers are aliased [Deu94, Ste96], and emptiness is decidable for rational and algebraic transductions (see Chapter 3). This is an important application of our analysis, since $\kappa$ is often exact in practice.

Notice also that this computation of $\kappa$ does not require access functions to be rational functions. If a rational transduction approximation of $f$ was available, one could still compute relation $\kappa$ using the same techniques. However, a general approximation scheme for function $f$ has not been designed, and further study is left for future work.

Building the Dependence Transducer

To build the dependence transducer, we first need to restrict relation $\kappa_e$ to pairs of write accesses or of read and write accesses. We then intersect the result with the lexicographic order $<_{\mathrm{lex}}$:
$$\forall e \in E,\ \forall u, v \in L_{\mathrm{ctrl}}:\ u\,\delta_e\,v \iff u\,\big(\kappa_e \cap ((W \times W) \cup (W \times R) \cup (R \times W)) \cap {<_{\mathrm{lex}}}\big)\,v.$$
Thanks to techniques described in Section 3.6.2, we can always compute a conservative approximation $\delta$ of $\delta_e$. Relation $\delta$ is realized by a rational transducer in the case of trees, and by a one-counter transducer in the case of arrays or of nested trees and arrays. Approximations may either come from the previous approximation of $\kappa_e$ or from the intersection itself. The intersection may indeed be approximate in the case of trees and of nested trees and arrays, because rational relations are not closed under intersection
(see Section 3.3). But thanks to Proposition 3.13, it will always be exact for arrays. More details for each data structure can be found in Sections 4.4, 4.5 and 4.6.

We can now give a general dependence analysis algorithm for our program model. The Dependence-Analysis algorithm is exactly the same for every kind of data structure, but individual steps may be implemented differently.

Dependence-Analysis(program)
  program: an intermediate representation of the program
  returns a dependence relation δ between all accesses
  f ← Compute-Storage-Mappings(program)
  κ ← $f^{-1} \circ f$
  if κ is a multi-counter transduction
    then κ ← one-counter approximation of κ
  if the underlying rational transducer of κ is not left-synchronous
    then κ ← resynchronization of κ, with or without approximation
  δ ← κ ∩ $((W \times W) \cup (W \times R) \cup (R \times W))$
  δ ← δ ∩ $<_{\mathrm{lex}}$
  return δ

The result of Dependence-Analysis is limited to dependences on a specific data structure. To get the full dependence relation of the program, it is necessary to compute the union over all the data structures involved.

From Dependences to Reaching Definitions

Remember the formal definition in Section 2.4.2: the exact reaching definition relation is defined as a lexicographic selection of the last write access in dependence with a given read access, i.e.
$$\forall e \in E,\ \forall u \in R_e:\ \sigma_e(u) = \max_{<_{\mathrm{lex}}}\{v \in W_e : v\,\delta_e\,u\}.$$
Clearly, this maximum is unique for each read access $u$ in the course of execution.

In the case of an exact knowledge of $\delta_e$, and when this relation is left-synchronous, one may easily compute an exact reaching definition relation using lexicographic selection (see Section …). The problem is that $\delta_e$ is not known precisely in general, and the above solution is rarely applicable.

Moreover, the computation scheme above does not take conditionals and loop bounds into account. As a result, many non-existing accesses are considered dependent by relation $\delta$. We should thus look for a conservative approximation $\sigma$ of $\sigma_e$, built on the available approximate dependence relation $\delta$.

Relying on $\delta$ makes the computation of $\sigma$ from (4.17) almost impossible, for two reasons. First, a write $v$ may be in dependence with $u$ without being executed by the program. Second, all writes which are not effectively in conflict with $u$ may be considered as possible dependences.
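On a finite set of executed instances, $\kappa_e$, $\delta_e$ and $\sigma_e$ are exact, and the definitions above reduce to a few lines of code. This is not a transducer construction: it enumerates pairs explicitly, which is precisely what the rational machinery avoids for infinite sets of control words. The sketch assumes the following:
- the storage mapping values are already known (field `location`);
- $<_{\mathrm{lex}}$ is induced by a textual order on labels, including $B <_{\mathrm{txt}} J <_{\mathrm{txt}} a$ and $s <_{\mathrm{txt}} Q$ as stated for Queens;
- the sample instances, the label spelling and all function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Word = std::vector<std::string>;  // a control word, one label per entry

// <lex on control words, induced by a textual order <txt on statement labels.
bool lexLess(const Word& u, const Word& v, const std::map<std::string, int>& txt) {
  return std::lexicographical_compare(
      u.begin(), u.end(), v.begin(), v.end(),
      [&](const std::string& a, const std::string& b) { return txt.at(a) < txt.at(b); });
}

struct Access {
  Word word;      // the instance
  bool isWrite;   // W or R
  long location;  // f(word), assumed already computed
};

// delta on a finite set: conflicting pairs (same location, as in (4.16)) with
// at least one write, ordered by <lex.
std::vector<std::pair<int, int>> dependences(const std::vector<Access>& acc,
                                             const std::map<std::string, int>& txt) {
  std::vector<std::pair<int, int>> deps;
  for (int a = 0; a < static_cast<int>(acc.size()); ++a)
    for (int b = 0; b < static_cast<int>(acc.size()); ++b)
      if (acc[a].location == acc[b].location && (acc[a].isWrite || acc[b].isWrite) &&
          lexLess(acc[a].word, acc[b].word, txt))
        deps.emplace_back(a, b);
  return deps;
}

// Reaching definition of read u: the <lex-maximal write in dependence with u;
// -1 stands for the bottom instance (no write reaches u).
int reachingDefinition(int u, const std::vector<Access>& acc,
                       const std::map<std::string, int>& txt) {
  int best = -1;
  for (const auto& [v, w] : dependences(acc, txt))
    if (w == u && acc[v].isWrite && (best < 0 || lexLess(acc[best].word, acc[v].word, txt)))
      best = v;
  return best;
}

// Illustrative Queens data ("Abar"/"Bbar" spell the loop bodies).
std::map<std::string, int> queensTextualOrder() {
  return {{"F", 0}, {"P", 1}, {"I", 2}, {"A", 3}, {"Abar", 4}, {"B", 5}, {"Bbar", 6},
          {"r", 7}, {"b", 8}, {"J", 9}, {"s", 10}, {"Q", 11}, {"a", 12}};
}

std::vector<Access> queensSample() {
  return {
      {{"F", "P", "I", "A", "Abar", "J", "s"}, true, 0},  // writes A[0]
      {{"F", "P", "I", "A", "Abar", "J", "Q", "P", "I", "A", "Abar", "B", "Bbar", "r"}, false, 0},  // reads A[0]
      {{"F", "P", "I", "A", "Abar", "a", "Abar", "J", "s"}, true, 0},  // writes A[0] later
      {{"F", "P", "I", "A", "Abar", "J", "Q", "P", "I", "A", "Abar", "J", "s"}, true, 1},  // writes A[1]
  };
}
```

On this sample there are three dependences:
- a flow dependence from $FPIA\bar AJs$ to the read $FPIA\bar AJQPIA\bar AB\bar Br$;
- an output dependence between $FPIA\bar AJs$ and the later write $FPIA\bar Aa\bar AJs$ to the same element;
- an anti-dependence from the read to that later write.

The write $FPIA\bar AJs$ is therefore selected as the reaching definition of the read.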
However, we know that we can compute an approximate reaching definition relation from $\delta$ when at least one of the following conditions is satisfied.
- Suppose we can prove that some statement instance does not execute, and that this information can be inserted in the original transduction: then some flow dependences can be removed. The remaining instances are described by predicate $e_{\mathrm{may}}(w)$ (instances that may execute).
- Conversely, suppose we can prove that some instance $w$ does execute, and this information can be inserted in the original transduction. Then writes executing before $w$ are "killed": they cannot reach an instance $u$ such that $w\,\delta\,u$. Instances that are effectively executed are described by predicate $e_{\mathrm{must}}(w)$ (instances that must execute).
- Eventually, one may have some information $e_{\mathrm{conditional}}(v, w)$ about an instance $w$ that does execute whenever another instance $v$ does. This "conditional" information is used the same way as the former predicate $e_{\mathrm{must}}$.

The more precise the predicates $e_{\mathrm{may}}$, $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$, the more precise the reaching definition relation. In some cases, one may even compute an exact reaching definition relation.

Now, remember that all our work since Section 4.2 has completely ignored guards in conditional statements and loop bounds. This information is of course critical when trying to build predicates $e_{\mathrm{may}}$, $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$. It can be retrieved using two sources:
- the results of induction variable analysis (see Section 4.2);
- additional analyses of the value of variables [CH78, Mas93, MP94, TP95]. Such external analyses would, for example, compute loop and recursion invariants.

Another source of information, mostly for predicate $e_{\mathrm{conditional}}$, is provided by a simple structural analysis of the program. It consists in exploiting every piece of information hidden in the program syntax:
- in an if then else construct, either the then or the else branch is executed;
- in a while construct, assuming some instance of a statement does execute, all instances preceding it in the while loop also execute;
- in a sequence of non-guarded statements, the instances of these statements are either all executed or all not executed.

Notice that this kind of structural analysis was already critical for nested loops [BCF97, Bar98, Won95].
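The effect of $e_{\mathrm{may}}$ and $e_{\mathrm{must}}$ on the candidate reaching definitions of one read can be sketched on a finite set. The sketch assumes that every candidate is already known to be in dependence with the read. It also assumes that the predicates are given as explicit flags; in the analysis they would come from structural and external analyses. The names and the tiny textual order are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Word = std::vector<std::string>;

// A candidate reaching definition of some read u, with the information that
// predicates e_may and e_must provide about it.
struct Candidate {
  Word word;
  bool mayExecute;   // e_may(word)
  bool mustExecute;  // e_must(word)
};

// <lex induced by a textual order on labels.
bool lexLess(const Word& u, const Word& v, const std::map<std::string, int>& txt) {
  return std::lexicographical_compare(
      u.begin(), u.end(), v.begin(), v.end(),
      [&](const std::string& a, const std::string& b) { return txt.at(a) < txt.at(b); });
}

// Keeps the candidates that may execute and are not killed by a <lex-later
// candidate that must execute.
std::vector<Word> refine(const std::vector<Candidate>& cands,
                         const std::map<std::string, int>& txt) {
  std::vector<Word> kept;
  for (const Candidate& v : cands) {
    if (!v.mayExecute) continue;  // e_may prunes instances that cannot execute
    const bool killed = std::any_of(cands.begin(), cands.end(), [&](const Candidate& w) {
      return w.mustExecute && lexLess(v.word, w.word, txt);  // e_must kills earlier writes
    });
    if (!killed) kept.push_back(v.word);
  }
  return kept;
}
```

Take candidates x, y and z in textual order, where y must execute:
- x is killed by y;
- y itself survives;
- z survives because it executes after y;
- a fourth candidate that cannot execute is discarded.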
Another very important structural property is described with the following additional definition.

Definition 4.2 (ancestor) Consider an alphabet $\mathrm{Ctrl}$ of statement labels and a language $L_{\mathrm{ctrl}}$ of control words. We define $\mathrm{Unco}$ as the subset of $\mathrm{Ctrl}$ made of all block labels which are not conditionals or loop blocks, and of all (unguarded) procedure call labels, i.e. blocks whose execution is unconditional.
Let $r$ and $s$ be two statements in $\mathrm{Ctrl}$, and let $u$ be a strict prefix of a control word $wr \in L_{\mathrm{ctrl}}$ (an instance of $r$). If $v \in \mathrm{Unco}^*$ (without labels of conditional statements) is such that $uvs \in L_{\mathrm{ctrl}}$, then $uvs$ is called an ancestor of $wr$.
The set of ancestors of an instance $u$ is denoted by $\mathrm{Ancestors}(u)$.

This definition is best understood on a control tree, such as the one in Figure 4.1.b page 124. There, black square $FPIA\bar Aa\bar Aa\bar AJs$ is an ancestor of $FPIA\bar Aa\bar Aa\bar AJQPIA\bar AB\bar Br$, but gray squares $FPIA\bar Aa\bar AJs$ and $FPIA\bar AJs$ are not. Now, observe the formal ancestor definition:
1. execution of $wr$ implies execution of $u$, because $u$ is on the path from the root of the control tree to node $wr$;
2. execution of $u$ implies execution of $uvs$, because $v$ is made of declaration blocks only, without conditional statements.

We thus have the following result.

Proposition 4.1 If an instance $u$ executes, then all ancestors of $u$ also execute. This can be written using predicates $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$:
$$\forall u \in L_{\mathrm{ctrl}}:\ e_{\mathrm{must}}(u) \implies e_{\mathrm{must}}(\mathrm{Ancestors}(u)), \qquad e_{\mathrm{conditional}}(u, \mathrm{Ancestors}(u)).$$

At last, we can define a conservative approximation $\sigma$ of the reaching definition relation, built on $\delta$, $e_{\mathrm{may}}$, $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$:
$$\sigma(u) = \Big\{v \in W : v\,\delta\,u \wedge e_{\mathrm{may}}(v) \wedge \neg\exists w \in W:\ v <_{\mathrm{lex}} w \wedge w\,\delta\,u \wedge \big(e_{\mathrm{must}}(w) \vee e_{\mathrm{conditional}}(u, w) \vee e_{\mathrm{conditional}}(v, w)\big)\Big\}. \tag{4.17}$$

Predicates $e_{\mathrm{may}}$, $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$ should define rational sets, so that the algebraic operations involved in (4.17) can be computed. When, in addition, relation $\delta$ is left-synchronous, closure under union, intersection, complementation and composition allows an exact computation of $\sigma$ with (4.17). However, designing a general computation framework for these predicates is left for future work, and we will only consider a few "rules" useful in our practical examples.

Practical Approximation of Reaching Definitions

Instead of building automata for predicates $e_{\mathrm{may}}$, $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$ and then computing $\sigma$ from (4.17), we present a few rewriting rules that refine the sets of possible reaching definitions. They start from a very conservative approximation of the reaching definition relation: the restriction of dependence relation $\delta$ to flow dependences (i.e. from a write to a read access). This technique is less general than solving (4.17), but it avoids complex (and approximate) algebraic operations.

Applicability of the rewriting rules is governed by the compile-time knowledge extracted by external analyses, such as analysis of conditional expressions, detection of invariants, or structural analysis. In Section 4.5, we will demonstrate practical usage of these rules when applying our reaching definition analysis framework to program Queens.

For the moment, we choose a statement $s$ with a write reference to memory, and try to refine sets of possible reaching definitions among instances of $s$. Refining sets of possible reaching definitions which are instances of several statements will be discussed at the end of this section.

The vpa Property (Values are Produced by Ancestors)

This property comes from the common observation about recursive programs that "values are produced by ancestors". Indeed, many sort, tree, or graph-based algorithms perform in-depth explorations where values are produced by ancestors. This behavior is also strongly enforced by the scope rules of local variables.
$$\mathbf{vpa} \iff \forall e \in E,\ u \in R_e,\ v \in W_e:\ v = \sigma_e(u) \implies v \in \mathrm{Ancestors}(u).$$
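Definition 4.2 and property vpa can both be checked on explicit words. The sketch below first tests whether one word is an ancestor of another, then verifies vpa on a finite sample of (read, reaching definition) pairs. The assumptions, all made for illustration only, are:
- Unco = {F, P, Q} in the usage example (conditionals such as J and loop labels are excluded);
- the string spelling of labels;
- the helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Word = std::vector<std::string>;

// Definition 4.2 on explicit words: y = u v s is an ancestor of x when u is a
// strict prefix of x, every label of v is in Unco, and s is the last label of y.
bool isAncestor(const Word& y, const Word& x, const std::set<std::string>& unco) {
  if (y.empty()) return false;
  std::size_t common = 0;  // longest common prefix of x and y, last label of y excluded
  while (common + 1 < y.size() && common < x.size() && x[common] == y[common]) ++common;
  for (std::size_t len = 0; len <= common && len < x.size(); ++len) {
    bool vInUnco = true;  // v = y[len .. |y|-1)
    for (std::size_t i = len; i + 1 < y.size(); ++i)
      if (unco.count(y[i]) == 0) { vInUnco = false; break; }
    if (vInUnco) return true;
  }
  return false;
}

// vpa on a finite sample: every (read u, reaching definition v) pair must
// satisfy v in Ancestors(u).
bool checkVpa(const std::vector<std::pair<Word, Word>>& readAndDef,
              const std::set<std::string>& unco) {
  for (const auto& [u, v] : readAndDef)
    if (!isAncestor(v, u, unco)) return false;
  return true;
}
```

With this Unco, the sketch reproduces the discussion of Figure 4.1.b:
- $FPIA\bar Aa\bar Aa\bar AJs$ is reported as an ancestor of $FPIA\bar Aa\bar Aa\bar AJQPIA\bar AB\bar Br$;
- $FPIA\bar Aa\bar AJs$ and $FPIA\bar AJs$ are not.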
Since all possible reaching definitions are ancestors of the use, rule vpa consists in removing all transitions producing non-ancestors. Formally, all transitions $\sigma' \mid \sigma$ such that $\sigma' <_{\mathrm{txt}} \sigma$ and $\sigma' \neq s$ are removed.

The oka Property (One Killing Ancestor)

We may define one other interesting property, useful for automatic property checking; its associated rewriting rule is not given. If it can be proven that at least one ancestor $vs$ of a read $u$ is in dependence with $u$, this ancestor kills all previous writes, since it does execute when $u$ does.
$$\mathbf{oka} \iff \forall u \in R:\ \sigma(u) \neq \emptyset \implies \big(\exists v \in \mathrm{Ancestors}(u):\ v \in \sigma(u)\big).$$

Property Checking

Property oka can be discovered using invariant properties on induction variables. Checking for property vpa is difficult, but we may rely on the following result. When property oka holds, checking vpa is equivalent to checking whether an ancestor $vs$ in dependence with $u$ may be followed (according to the lexicographic order) by a non-ancestor instance $w$ in dependence with $u$.

Other properties can be obtained by more involved analyses; the problem is then to find a relevant rewriting rule for each one.

Now, remember that we restricted ourselves to one assignment statement when presenting the rewriting rules. Designing rules which handle the global flow of the program is a bit more difficult. When comparing possible reaching definition instances of two writes $s_1$ and $s_2$, it is not possible in general to decide whether one may "kill" the other without a specific transducer (rational or one-counter, depending on the data structure). The problem is thus to intersect two rational or algebraic relations, which cannot be done without approximations in general; see Sections 3.6 and 3.7. In many cases, however, storage mappings for $s_1$ and $s_2$ are very similar, and exact results can be easily computed.

The Reaching-Definition-Analysis algorithm is a general algorithm for reaching definition analysis inside our program model:
- Algebraic operations on sets and relations in the second loop of the algorithm may yield approximate results; see Sections 3.4, 3.6 and 3.7.
- The intersection with $R \times \{w : e_{\mathrm{may}}(w)\}$ in the third line restricts the domain to read accesses and the image to writes which may execute. It can be computed exactly since $R \times \{w : e_{\mathrm{may}}(w)\}$ is a recognizable relation.

The Reaching-Definition-Analysis algorithm is applied to program Queens in Section 4.5.

Notice that all output and anti-dependences are removed by the algorithm, but some spurious flow dependences may remain when the result is approximate.

Now, there is something missing in this presentation of reaching definition analysis: what about the $\bot$ instance? Suppose predicates $e_{\mathrm{must}}(v)$ and $e_{\mathrm{conditional}}(u, v)$ do not hold for any possible reaching definition $v$ of a read instance $u$. Then an uninitialized value may be read by $u$, hence $\bot$ is a possible reaching definition; and the converse is also true. In terms of our "practical properties", oka can be used to determine whether $\bot$ is a possible reaching definition or not. This gives an automatic way to insert $\bot$ when needed in the result of Reaching-Definition-Analysis.

To conclude this section, we have shown a very clean and powerful framework for instancewise dependence analysis of recursive programs, but we should also recognize the limits of relying on a list of refinement rules to compute an approximate reaching
definition relation from an approximate dependence relation. Now that the feasibility of instancewise reaching definition analysis for recursive programs has been proven, it is time to work on a formal framework to compute predicates $e_{\mathrm{may}}$, $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$, from which we could expect a powerful reaching definition analysis algorithm.

Reaching-Definition-Analysis(program)
  program: an intermediate representation of the program
  returns a reaching definition relation σ between all accesses
   1  compute $e_{\mathrm{may}}$, $e_{\mathrm{must}}$ and $e_{\mathrm{conditional}}$ using structural and external analyses
   2  σ ← Dependence-Analysis(program)
   3  σ ← σ ∩ $(R \times \{w : e_{\mathrm{may}}(w)\})$
   4  for each assignment statement s in program
   5    do check for properties oka, vpa and other properties,
   6       using external static analyses or asking the user
   7       apply refinement rules on σ accordingly
   8  for each pair of assignment statements (s, t) in program
   9    do kill ← $\{(us, w) \in W \times R : (\exists vt \in W : us\,\delta\,w \wedge vt\,\delta\,w \wedge us <_{\mathrm{lex}} vt$
  10             $\wedge\, (e_{\mathrm{must}}(vt) \vee e_{\mathrm{conditional}}(us, vt) \vee e_{\mathrm{conditional}}(w, vt)))\}$
  11       σ ← σ − kill
  12  return σ

4.4 The Case of Trees

We will now detail the dependence and reaching definition analysis in the case of a tree structure. Practical computations will be performed on program BST presented in Figure 4.2.

The first part of the Dependence-Analysis algorithm consists in computing the storage mapping. When the underlying data structure is a tree, its abstraction is the free monoid $M_{\mathrm{data}} = \{l, r\}^*$, and the storage mapping is a rational transduction between free monoids. Computation of function $f$ for program BST has already been done in Section 4.2.5. Figure 4.7 shows a rational transducer realizing rational function $f$. Following the lines of Section 2.3.1 page 68, the alphabet of statement labels has been extended to distinguish between distinct references in $I_2$, $J_2$, $b$ and $e$. This yields the new labels I2p, I2p->l, J2p, J2p->r, bp, bp->l, ep and ep->r, which may only appear as the last letter in a control word.

Computation of $\kappa$ is done thanks to Elgot and Mezei's theorem, and yields a rational transduction. The result for program BST is given by the transducer in Figure 4.8.

When $\kappa$ is realized by a left-synchronous transducer, the last part of the Dependence-Analysis algorithm does not require any approximation. The dependence relation $\delta = \kappa \cap ((W \times W) \cup (W \times R) \cup (R \times W)) \cap {<_{\mathrm{lex}}}$ can then be computed exactly, after removing conflicts between reads in $\kappa$. This is the case for program BST, and the exact dependence analysis result is shown in Figure 4.9. In the general case, a conservative left-synchronous approximation of $\kappa$ must be computed; see Section 3.7.

One may immediately notice that every pair $(u, v)$ accepted by the dependence transducer is of the form $u = wu'$ and $v = wv'$, where $w \in \{F, P, L, R, I_1, J_1\}^*$ and $u'$, $v'$ contain no recursive call, i.e. no $L$ or $R$. This means that all dependences lie between instances of the same block $I_1$ or $J_1$. We will show in Section 5.5 that this result can be used to run the first if block (statement $I_1$) in parallel with the second one (statement $J_1$).

Eventually, it appears that dependence transduction $\delta$ is a rational function, and the
restriction of $\delta$ to pairs $(u, v)$ of a read $u$ and a write $v$ yields the empty relation! Indeed, the only dependences in program BST are anti-dependences.

[Figure 4.7. Rational transducer for storage mapping f of program BST]

[Figure 4.8. Rational transducer for conflict relation κ of program BST]
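The shape noticed above for the BST dependence pairs is easy to test on individual pairs. The following sketch, with the hypothetical name `sameBlockPair`, takes as $w$ the longest common prefix made of letters in $\{F, P, L, R, I_1, J_1\}$. That choice is sufficient, because shortening $w$ only adds letters to $u'$ and $v'$.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Word = std::vector<std::string>;

// Shape of the BST dependence pairs: u = w u', v = w v' with w over
// {F, P, L, R, I1, J1} and no recursive call (L or R) in u' or v'.
bool sameBlockPair(const Word& u, const Word& v) {
  static const std::set<std::string> prefixLabels{"F", "P", "L", "R", "I1", "J1"};
  std::size_t n = 0;  // length of the longest admissible common prefix w
  while (n < u.size() && n < v.size() && u[n] == v[n] && prefixLabels.count(u[n]) > 0) ++n;
  auto callFree = [n](const Word& x) {
    for (std::size_t i = n; i < x.size(); ++i)
      if (x[i] == "L" || x[i] == "R") return false;
    return true;
  };
  return callFree(u) && callFree(v);
}
```

Two cases show the difference:
- a pair whose two instances lie in the same activation of $I_1$ passes the check;
- a pair where one side goes through an extra recursive call $L$ fails it.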
[Figure 4.9. Rational transducer for dependence relation δ of program BST]

4.5 The Case of Arrays

We will now detail the dependence and reaching definition analysis in the case of an array structure. Practical computations will be performed on program Queens presented in Figure 4.1.

[Figure 4.10. Rational transducer for storage mapping f of program Queens]

The first part of the Dependence-Analysis algorithm consists in computing the storage mapping. When the underlying data structure is an array, its abstraction is the free commutative monoid $M_{\mathrm{data}} = \mathbb{Z}$. Computation of function $f$ for program Queens has already been done in Section 4.2.5. Figure 4.10 shows a rational transducer realizing
rational function $f : \mathrm{Ctrl}^* \to \mathbb{Z}$. It reflects the combination of regular expressions (4.5) and (4.6).

Computation of $\kappa$ is done thanks to Theorem 3.27, and yields a one-counter transduction. The result for program Queens is given by the transducer in Figure 4.11, which has four initial states.

To compute a dependence relation, one first restricts $\kappa$ to pairs of accesses with at least one write, then intersects the result with the lexicographic order. From Proposition 3.13, the underlying rational transducer of $\kappa$ is recognizable, hence left-synchronous (from Theorem 3.12). It can thus be resynchronized with the constructive proof of Theorem 3.19, yielding a one-counter transducer whose underlying rational transducer is left-synchronous.

Resynchronization of $\kappa$ has been applied to program Queens in Figure 4.12. It is limited to conflicts of the form $(us, vr)$, with $us, vr \in L_{\mathrm{ctrl}}$. The remaining three fourths of the transducer are not represented, because they are very similar to the first fourth and not used for reaching definition analysis. The underlying rational transducer is only pseudo-left-synchronous, because resynchronization has not been applied completely; see Section 3.6 and Definition 3.28.

Intersection with $<_{\mathrm{lex}}$ is done with Theorem 3.14. As a result, the dependence relation can be computed exactly and is realized by a one-counter transducer whose underlying rational transducer is left-synchronous. This is applied to program Queens in Figure 4.13, starting from the pseudo-left-synchronous transducer in Figure 4.12. We know that $B <_{\mathrm{txt}} J <_{\mathrm{txt}} a$ and $s <_{\mathrm{txt}} Q$. Therefore transitions $J \mid a$ and $s \mid Q$ are kept, but transitions $a \mid J$, $a \mid B$ and $J \mid B$ are removed (and the transducer is trimmed). This time, only one third of the actual transducer is shown: the transducer realizing flow dependences. Anti and output dependences are realized by very similar transducers, and are not used for reaching definition analysis.

We now demonstrate the Reaching-Definition-Analysis algorithm on program Queens. A simple analysis of the inner loop shows that $j$ is always less than $k$. This proves that for any instance $w$ of $r$, there exist $u, v \in \mathrm{Ctrl}^*$ such that $w = uQvr$ and $us\,\delta\,uQvr$. Because $us$ is an ancestor of $uQvr$, property oka is satisfied.

The dependence transducer in Figure 4.13 shows two things:
- all instances of $s$ executing after $us$ are of the form $uQv's$;
- reading $Q$ increases the counter.

As a result, no instance executing after $us$ may be in dependence with $w$. In combination with oka, property vpa thus holds. Applying rule vpa, we can remove transition $J \mid a\bar A$, which does not yield ancestors. We get the one-counter transducer in Figure 4.14. Notice that the $\bot$ instance (associated with uninitialized values) is not accepted as a possible reaching definition, because property oka ensures that at least one ancestor of every read instance defined a value.

The transducer is "compressed" in Figure 4.15 to increase readability. It is easy to prove that this result is exact: a unique reaching definition is computed for every read instance. However, the general problem of the functionality of an algebraic transduction is "probably" undecidable. As a result, we achieved, in a semi-automated way, the best precision possible. This precise result will be used in Section 5.5 to parallelize program Queens.

4.6 The Case of Composite Data Structures

We will now detail the dependence and reaching definition analysis in the case of a nested list and array structure. Practical computations will be performed on program Count presented in Figure 4.3.
Figure 4.11. One-counter transducer for the conflict relation of program Queens

The first part of the Dependence-Analysis algorithm consists of computing the storage mapping. When the underlying data structure is built of nested trees and arrays, its abstraction is a free partially commutative monoid Mdata. Computation of function f for program Count has already been done in a previous section.
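Once a storage mapping is available, two accesses conflict when it maps them to the same memory location, so the conflict relation is f⁻¹ ∘ f. The C sketch below computes this relation by brute force on a finite, explicitly listed set of accesses, to make the notion concrete. The encoding is a hypothetical assumption: each access is numbered, and its memory location is a plain integer. The analysis in the text works symbolically on the infinite set of control words, which a finite check like this cannot do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical finite view: access k (k = 0..n-1) touches location loc[k]. */

/* Number of pairs a < b that conflict, i.e. distinct pairs in f^{-1} o f. */
static size_t count_conflicts(const long *loc, size_t n) {
    size_t pairs = 0;
    for (size_t a = 0; a < n; a++)
        for (size_t b = a + 1; b < n; b++)
            if (loc[a] == loc[b])
                pairs++;
    return pairs;
}

/* f^{-1} o f is the identity exactly when no two distinct accesses
   share a memory location, i.e. when the storage mapping is injective. */
static bool conflict_is_identity(const long *loc, size_t n) {
    return count_conflicts(loc, n) == 0;
}
```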
Figure 4.12. Pseudo-left-synchronous transducer for the restriction of κ to W × R

Computation of κ is done thanks to Theorem 3.28, and yields a one-counter transduction. On program Count, there are no write accesses to the inode structure. We could, however, be interested in an analysis of conflict misses for cache optimization [TD95]. The result f⁻¹ ∘ f for program Count is thus interesting, and it is the identity relation! This proves that the same memory location is never accessed twice during program execution.

When computing a dependence relation in general, Proposition 3.13 does not apply: it is generally necessary to approximate the underlying rational transducer by a left-synchronous one. Finally, the Reaching-Definition-Analysis algorithm raises no technical issues specific to nested trees and arrays.

4.7 Comparison with Other Analyses

Before evaluating our analysis for recursive programs, we summarize the restrictions of its program model. First of all, some restrictions are required to simplify the algorithms, and should be considered harmless thanks to previous code transformations (see Sections 2.2 and 4.2 for details):
- no function pointers (i.e. higher-order control structures) and no gotos are allowed;
- expressions in right-hand sides may hold conditionals, but no function calls and no loops;
- a loop variable is initialized at the loop entry and used only inside this loop;
- every data structure subject to dependence or reaching definition analysis must be declared global.

Now, some restrictions on the program model cannot be avoided with preliminary program transformations, but should be removed in further versions of the analysis, thanks to appropriate approximation techniques (induction variables are defined in Section 4.2):
- induction variables must follow very strong rules regarding initialization and update;
- only scalars, arrays, trees, and nested trees and arrays are allowed as data structures;
- every array subscript must be an affine function of integer induction variables and symbolic constants;
- every tree access must dereference a pointer induction variable or a constant.

Figure 4.13. One-counter transducer for the restriction of the dependence relation to flow dependences
Figure 4.14. One-counter transducer for the reaching definition relation of program Queens

Figure 4.15. Simplified one-counter transducer for …

Finally, one restriction is very deeply rooted in the monoid abstraction for tree structures, and we expect no general way to avoid it:
- random insertions and deletions in trees are forbidden (they are allowed only at the leaves of trees).

We are now able to compare the results of our analysis technique with those of classical static analyses (some of which also handle our full program model) and with those of the existing instancewise analyses for loop nests.

Static dependence and reaching definition analyses generally compute the same kind of results, whether they are based on abstract interpretation [Cou81, JM82, Har89, Deu94] or on other data-flow analysis techniques [LRZ93, BE95, HHN94, KSV96]. A comprehensive study of static analyses useful to the parallelization of recursive programs can be found in [RR99]. Comparing the results is rather easy: none of these static analyses is instancewise.⁴ None of them is able to tell which instance of which statement is in conflict, in dependence, or a possible reaching definition. However, these analyses are very useful to remove a few restrictions in our program model, and they also compute properties useful to instancewise reaching definition analysis. Remember that our own instancewise reaching definition analysis technique makes heavy use of so-called "external" analyses, which are precisely classical static analyses. A short comparison between parallelization from the results of our analysis and parallelization from static analyses will be proposed in Section 5.5, along with some practical examples.

⁴ We think that building an instancewise analysis of practical interest in the data-flow or abstract interpretation framework is indeed possible, but very little work has been done in this direction, see [DGS93, Tzo97, CK98].

Comparison with instancewise analyses for loop nests is more topical, since our technique was clearly intended to extend such analyses to recursive programs. A simple method to get a fair evaluation consists in running both analyses on their common program model subset. The general result is not surprising: today's most powerful reaching definition analyses for loop nests, such as fuzzy array dataflow analysis (FADA) [BCF97, Bar98] and constraint-based array dependence analysis [WP95, Won95], are far more precise than our analysis for recursive programs. There are many reasons for that:
- multi-dimensional arrays are roughly approximated by one-dimensional ones;
- we do not use conditionals and loop bounds to establish our results, or, when we do, it is through "external" static analyses;
- rational and algebraic transducers have a limited expressive power when dealing with integer parameters (only one counter can be described);
- some critical algebraic operations, such as intersection and complementation, are not decidable and thus require further approximations.

A major difference between FADA and our analysis for recursive programs is deeply rooted in the philosophy of each technique:
- FADA is a fully exact process with symbolic computations and "dummy" parameters associated with unpredictable constraints, and only one approximation is performed at the end; this ensures that no precious data-flow information is lost during the computation process (see Section 2.4.3).
- Our technique is not as clever, since many approximation stages can be involved. It is more similar to iterative methods in that sense, and hence it is far from optimal: some approximations are made even when the mathematical abstraction would have enough expressive power to avoid them.

But the comparison also reveals very positive aspects, in terms of all the information available in the result of our analysis:
- exactness of the result is equivalent to deciding the functionality of a transduction, and is thus polynomial for rational transductions; it is unknown for algebraic ones, but decidability of the finiteness of a set of reaching definitions can help in some cases;
- emptiness of a set of reaching definitions is decidable, which allows automatic detection of read accesses to uninitialized variables;
- in the case of rational transductions, dependence testing can be extended to rational languages of control words, because of Nivat's Theorem 3.6 and the fact that rational languages are closed under intersection; this is very useful for parallelization;
- in the case of algebraic transductions, dependence testing is equivalent to the intersection of an algebraic language and a rational one, because of Nivat's Theorem 3.21 for algebraic transductions and Evey's Theorem 3.24; this is still very useful for parallelization.

We refer to Section 5.5 for additional comparisons between the applicability of our analysis and of loop nest analyses to parallelization.

4.8 Conclusion

We presented an application of formal language theory to the automatic discovery of some semantic properties of programs: instancewise dependences and reaching definitions. When programs are recursive and nothing is known about recursion guards, only conservative approximations can be hoped for. In our case, we approximate the relation between reads and their reaching definitions by a rational (for trees) or algebraic (for arrays) transduction. The result of the reaching definition analysis is a transducer mapping control words of read instances to control words of write instances. Two algorithms for dependence and reaching definition analysis of recursive programs were designed. Incidentally, these results showed the use of the new class of left-synchronous transductions over free monoids.

We have applied our techniques to several practical examples, showing excellent approximations and sometimes even exact results. Some problems obviously remain. First, some strong restrictions on the program model limit the practical use of our technique. We should thus work on a graceful degradation of our analyses to encompass a larger set of recursive programs: for example, restrictions on induction variable operations could perhaps be removed by allowing the computation of approximate storage mappings. Second, reaching definition analysis is not quite mature yet, since it relies on rather ad-hoc techniques whose general applicability is unknown. More theoretical studies are needed to decide whether precise instancewise reaching definition information can be captured by rational and algebraic transducers.

We will show in the next chapters that decidability properties of rational and algebraic transductions allow several applications of our framework, especially in the automatic parallelization of recursive programs. These applications include array expansion and parallelism extraction.
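The bullet on rational transductions in Section 4.7 relies on the closure of rational languages under intersection, together with the decidability of emptiness for the result. The C sketch below shows the textbook product construction for two complete deterministic automata. It explores only the reachable pairs of states, by breadth-first search. The Dfa type, the two-letter alphabet and the MAXQ bound are assumptions of this sketch. It does not implement Nivat's theorem, only the intersection-and-emptiness step that dependence testing reduces to.

```c
#include <stdbool.h>
#include <string.h>
#include <assert.h>

#define MAXQ 8   /* maximum number of states per automaton in this sketch */
#define NSYM 2   /* alphabet {0, 1} */

/* Complete deterministic automaton over {0, 1}; states are 0..nstates-1. */
typedef struct {
    int nstates;
    int start;
    int delta[MAXQ][NSYM];
    bool accept[MAXQ];
} Dfa;

/* Decides whether L(A) intersect L(B) is non-empty by breadth-first
   search over the reachable part of the product automaton A x B. */
static bool intersection_nonempty(const Dfa *A, const Dfa *B) {
    bool seen[MAXQ][MAXQ];
    int queue[MAXQ * MAXQ][2];     /* each pair is enqueued at most once */
    int head = 0, tail = 0;

    memset(seen, 0, sizeof seen);
    seen[A->start][B->start] = true;
    queue[tail][0] = A->start;
    queue[tail][1] = B->start;
    tail++;

    while (head < tail) {
        int p = queue[head][0], q = queue[head][1];
        head++;
        if (A->accept[p] && B->accept[q])
            return true;           /* some word reaches an accepting pair */
        for (int c = 0; c < NSYM; c++) {
            int p2 = A->delta[p][c], q2 = B->delta[q][c];
            if (!seen[p2][q2]) {
                seen[p2][q2] = true;
                queue[tail][0] = p2;
                queue[tail][1] = q2;
                tail++;
            }
        }
    }
    return false;                  /* no reachable accepting pair */
}
```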
Chapter 5

Parallelization via Memory Expansion

The design of program transformations dedicated to dependence removal is a well-studied topic, as far as nested loops are concerned. Techniques such as conversion to single-assignment form [Fea91, GC95, Col98], privatization [MAL93, TP93, Cre96, Li92], and many optimizations for efficient memory management [LF98, CFH95, CDRV97, QR99] have been proven useful for the practical parallelization of programs (automatically or not). However, these works have mostly targeted affine loop nests, and few techniques have been extended to dynamic control flow and general array subscripts. Very interesting issues arise when trying to expand data structures in unrestricted nests of loops, and, because of the necessary data-flow restoration, confluent interests with the SSA (static single-assignment) framework [CFR+91] become obvious.

Motivating memory expansion and introducing the fundamental concepts is the first goal of Section 5.1; then, we study specific problems related to non-affine nests of loops, and we design practical solutions for a general single-assignment form transformation. The novel expansion techniques presented in Sections 5.2, 5.3 and 5.4 are contributions to bridging the gap between the rich applications of memory expansion techniques for affine loop nests and the few results with irregular codes.

When extending the program model to recursive procedures, the problem is of another nature: principles of parallel processing are then very different from the well-mastered data-parallel model for nested loops. Applicable algorithms have been mostly designed for statementwise dependence tests, whereas our analysis computes an extensive instancewise description of the dependence relation! There is of course a large gap between the two approaches, and we should now demonstrate that using such precise information brings practical improvements over existing parallelization techniques. These issues are addressed by Section 5.5, starting with an investigation of memory expansion techniques for recursive programs. Because this last section addresses a new topic, several negative or disappointing answers are mixed with successful results.

5.1 Motivations and Tradeoffs

To point out the most important issues related to memory expansion, and to motivate the following sections of this chapter, we start with a study of the well-known expansion technique called conversion to single-assignment form. Both abstract and practical points of view are discussed. Several results presented here have already been presented by many authors, with their own formalisms and program models, but we preferred to rewrite most of this work in our syntax, to fix the notations and to show how memory expansion also makes sense outside of the loop nest programming model.

5.1.1 Conversion to Single-Assignment Form

One of the most usual and simplest expansion schemes is conversion to single-assignment (SA) form. It is the extreme case where each memory location is written at most once during execution. This is slightly different from static single-assignment form (SSA) [CFR+91, KS98], where each variable is written in at most one statement of the program, and expansion is limited to variable renaming.

The idea of conversion to SA form is to replace every assignment to a data structure D by an assignment to a new data structure Dexp whose elements have the same type as the elements of D, and are in one-to-one mapping with the set W of all possible write accesses during any program execution. Each element of Dexp is associated with a single write access. This aggressive transformation ensures that the same memory location is never written twice in the expanded program. The second step is to transform the read references accordingly, and is called restoration of the flow of data. Instancewise reaching definition information is of great help to achieve this: for a given program execution e ∈ E, the value read by some access ⟨ı, ref⟩ to D in the right-hand side of a statement is precisely stored in the element of Dexp associated with σ_e(⟨ı, ref⟩) (see Section 2.4 for notations and definitions).

In general, an exact knowledge of σ_e for each execution e is not available at compile time: the result of instancewise reaching definition analysis is an approximate relation σ. The compile-time data-flow restoration scheme above is thus inapplicable when σ(⟨ı, ref⟩) is a non-singleton set. The idea is then to generate run-time data-flow restoration code, which tracks the last executed instance in σ(⟨ı, ref⟩). As we have seen for general expansion schemes in Section 1.2, this run-time restoration code is hidden in a φ function whose argument is the set σ(⟨ı, ref⟩) of possible reaching definitions.

A few notations are required to simplify the syntax of expanded programs.
- CurIns holds the run-time instance value, encoded as a control word or iteration vector, for any statement in the program. It is supposed to be updated online at function calls, loop iterations and every block entry. More details about this topic are given in Section 5.1.3 and later in this chapter.
- φ has the syntax of a function from sets of run-time instances to untyped values, but its semantics is to summarize a piece of data-flow restoration code. It is very similar to φ functions in the SSA framework [CFR+91, KS98]. Code generation for φ functions is the purpose of Section 5.1.2.
- Dexp is the expanded data structure associated with some original data structure D. Its "abstract" syntax is inherited from arrays: Dexp[set of element names] for the declaration, and Dexp[element name] for a read or write access. In practice, element names are either integer vectors or words, and Dexp is an array, a tree, or a nest of trees and arrays. Its "concrete" syntax is then implemented as an array or as a pointer to a tree structure. See Sections 5.1.3 and 5.5.1 for details.

We now present Abstract-SA, a very general algorithm to compute the single-assignment form. This algorithm is neither really new nor really practical, but it defines a general transformation scheme for SA programs, independently of the control and data structures. It takes as input the sequential program and the result of an instancewise reaching definition analysis, seen as a function. Control structures are left unchanged. This algorithm is very "abstract", since data structures are not defined precisely and some parts of the generated code have been encapsulated in high-level notations: CurIns and φ.

    Abstract-SA (program, W, σ)
      program: an intermediate representation of the program
      W: a conservative approximation of the set of write accesses
      σ: a reaching definition relation, seen as a function
      returns an intermediate representation of the expanded program
      for each data structure D in program
        do declare a data structure Dexp[W]
           for each statement s assigning D in program
             do left-hand side of s ← Dexp[CurIns]
           for each reference ref to D in program
             do ref ← if (σ(CurIns, ref) = {⊥}) ref
                      else if (σ(CurIns, ref) = {ı}) Dexp[ı]
                      else φ(σ(CurIns, ref))
      return program

We will show in the following that several "abstract" parts of the algorithm can be implemented when dealing with "concrete" data structures. Generating code for the φ function is the purpose of the next section.

5.1.2 Run-Time Overhead

When generating code for φ functions, the common idea is to compute at run time the last instance that may possibly be a reaching definition of some use. In general, for each expanded data structure Dexp, one needs an additional structure in one-to-one mapping with Dexp. In the static single-assignment framework for arrays [KS98], these additional structures are the @-arrays. Because we deal with a general single-assignment form, we propose another semantics for the additional structures, hence another notation: the data structure in one-to-one mapping with Dexp is a φ-structure, denoted by φDexp.

To ensure that run-time restoration of the flow of data is possible, elements of φDexp should store two pieces of information: the memory location assigned in the original program, and the identity of the last instance which assigned this memory location. Because we are dealing with single-assignment programs, the identity of the last instance is already captured by the element itself (i.e. the subscript of φDexp).¹ Elements of φDexp should thus store memory locations.

¹ This run-time restoration technique is thus specific to SA form. Other expansions require different types and/or semantics of φ-structures.

φ-structures are maintained as follows. φDexp is initialized to NULL before the expanded program. Every time Dexp is modified, the associated element of φDexp is set to the value of the memory location that would have been written in the original program.
When a read access to D in the original program is expanded into a call of the form φ(set), the φ function is implemented as the maximum, according to the sequential execution order, of all ı ∈ set such that φDexp[ı] is equal to the memory location read in the original program.

    Abstract-Implement-Phi (expanded)
      expanded: an intermediate representation of the expanded program
      returns an intermediate representation with run-time restoration code
      for each data structure Dexp in expanded
        do if there are φ functions accessing Dexp
           then declare a structure φDexp with the same shape as Dexp, initialized to NULL
                for each read reference ref to Dexp whose expanded form is φ(set)
                  do for each statement s involved in set
                       do refs ← write reference in s
                          if not already done for s
                          then following s, insert φDexp[CurIns] = f_e(CurIns, refs)
                     φ(set) ← Dexp[max<seq {ı ∈ set : φDexp[ı] = f_e(CurIns, ref)}]
      return expanded

Abstract-Implement-Phi is the abstract algorithm to generate the code for φ functions. In this algorithm, the syntax f_e(CurIns, ref) means that we are interested in the memory location accessed by reference ref, and not that some compile-time knowledge of f_e is required. Of course, practical details and optimizations depend on the control structures, see Section 5.1.4. Notice that the generated code is still in SA form: each element of a new φ-structure is written at most once.

An important remark at this point is that instancewise reaching definition analysis is the key to run-time overhead optimization. Indeed, as shown by our code generation algorithm, SA-transformed programs are more efficient when φ functions are sparse. Thus, a parallelizing compiler has many reasons to perform a precise instancewise reaching definition analysis: it improves parallelism detection, it allows the compiler to choose among a larger scope of parallel execution orders (depending on the "grain size" and architecture), and it reduces run-time overhead. An example borrowed from program sjs in [Col98] is presented in Figure 5.1. The most precise reaching definition relation for reference A[i+j-1] in the right-hand side of R is

    σ(⟨R, i, j, A[i+j-1]⟩) = if j ≥ 1
                             then ⟨S, i, j−1⟩
                             else if i ≥ 1
                                  then ⟨S, i−1, j⟩
                                  else ⟨T⟩.

This exact result shows that definitions associated with the reference in the left-hand side of R never reach any use. Expanding the program with a less precise reaching definition relation induces a spurious φ function, as in Figure 5.1.b. One may notice that the quast implementation in Figure 5.1.c is not really efficient and may be rather costly; but using classical optimizations such as loop peeling, or general polyhedron scanning techniques [AI91], can significantly reduce this overhead, see Figure 5.1.d. This remark advocates once more for further studies about integrating optimization techniques.
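When σ(⟨ı, ref⟩) is not a singleton, Abstract-Implement-Phi records, for every write instance, the memory location it would have written in the original program. φ(set) then selects the latest instance of the set whose recorded location matches the one being read. The C sketch below plays this out on a hypothetical straight-line program of NW writes into an array D of size M, with the conservative choice set = all write instances. The loc and val tables, the NONE value standing for NULL, and the fallback to the initial value for the ⊥ case are all assumptions of the sketch. It checks the restored value against a direct run of the original program.

```c
#include <assert.h>

#define NW 6        /* write instances of the hypothetical program */
#define M  4        /* size of the original array D */
#define NONE (-1)   /* stands for NULL in phiD */

/* Hypothetical straight-line program: write instance k performs
   D[loc[k]] = val[k], for k = 0, 1, ..., NW-1 in sequential order. */
static const int loc[NW] = {2, 0, 2, 3, 0, 2};
static const double val[NW] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};

/* Original program: execute the first `upto` writes, then read D[r]. */
static double original_read(const double D0[M], int r, int upto) {
    double D[M];
    for (int c = 0; c < M; c++)
        D[c] = D0[c];
    for (int k = 0; k < upto; k++)
        D[loc[k]] = val[k];
    return D[r];
}

/* SA version with a phi-structure: Dexp[k] holds the value written by
   instance k, phiD[k] the location it would have written in the original
   program (NONE if instance k has not executed). */
static double sa_read(const double D0[M], int r, int upto) {
    double Dexp[NW];
    int phiD[NW];
    for (int k = 0; k < NW; k++)
        phiD[k] = NONE;                     /* initialized to NULL */
    for (int k = 0; k < upto; k++) {
        Dexp[k] = val[k];                   /* single assignment */
        phiD[k] = loc[k];                   /* record original location */
    }
    /* phi(set) with set = all write instances: latest k with phiD[k] == r. */
    int last = NONE;
    for (int k = 0; k < NW; k++)
        if (phiD[k] == r)
            last = k;
    return last != NONE ? Dexp[last] : D0[r];  /* bottom: initial value */
}
```

The conservative set contains instances that have not executed yet. Their phiD cells stay at NONE, so they never match, which is why the NULL initialization matters.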
            double A[N];
        T   A[0] = 0;
            for (i=0; i<N; i++)
              for (j=0; j<N; j++) {
        R       A[i] = A[i+j-1] ...;
        S       A[i+j] = ...;
              }
    Figure 5.1.a. Original program

            double A[N], AT, AS[N,N], AR[N,N];
        T   AT = 0;
            for (i=0; i<N; i++)
              for (j=0; j<N; j++) {
        R       AR[i,j] = φ({⟨T⟩} ∪ {⟨S, i', j'⟩ : (i', j') <lex (i, j)});
        S       AS[i,j] = ...;
              }
    Figure 5.1.b. SA without reaching definition analysis

            double A[N], AT, AS[N,N], AR[N,N];
        T   AT = 0;
            for (i=0; i<N; i++)
              for (j=0; j<N; j++) {
        R       AR[i,j] = if (j==0) if (i==0) AT else AS[i-1,j]
                          else AS[i,j-1];
        S       AS[i,j] = ...;
              }
    Figure 5.1.c. SA with precise reaching definition analysis

            double A[N], AT, AS[N,N], AR[N,N];
        T   AT = 0;
        R   AR[1,1] = AT;
        S   AS[1,1] = ...;
            for (i=0; i<N; i++) {
        R     AR[i,1] = AS[i-1,1];
        S     AS[i,1] = ...;
              for (j=0; j<N; j++) {
        R       AR[i,j] = AS[i,j-1];
        S       AS[i,j] = ...;
              }
            }
    Figure 5.1.d. Precise reaching definition analysis plus loop peeling

    Figure 5.1. Interaction of reaching definition analysis and run-time overhead

Finally, one should notice that φ functions are not the only source of run-time overhead: computing reaching definitions using σ at run time may also be costly, even when σ is a function (i.e. when it is exact). But there is a big difference between the two sources of overhead. Run-time computation of σ can be costly because of the lack of expressiveness of control structures and algebraic operations in the language, or because of the mathematical abstraction; for example, transductions generally induce more overhead than quasts. On the contrary, the overhead of φ functions is due to the approximate knowledge of the flow of data and to its non-deterministic impact on the generated code; it is thus intrinsic to the expanded program, no matter how it is implemented. In many cases, indeed, the run-time overhead to compute σ can be significantly reduced by classical optimization techniques (see Figure 5.1), but this is not the case for φ functions.

5.1.3 Single-Assignment for Loop Nests

In this section, we only consider intra-procedural expansion of programs operating on scalars and arrays. An extension to function calls, recursive programs and recursive data structures is studied at the end of this chapter, in Section 5.5. These restrictions simplify the exposition of a "concrete" SA algorithm in the classical loop nest framework.

When dealing with nests of loops, instancewise reaching definitions are described by an affine relation (see [BCF97, Bar98] and Section 2.4.3). We pointed out in Section 3.1.1 that, seeing an affine relation as a function, it can be written as a nested conditional called a quast [Fea91]. This representation of relations is especially interesting for expansion purposes, since it can be easily and efficiently implemented in a programming language. Algorithm Make-Quast, introduced in Section 3.1.1, builds a quast representation for any affine relation.

We use the following notations:
- Stmt(⟨S, x⟩) = S (the statement),
- Iter(⟨S, x⟩) = x (the iteration vector),
- and Array(S) is the name of the original data structure assigned by statement S.

Given a quast representation of reaching definitions, Convert-Quast generates efficient code to retrieve the value read by some reference. This code is more or less a compile-time implementation of the conditional generated at the end of Abstract-SA. A φ function is generated when a non-singleton set is encountered. Finally, because statements partition the set of memory locations in the single-assignment program, we use an array AS[x] instead of the proposed Aexp[⟨S, x⟩] of the Abstract-SA algorithm.

    Convert-Quast (quast, ref)
      quast: the quast representation of the reaching definition function
      ref: the original reference, used when ⊥ is encountered
      returns the implementation of quast as a value retrieval code for reference ref
      switch
        case quast = {⊥}:
          return ref
        case quast = {ı}:
          A ← Array(ı)
          S ← Stmt(ı)
          x ← Iter(ı)
          return AS[x]
        case quast = {ı1, ı2, ...}:
          return φ({ı1, ı2, ...})
        case quast = if predicate then quast1 else quast2:
          return if predicate Convert-Quast(quast1, ref)
                 else Convert-Quast(quast2, ref)

Thanks to Convert-Quast, we are ready to specialize Abstract-SA for loop nests. The new algorithm is Loop-Nests-SA. The current instance CurIns is implemented by its iteration vector (built from the surrounding loop variables). To simplify the exposition, scalars are seen as one-dimensional arrays of a single element. All memory accesses are thus performed through array subscripts.

    Loop-Nests-SA (program, σ)
      program: an intermediate representation of the program
      σ: a reaching definition relation, seen as a function
      returns an intermediate representation of the expanded program
      for each array A in program
        do for each statement S assigning A in program
             do declare an array AS
                left-hand side of S is replaced by AS[Iter(CurIns)]
           for each read reference ref to A in program
             do σ/ref ← σ ∩ (I × ref)
                quast ← Make-Quast(σ/ref)
                map ← Convert-Quast(quast, ref)
                ref ← map(CurIns)
      return program

The abstract code generation algorithm for φ functions can also be made more precise when dealing with loop nests and arrays only. For the same reason as before, run-time instances are stored in a distinct structure for each statement: we use φAS[x] instead of φAexp[⟨S, x⟩]. The new algorithm is Loop-Nests-Implement-Phi. Efficient computation of the lexicographic maximum can be done thanks to parallel reduction techniques [RF94].

    Loop-Nests-Implement-Phi (expanded)
      expanded: an intermediate representation of the expanded program
      returns an intermediate representation with run-time restoration code
      for each array AS in expanded
        do dA ← dimension of array AS
           if there are φ functions accessing AS
           then declare an array of dA-dimensional vectors φAS
                initialize φAS to NULL
                for each read access to AS of the form φ(set) in expanded
                  do if not already done for S
                     then refS ← write reference in S
                          insert φAS[Iter(CurIns)] = f_e(CurIns, refS) immediately after S
      for each original array A in expanded
        do for each read access φ(set) associated with A in expanded
             do φ(set) ← parallel for (each s in Stmt(set))
                           vector[s] = max<lex {x : ⟨s, x⟩ ∈ set ∧ φAs[x] = f_e(CurIns, ref)}
                         instance = max<seq {⟨s, vector[s]⟩ : s ∈ Stmt(set)}
                         A_Stmt(instance)[Iter(instance)]
      return expanded

One part of the code is still unimplemented: the array declaration. The main problem regarding array declaration is to get a compile-time evaluation of its size. In many cases, loop bounds are not easily predictable at compile time. One may thus have to consider some expanded arrays as dynamic arrays whose size is updated at run time. Another solution, proposed by Collard [Col94b, Col95b], is to prefer a storage mapping optimization technique (such as the one presented in Section 5.3) to single-assignment form, and to fold the unbounded array into a bounded one when the associated memory reuse does not impair parallelization. Such dynamic structures are very usual in high-level languages, but may result in poor performance when the compiler is unable to remove the run-time verification code. Two examples of code generation for φ functions are proposed in the next section.

5.1.4 Optimization of the Run-Time Overhead

Most of the run-time overhead comes from dynamic restoration of the data flow, using φ functions; and this cost is critical for non-scalar data structures distributed across processors. The technique presented in Section 5.2 (maximal static expansion) eradicates such run-time computations, at the cost of some loss in parallelism extraction. Indeed, φ functions may sometimes be a necessary condition for parallelization. This justifies the design of optimization techniques for φ function computation, which is the second purpose of this section.

We now present three optimizations to the code-generation algorithm of Section 5.1.2. The first method groups several basic optimizations for loop nests, the second one is based on a new instancewise analysis, and the last one avoids redundant computations during the propagation of "live" definitions. The second and third methods apply to loop nests and to recursive programs as well.

First Method: Basic Optimizations for Loop Nests

When dealing with nests of loops, the φ-structures are φ-arrays indexed by iteration vectors (see Loop-Nests-Implement-Phi). Because of the hierarchical structure of loop nests, the accesses in a set σ(u) are very likely to share a few iteration vector components. This allows the removal of the associated dimensions in φ-arrays, and reduces the complexity of lexicographic maximum computations. Another consequence is the applicability of up-motion techniques for invariant assignments. An example of φ-array simplification and up-motion is described in Figure 5.2, where function max computes the maximum of a set of iteration vectors, and where the maximum of an empty set is the vector (−1, …, −1).

Another interesting optimization is only applicable to while loops, and to for loops whose termination condition is complex: non-affine bounds, break statements or exceptions. When a loop assigns the same memory location an unbounded number of times, conversion to single-assignment form often requires a φ function, but the last defining write can be computed without using φ-arrays: its iteration vector is associated with the last value of the loop counter.² An example is described in Figure 5.3.

² The semantics of the resulting code is correct, but rather dirty: a loop variable is used outside of the loop block.
164 5.1.MOTIVATIONSANDTRADEOFFS... doublex; 163 for(i=1;i<=n;i++){ S for(j=1;j<=n;j++) if() for(k=1;k<=n;k++) for(i=1;i<=n;i++){ doublex,xs[n+1,n+1,n+1]; R} =x; x=; S for(j=1;j<=n;j++) if() for(k=1;k<=n;k++) xs[i,j,k]=; Figure5.2.a.Originalprogram Figure5.2.b.SAprogram R} =(fhs;i;j0;ni:1j0ng[f?g); for(i=1;i<=n;i++){ doublex,xs[n+1,n+1,n+1],xs[n+1,n+1,n+1]={null}; S for(j=1;j<=n;j++) if() for(k=1;k<=n;k++){ R ={ } xs[i,j,k]=; maxs=maxf(i;j0;k0):1j0n^k0=n^xs[i;j0;k0]=&xg; xs[i,j,k]=&x; }} if(maxs!=( 1; 1; 1))xS[maxS]elsex; Figure5.2.c.Standardimplementation for(i=1;i<=n;i++){ doublex,xs[n+1,n+1,n+1],xs[n+1]={null}; S for(j=1;j<=n;j++){ if(){ for(k=1;k<=n;k++){ R ={ } xs[j]=&x; xs[i,j,k]=; }} maxs=maxfj0:1j0n^xs[j0]=&xg; if(maxs!= 1)xS[maxS]elsex; Figure5.2.d.Optimizedimplementation SecondMethod:ImprovingtheSingle-AssignmentFormAlgorithm...Figure5.2.Basicoptimizationsofthegeneratedcodeforfunctions... denitions.whenthereadstatementistoocomplextobeanalyzedatcompile-time, Insomecases,functionscanbecomputedwithout-arraystostorepossiblereaching
165 164 CHAPTER5.PARALLELIZATIONVIAMEMORYEXPANSION... doublex; while() S x=; R=x; Figure5.3.a.Originalprogram doublex,xs[]; w=1; while(){ S xs[w]=; w++; } R=(fhS;wi:1wg[f?g); Figure5.3.b.SAprogram doublex,xs[],xs[]={null}; w=1; while(){ S xs[w]=; xs[w]=&x; w++; } R={ maxs=maxfw:xs[w]=&xg; if(maxs!= 1)xS[maxS]elsex; } Figure5.3.c.Standardimplementation doublex,xs[]; w=1; while(){ S xs[w]=; w++; } R=if(w>1)xS[w-1]elsex; Figure5.3.d.Optimizedimplementation...Figure5.3.Repeatedassignmentstothesamememorylocation... thesetofpossiblereachingdenitionscanbeverylarge.however,ifwecouldcompute theverymemorylocationaccessedbythereadstatement,thesetofpossiblereaching denitionswouldbemuchsmaller sometimesreducedtoasingleton.thisshowsthe needforanadditionalinstancewiseinformation,calledreachingdenitionofamemory location:theexactfunctionwhichdependsonanexecutione2eoftheprogramis denotedbyml eanditsconservativeapproximationbyml.herearetheformaldenitions: 8e2E;8u2Re;c2fe(We):ml e(u;c)=max <seqv2we:v<sequ^fe(v)=c ; 8e2E;8u2Re;c2fe(We):v=ml e(u;c)=)v2ml(u;c): Computingrelationmlisnotreallydierentfromreachingdenitionanalysis.To computethemlforareferencerinright-handsideofastatement,risreplacedbya readaccesstoanewsymbolicmemorylocationc,thenclassicalinstancewisereaching denitionanalysisisperformed.theresultisareachingdenitionrelationparameterized byc.seeingcasanargument,ityieldstheexpectedapproximaterelationml.insome rarecases,thiscomputationschemeyieldsunnecessarycomplexresults:3thegeneral solutionisthentointersecttheresultwith. AlgorithmAbstract-ML-SAisanimprovedsingle-assignmentformconversionalgorithmbasedonreachingdenitionsofmemorylocations.Itisbasedontheexact 3ConsideranarrayA,anassignmenttoA[foo]andareadreferencetoA[foo],wherefooissome complexsubscript.aprecisereachingdenitionanalysiswouldcomputeanexactresultbecausethe subscriptisthesameinthetwostatements.however,thereachingdenitionofagivenmemorylocation isnotknownprecisely,becausefoointheassignmentstatementisnotknownatcompiletime.
166 5.1.MOTIVATIONSANDTRADEOFFS run-timecomputationofthesymbolicmemorylocationwithstoragemappingfe.this 165 referencecode possiblycomplex tobesubstitutedtothesymbolicmemorylocationc. Inbothcases,thevalueoffeshouldnotbeinterpreted,itmustbeusedastheoriginal bythecurrentinstanceandthesymbolicmemorylocation,seeloop-nests-ml-sa. algorithmcanalsobeenspecializedforloopnestsandarrays,usingquastsparameterized AnexampleisdescribedinFigure Sfor(i=1;i<=N;i++) doublea[n+1]; Figure5.4.a.Originalprogram for(j=1;j<=n;j++) A[j]=A[j]+A[foo]; SdoubleA[N+1],AS[N+1,N+1]; for(i=1;i<=n;i++) for(j=1;j<=n;j++) AS[j]=if(i>1)AS[i-1,j]elseA[j] Figure5.4.b.SAprogram +if(i>1 j>1)(f?g[fhs;i0;j0i:1i0;j0n^(i0;j0)<lex(i;j)g) elsea[foo]; SdoubleA[N+1],AS[N+1,N+1]; for(i=1;i<=n;i++) for(j=1;j<=n;j++) AS[j]=if(i>1)AS[i-1,j]elseA[j] Figure5.4.c.SAprogramwithreachingdenitionsofmemorylocations +if(foo<j)as[i,foo] elseif(i>1)as[i-1,foo]elsea[foo]; ThirdMethod:CheatingwithSingle-Assignment...Figure5.4.ImprovingtheSAalgorithm... Ageneralproblemwithimplementationsoffunctionsbasedon-structuresisthelarge redundancyoflexicographicmaximumcomputations.indeed,eachtimeafunction isencountered,themaximumofthefullsetofpossiblereachingdenitionsmustbe recomputethemaximumofthesameset.thesetechniquesarewellsuitedtothevariable computed.inthestaticsingle-assignmentframework(ssa)[cfr+91,ks98],alarge renaminginvolvedinssa,butareunabletosupportthedatastructurereconstruction partoftheworkisdevotedtooptimizedplacementoffunctions,inordertonever performedbysaalgorithms.nevertheless,foranotherexpansionschemepresentedin Section5.4.7,weareabletoavoidredundanciesandtooptimizetheplacementof functions,butthealgorithmisrathercomplex. removesredundantcomputations,butcomputationisnotmadewith-structuresinsa ThemethodweproposeherehasbeenstudiedwiththehelpofLaurentVibert.It
form: it is based on @-structures whose semantics is similar to @-arrays in the static single-assignment (SSA) framework [KS98]. This is a simple compromise between dependence removal and efficient computation of φ functions, based on the commutativity and associativity of the lexicographic maximum. The idea is to use @-structures in one-to-one mapping with the original data structures instead of the expanded ones. Notice […] instances instead of memory locations, see Abstract-Implement-Phi-Not-SA.

Abstract-ML-SA (program, W, σml)
  program: an intermediate representation of the program
  W: a conservative approximation of the set of write accesses
  σml: reaching definitions of memory locations
  returns an intermediate representation of the expanded program
  for each data structure D in program
    do declare a data structure Dexp[W]
       for each statement s assigning D in program
         do left-hand side of s ← Dexp[CurIns]
       for each reference ref to D in program
         do ref ← if (σml((CurIns, ref), fe(CurIns, ref)) = {⊥}) ref
                  else if (σml((CurIns, ref), fe(CurIns, ref)) = {ι}) Dexp[ι]
                  else φ(σml((CurIns, ref), fe(CurIns, ref)))
  return program

Loop-Nests-ML-SA (program, σml)
  program: an intermediate representation of the program
  σml: reaching definitions of memory locations
  returns an intermediate representation of the expanded program
  for each array A in program
    do declare an array AS
       for each statement S assigning A in program
         do left-hand side of S ← AS[Iter(CurIns)]
       for each reference ref to A in program
         do u ← symbolic access associated with reference ref
            σml/ref ← restriction of σml to the instances of reference ref
            quast ← Make-Quast(σml/ref(u, fe(u)))
            map ← Convert-Quast(quast, ref)
            ref ← map(CurIns)
  return program
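To make the definition of $\sigma^{ml}_e$ and the @-structure idea concrete, here is a minimal C sketch over one recorded execution. It assumes a hypothetical trace of write events, each carrying its position in $<_{seq}$ and the memory location $f_e(v)$ it writes. `sigma_ml` applies the definition literally, while `at_structure_query` mimics the run-time approach of Abstract-Implement-Phi-Not-SA, with one @-cell per original location updated by a maximum. The names `WriteEvent`, `sigma_ml`, `at_structure_query` and the bound `NLOC` are illustrative assumptions, not taken from the thesis.

```c
#include <assert.h>
#include <stddef.h>

#define NLOC 8  /* number of original memory locations (hypothetical) */

/* One write of a recorded execution e: its position in the sequential
   order <seq (assumed >= 0) and the location f_e(v) it writes. */
typedef struct {
    long time;
    int loc;
} WriteEvent;

/* Definition of sigma^ml_e(u, c): the last write to location c that
   precedes the read executed at time u. Returns its time, or -1 for
   "bottom" (the value was defined before the analyzed code). */
long sigma_ml(const WriteEvent *w, size_t n, long u, int c) {
    long best = -1;
    for (size_t k = 0; k < n; k++)
        if (w[k].time < u && w[k].loc == c && w[k].time > best)
            best = w[k].time;
    return best;
}

/* Run-time counterpart in the style of Abstract-Implement-Phi-Not-SA:
   one @-cell per original location, initialized to bottom (-1) and
   updated after each write by @D[c] = max(@D[c], CurIns).
   Events are assumed to be listed in <seq order. */
long at_structure_query(const WriteEvent *w, size_t n, long u, int c) {
    long at[NLOC];
    assert(c >= 0 && c < NLOC);
    for (int k = 0; k < NLOC; k++)
        at[k] = -1;
    for (size_t k = 0; k < n && w[k].time < u; k++) {
        assert(w[k].loc >= 0 && w[k].loc < NLOC);
        long *cell = &at[w[k].loc];
        if (w[k].time > *cell)
            *cell = w[k].time;   /* max: commutative and associative */
    }
    return at[c];
}
```

Because each update is a maximum, the order in which updates to the same @-cell are applied does not change the result; in a parallel loop, each update would still have to be made atomic.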
[…] in a critical section. Both the write instance and the memory location should be stored, […]

The original memory-based dependences are displaced from the original data structures to their @-structures: they have not disappeared! However, thanks to the properties of the lexicographic maximum, output dependences can be ignored without violating the original program semantics. Spurious anti-dependences remain, and must be taken into account for parallelization purposes. The first example in Figure 5.5 can be parallelized with this technique, but not the second.

In the case of loop nests and arrays, a simple extension to the technique can be helpful. It is sufficient, for example, to parallelize the second example in Figure 5.5. Consider a φ call of the form φ(set). If the component value of some dimensions is constant for all
iteration vectors of instances in set, then it is legal to expand the @-array along these dimensions. Applied to the second example in Figure 5.5, @x is replaced by @x[i], which makes the outer loop parallel.

Abstract-Implement-Phi-Not-SA (expanded)
  expanded: an intermediate representation of the expanded program
  returns an intermediate representation with run-time restoration code
  for each original data structure D[shape] in expanded
    do if there are φ functions accessing Dexp
         then declare a data structure @D[shape] initialized to ⊥
              for each read reference ref to D whose expanded form is φ(set)
                do sub ← subscript of reference ref
                   φ(set) ← if (@D[sub] != ⊥) Dexp[@D[sub]] else D[sub]
                   for each statement s involved in set
                     do subs ← subscript of the write reference to D in s
                        if not already done for s
                          then following s insert @D[subs] = max(@D[subs], CurIns)
  return expanded

    double x;
    for (i=1; i<=N; i++)
      if (…)
S       x = …;
R   … = x;

Figure 5.5.a. First example

    double x, xS[N+1], @x = -1;
    parallel for (i=1; i<=N; i++)
      if (…) {
S       xS[i] = …;
        @x = max(@x, i);
      }
R   … = if (@x != -1) xS[@x] else x;

Figure 5.5.b. First example: parallel expansion

    double x;
    for (i=1; i<=N; i++) {
T     x = …;
      for (j=1; j<=N; j++)
        if (…)
S         x = x …;
R     … = x;
    }

Figure 5.5.c. Second example

    double x, xT[N+1], xS[N+1, N+1];
    double @x = (-1, -1);
    for (i=1; i<=N; i++) {
T     xT[i] = …;
      for (j=1; j<=N; j++)
        if (…) {
S         xS[i, j] = if (j>1) xS[i, j-1] else xT[i] …;
          @x = max(@x, (i, j));
        }
R     … = if (@x != (-1, -1)) xS[@x] else xT[i];
    }

Figure 5.5.d. Second example: not parallelizable expansion

Figure 5.5. Parallelism extraction versus run-time overhead

In practice, this technique is both very easy to implement and very efficient for run-time restoration of the data flow, but it can often hamper parallelism extraction. It is a first and simple attempt to find a tradeoff between parallelism and overhead.

Tradeoff between Parallelism and Overhead

All the single-assignment form algorithms described, and most techniques for run-time restoration of the data flow, share the same major drawback: run-time overhead. By essence, SA form requires a huge memory usage, and is not practical for real programs. Moreover, some φ functions cannot be implemented efficiently with the optimizations proposed. To avoid or reduce these sources of run-time overhead, it is thus necessary to design more pragmatic expansion schemes: both memory usage and run-time data-flow restoration code should be handled with care. This is the purpose of the three following sections.

5.2 Maximal Static Expansion

The present section studies a novel memory expansion paradigm: its motivation is to stick with the compile-time restoration of the flow of data, while keeping in mind the approximative nature of the compile-time information. More precisely, we would like to remove as many memory-based dependences as possible, without the need of any φ function (associated with run-time restoration of the data flow). We will show that this goal requires a change in the way expanded data structures are accessed, to take into account the approximative knowledge of storage mappings.

An expansion of data structures that does not need a φ function is called a static expansion [BCC98, BCC00].⁴ The goal is to find automatically a static way to expand all data structures as much as possible, i.e. the maximal static expansion. Maximal static expansion may be considered as a trade-off between parallelism and memory usage.

We present an algorithm to derive the maximal static expansion; its input is the (perhaps conservative) output of a reaching definition analysis, so our method is "optimal" with respect to the precision of this analysis. Our framework is valid for any imperative program without restriction, the only restrictions being those of your favorite reaching definition analysis. We then present an intra-procedural algorithm to construct the maximal static expansion for programs with arrays and scalars only, but where subscripts and control structures are unrestricted.

Motivation

The three following examples introduce the main issues and advocate for a maximal static expansion technique.

First Example: Dynamic Control Flow

We first study the pseudo-code shown in Figure 5.6; this kernel appears in several convolution codes.⁵ Parts denoted by … are supposed to have no side effect.

⁴ Notice that according to our definition, an expansion in the static single-assignment framework [CFR+91, KS98] may not be static.
⁵ For instance, Horn and Schunck's algorithm to perform 3D Gaussian smoothing by separable convolution.
    double x;
    for (i=1; i<=N; i++) {
T     x = …;
      while (…)
S       x = x …;
R     … = x …;
    }

Figure 5.6. First example

Each instance $\langle T,i\rangle$ assigns a new value to variable x. In turn, statement S assigns x an undefined number of times (possibly zero). The value read in x by statement R is thus defined either by T, or by some instance of S, in the same iteration of the for loop (the same i). Therefore, if the expansion assigns distinct memory locations to $\langle T,i\rangle$ and to instances $\langle S,i,w\rangle$,⁶ how could instance $\langle R,i\rangle$ "know" which memory location to read from?

We have already seen that this problem is solved with an instancewise reaching definition analysis, which describes where values are defined and where they are used. We may thus call $\sigma$ the mapping from a read instance to its set of possible reaching definitions. Applied to the example in Figure 5.6, it tells us that the set $\sigma(\langle S,i,w\rangle)$ of definitions reaching instance $\langle S,i,w\rangle$ is:

$$\sigma(\langle S,i,w\rangle) = \textbf{if } w > 1 \textbf{ then } \{\langle S,i,w-1\rangle\} \textbf{ else } \{\langle T,i\rangle\} \qquad (5.1)$$

where w is an artificial counter of the while loop. And the set $\sigma(\langle R,i\rangle)$ of definitions reaching instance $\langle R,i\rangle$ is:

$$\sigma(\langle R,i\rangle) = \{\langle T,i\rangle\} \cup \{\langle S,i,w\rangle : w \ge 1\} \qquad (5.2)$$

Let us try to expand scalar x. One way is to convert the program into SA, making T write into xT[i] and S into xS[i, w]: then each memory location is assigned at most once, complying with the definition of SA. However, what should right-hand sides look like now? A brute-force application of (5.2) yields the program in Figure 5.7. While the right-hand side of S only depends on w, the right-hand side of R depends on the control flow, thus needing a φ function.

The aim of maximal static expansion is to expand x as much as possible in this program, but without having to insert φ functions.

A possible static expansion is to uniformly expand x into x[i], and to avoid output dependences between distinct iterations of the for loop. Figure 5.8 shows the resulting maximal static expansion of this example. It has the same degree of parallelism as the single-assignment program, and is simpler.

Notice that it should be easy to adapt the array privatization techniques by Maydan et al. [MAL93] to handle the program in Figure 5.6; this would tell us that x can be privatized along i. However, we want to do more than privatization along loops, as illustrated in the following examples.

⁶ We need a virtual loop variable w to track iterations of the while loop.
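The claim that expanding x into x[i] preserves the data flow of Figure 5.6 can be checked on a concrete instance of Figure 5.6 and Figure 5.8. In this sketch the "…" parts are replaced by hypothetical computations (T stores i, each iteration of S adds 1), and the unknown while condition by a per-iteration trip count `trips[i]`; both functions record in `out[i]` the value read by R. The function names and these stand-ins are assumptions made for illustration.

```c
#include <assert.h>

#define MAXN 64

/* Original program of Figure 5.6; out[i] receives the value read by R. */
void first_example_original(int n, const int *trips, double *out) {
    double x;
    for (int i = 1; i <= n; i++) {
        x = (double)i;                         /* T: x = ... */
        for (int w = 1; w <= trips[i]; w++)    /* while (...) */
            x = x + 1.0;                       /* S: x = x ... */
        out[i] = x;                            /* R: ... = x */
    }
}

/* Maximal static expansion of Figure 5.8: x becomes x[i], so each
   iteration of the i loop works on its own cell. */
void first_example_mse(int n, const int *trips, double *out) {
    double x[MAXN + 1];
    assert(n >= 0 && n <= MAXN);
    for (int i = 1; i <= n; i++) {             /* iterations are independent */
        x[i] = (double)i;
        for (int w = 1; w <= trips[i]; w++)
            x[i] = x[i] + 1.0;
        out[i] = x[i];
    }
}
```

Since no cell of x[] is shared between two iterations of the i loop, the outer loop of the second version carries no memory-based dependence.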
    for (i=1; i<=N; i++) {
T     xT[i] = …;
      w = 1;
      while (…) {
S       xS[i, w] = if (w==1) xT[i] else xS[i, w-1] …;
        w++;
      }
R     … = φ({⟨T, i⟩} ∪ {⟨S, i, w⟩ : w ≥ 1}) …;
    }

Figure 5.7. First example, continued

    for (i=1; i<=N; i++) {
T     x[i] = …;
      while (…)
S       x[i] = x[i] …;
R     … = x[i] …;
    }

Figure 5.8. Expanded version of the first example

Second Example: Array Expansion

Let us give a more complex example; we would like to expand array A in the program in Figure 5.9.

Since T always executes when j equals N, a value read by $\langle S,i,j\rangle$ with $j > N$ is never defined by an instance $\langle S,i',j'\rangle$ of S with $j' \le N$. Figure 5.9 describes the data-flow relations between S instances: an arrow from $(i',j')$ to $(i,j)$ means that instance $(i',j')$ defines a value that may reach $(i,j)$.

    double A[4*N];
    for (i=1; i<=2*N; i++)
      for (j=1; j<=2*N; j++) {
        if (…)
S         A[i-j+2*N] = A[i-j+2*N] …;
T       if (j==N) A[i+N] = …;
      }

[Diagram: data-flow arrows between instances of S in the (i, j) iteration domain, 1 ≤ i, j ≤ 2N.]

Figure 5.9. Second example
Formally, the definition reaching an instance of statement S is:⁷

$$\begin{aligned}\sigma(\langle S,i,j\rangle) = {}& \textbf{if } j \le N\\ & \textbf{then } \{\langle S,i',j'\rangle : 1 \le i' \le 2N \wedge 1 \le j' < j \wedge i'-j' = i-j\}\\ & \textbf{else } \{\langle S,i',j'\rangle : 1 \le i' \le 2N \wedge N < j' < j \wedge i'-j' = i-j\}\\ & \qquad \cup \{\langle T,i',N\rangle : 1 \le i' < i \wedge i' = i-j+N\}\end{aligned} \qquad (5.3)$$

Because reaching definitions are non-singleton sets, converting this program to SA form would require run-time computation of the memory location read by S.

[Figure 5.10: the (i, j) iteration domain of S, axes i and j from 1 to 2N, for N = 4. Panel a: instances involved in the same data flow. Panel b: counting groups per memory location.]

Figure 5.10. Partition of the iteration domain (N = 4)

However, we notice that the iteration domain of S may be split into disjoint subsets by grouping together instances involved in the same data flow. These subsets build a partition of the iteration domain. Each subset may have its own memory space, which will be neither written nor read by instances outside the subset. The partition is given in Figure 5.10.a.

Using this property, we can duplicate only those elements of A that appear in two distinct subsets. These are all the array elements $A[c]$, $1+N \le c \le 3N-1$. They are accessed by instances in the large central set in Figure 5.10.b. Let us label with 1 the subsets in the lower half of this area, and with 2 the subsets in the top half. We add one dimension to array A, subscripted with 1 and 2 in statements S2 and S3 in Figure 5.11, respectively. Elements $A[c]$, $1 \le c \le N$, are accessed by instances in the upper left triangle in Figure 5.10.b and have only one subset each (one subset in the corresponding diagonal in Figure 5.10.a), which we label with 1. The same labeling holds for sets corresponding to instances in the lower right triangle.

The maximal static expansion is shown in Figure 5.11. Notice that this program has the same degree of parallelism as the corresponding single-assignment program, without the run-time overhead.
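The expansion of Figure 5.11 can be tested against the original program of Figure 5.9. The sketch below fixes a small N, replaces the unknown guard of S by a hypothetical predicate `guard(i, j)`, lets S increment the value it reads and T store 1000 + i, and performs the copy-in mentioned in footnote 7 by initializing both labels from the original array. It records the value read by every executed instance of S in both versions and compares them; label 2 is used only by S3 (central diagonals with j > N) and by T, as described above. All names and stand-in computations are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define N 3                 /* small problem size (hypothetical) */
#define M (2 * N)

/* Hypothetical stand-in for the unknown guard "if (...)" of S. */
static bool guard(int i, int j) { return (7 * i + 3 * j) % 4 != 0; }

/* Original program (Figure 5.9); rd[i][j] receives the value read by
   each executed instance of S. */
static void original(double rd[M + 1][M + 1]) {
    double A[4 * N + 1];
    for (int c = 0; c <= 4 * N; c++) A[c] = c;        /* initial values */
    for (int i = 1; i <= M; i++)
        for (int j = 1; j <= M; j++) {
            if (guard(i, j)) {                         /* S */
                rd[i][j] = A[i - j + 2 * N];
                A[i - j + 2 * N] = rd[i][j] + 1.0;
            }
            if (j == N) A[i + N] = 1000.0 + i;         /* T */
        }
}

/* Maximal static expansion (Figure 5.11), preceded by a copy-in. */
static void expanded(double rd[M + 1][M + 1]) {
    double Ax[4 * N + 1][3];                           /* labels 1 and 2 */
    for (int c = 0; c <= 4 * N; c++) Ax[c][1] = Ax[c][2] = c;   /* copy-in */
    for (int i = 1; i <= M; i++)
        for (int j = 1; j <= M; j++) {
            int d = i - j, c = d + 2 * N;
            /* S3 (central diagonals, j > N) uses label 2; S1, S2, S4 use 1 */
            int lab = (-N + 1 <= d && d <= N - 1 && j > N) ? 2 : 1;
            if (guard(i, j)) {
                rd[i][j] = Ax[c][lab];
                Ax[c][lab] = rd[i][j] + 1.0;
            }
            if (j == N) Ax[i + N][2] = 1000.0 + i;     /* T */
        }
}

/* True when every executed instance of S reads the same value in both. */
bool mse_matches_original(void) {
    static double r1[M + 1][M + 1], r2[M + 1][M + 1];
    original(r1);
    expanded(r2);
    for (int i = 1; i <= M; i++)
        for (int j = 1; j <= M; j++)
            if (guard(i, j) && r1[i][j] != r2[i][j]) return false;
    return true;
}
```

The check passes because, on the central diagonals, T kills every earlier value of the cell before any read with j > N, so reads with j ≤ N and reads with j > N never need the same copy.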
    double A[4*N, 2];
    for (i=1; i<=2*N; i++)
      for (j=1; j<=2*N; j++) {
        // expansion of statement S
        if (-2*N+1 <= i-j && i-j <= -N) {
          if (…)
S1          A[i-j+2*N, 1] = A[i-j+2*N, 1] …;
        } else if (-N+1 <= i-j && i-j <= N-1) {
          if (j <= N) {
            if (…)
S2            A[i-j+2*N, 1] = A[i-j+2*N, 1] …;
          } else
            if (…)
S3            A[i-j+2*N, 2] = A[i-j+2*N, 2] …;
        } else
          if (…)
S4          A[i-j+2*N, 1] = A[i-j+2*N, 1] …;
        // expansion of statement T
T       if (j==N) A[i+N, 2] = …;
      }

Figure 5.11. Maximal static expansion for the second example

    double A[N+1];
    for (i=1; i<=N; i++) {
      for (j=1; j<=N; j++)
T       A[j] = …;
S     A[foo(i)] = …;
R     … = A[bar(i)] …;
    }

Figure 5.12.a. Source program

    double A[N+1, N+1];
    for (i=1; i<=N; i++) {
      for (j=1; j<=N; j++)
T       A[j, i] = …;
S     A[foo(i), i] = …;
R     … = A[bar(i), i] …;
    }

Figure 5.12.b. Expanded version

Figure 5.12. Third example

⁷ Some instances of S read uninitialized values (e.g. when j = 1), and they have no reaching definition. As a consequence, the expanded program in Figure 5.11 should begin with a copy-in code from the original array to the expanded one.
⁸ A[foo(i)] stands for an array subscript between 1 and N, "too complex" to be analyzed at compile time.

Third Example: Non-Affine Array Subscripts

Consider the program in Figure 5.12.a, where foo and bar are arbitrary subscripting functions.⁸ Since all array elements are assigned by T, the value read by R at the i-th iteration must have been produced by S or T at the same iteration. The data-flow graph
is similar to the first example:

$$\sigma(\langle R,i\rangle) = \{\langle S,i\rangle\} \cup \{\langle T,i,j\rangle : 1 \le j \le N\} \qquad (5.4)$$

The maximal static expansion adds a new dimension to A, subscripted by i. It is sufficient to make the first loop parallel.

What Next?

These examples show the need for an automatic static expansion technique. In the following section, we present a formal definition of expansion and a general framework for maximal static expansion. We then describe an expansion algorithm for arrays that yields the expanded programs shown above. Notice that it is easy to recognize the original programs in their expanded counterparts, which is a convenient property of our algorithm.

It is natural to compare array privatization [MAL93, TP93, Cre96, Li92] and maximal static expansion: both methods expose parallelism in programs at a lower cost than the single-assignment form transformation. However, privatization generally resorts to dynamic restoration of the data flow, and it only detects parallelism along the enclosing loops; it is thus less powerful than general array expansion techniques. Indeed, the example in Section 5.2.1 shows that our method not only may expand along diagonals in the iteration space, but may also do some "blocking" along these diagonals.

Problem Statement

We assume that an instancewise reaching definition analysis has been performed previously, yielding a conservative approximation $\sigma$ of the relation between uses and reaching definitions.

The definition of static expansion was first introduced in [BCC98]: the idea is to avoid dynamic restoration of the data flow. Let us consider two writes v and w belonging to the same set of reaching definitions of some read u. Suppose they both write in the same memory location. If we assign two distinct memory locations to v and w in the expanded program, then a φ function is needed to restore the data flow, since we do not know which of the two locations holds the value needed by u. Using the notations introduced in Sections 2.4 and 2.5, "v and w write in the same memory location" is denoted by $f_e(v) = f_e(w)$, and "v and w are assigned distinct memory locations in the expanded program" is denoted by $f^{exp}_e(v) \neq f^{exp}_e(w)$.

We introduce relation $\mathcal{R}$ between definitions that possibly reach the same read (recall that we do not require the reaching definition analysis to give exact results):
$$\forall v, w \in W:\quad v \,\mathcal{R}\, w \iff \exists u \in R:\ v \in \sigma(u) \wedge w \in \sigma(u).$$

Whenever two definitions possibly reaching the same read assign the same memory location in the original program, they must still assign the same memory location in the expanded program. Since "writing in the same memory location" is an equivalence relation, we actually use $\mathcal{R}^*$, the transitive closure of $\mathcal{R}$ (see Section 5.2.4 for computation details). Relation $\mathcal{R}$, therefore, generalizes webs [Muc97] to instances of references, and the rest of this work shows how to compute $\mathcal{R}$ in the presence of arrays.⁹

⁹ Strictly speaking, webs include definitions and uses, whereas $\mathcal{R}$ applies to definitions only.
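On a finite set of write instances, the relations of this section can be manipulated as boolean matrices, a small model of what a Presburger library does symbolically. The sketch below composes a relation with itself, performs the emptiness test on $(\mathcal{R}\circ\mathcal{R}) \setminus \mathcal{R}$ that the implementation runs before computing a closure, and only then applies Warshall's algorithm to obtain $\mathcal{R}^*$. The matrix encoding, the bound `MAXV` and the function names are assumptions made for illustration; this is not the Omega implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 32

typedef bool Rel[MAXV][MAXV];   /* finite relation on n elements */

/* out = r o r  (x -> z iff there is y with x r y and y r z) */
static void compose(int n, Rel r, Rel out) {
    for (int x = 0; x < n; x++)
        for (int z = 0; z < n; z++) {
            out[x][z] = false;
            for (int y = 0; y < n && !out[x][z]; y++)
                out[x][z] = r[x][y] && r[y][z];
        }
}

/* Cheap test run first: is (r o r) \ r empty? */
bool is_transitive(int n, Rel r) {
    static Rel rr;
    assert(n >= 0 && n <= MAXV);
    compose(n, r, rr);
    for (int x = 0; x < n; x++)
        for (int z = 0; z < n; z++)
            if (rr[x][z] && !r[x][z])
                return false;
    return true;
}

/* In-place transitive closure (Warshall), skipped when r is transitive. */
void transitive_closure(int n, Rel r) {
    if (is_transitive(n, r))
        return;
    for (int k = 0; k < n; k++)
        for (int x = 0; x < n; x++)
            if (r[x][k])
                for (int z = 0; z < n; z++)
                    if (r[k][z])
                        r[x][z] = true;
}
```

On symbolic affine relations the same composition and difference are available in Presburger libraries, but the closure itself may only be approximated.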
Relation $\mathcal{R}$ holds between definitions that reach the same use. Therefore, mapping these writes to different memory locations is precisely the case where φ functions would be necessary, a case a static expansion is designed to avoid:

Definition 5.1 (static expansion) For an execution $e \in E$ of the program, an expansion from storage mapping $f_e$ to storage mapping $f^{exp}_e$ is static if
$$\forall v, w \in W_e:\quad v \,\mathcal{R}^*\, w \wedge f_e(v) = f_e(w) \implies f^{exp}_e(v) = f^{exp}_e(w). \qquad (5.5)$$

When clear from the context, we say "static expansion $f^{exp}_e$" instead of "static expansion from $f_e$ to $f^{exp}_e$". Now, we are interested in removing as many dependences as possible, without introducing φ functions. We are looking for the maximal static expansion (MSE), assigning the largest number of memory locations while verifying (5.5):

Definition 5.2 (maximal static expansion) For an execution $e$, a static expansion $f^{exp}_e$ is maximal on the set $W_e$ of writes if, for any static expansion $f'_e$,
$$\forall v, w \in W_e:\quad f^{exp}_e(v) = f^{exp}_e(w) \implies f'_e(v) = f'_e(w). \qquad (5.6)$$

Intuitively, if $f^{exp}_e$ is maximal, then $f'_e$ cannot do better: it maps two writes to the same memory location whenever $f^{exp}_e$ does.

We need to characterize the sets of statement instances on which a maximal static expansion $f^{exp}_e$ is constant, i.e. the equivalence classes of relation $\{(u, v) \in W_e \times W_e : f^{exp}_e(u) = f^{exp}_e(v)\}$. However, this hardly gives us an expansion scheme, because this result does not tell us how much each individual memory location should be expanded. The purpose of Section 5.2.3 is to design a practical expansion algorithm for each memory location used in the original program.

Formal Solution

Following the lines of [BCC00], we are interested in the static expansion which removes the largest number of dependences.

Proposition 5.1 (maximal static expansion) Given a program execution $e$, a storage mapping $f^{exp}_e$ is both a maximal static expansion of $f_e$ and finer than $f_e$ if and only if
$$\forall v, w \in W_e:\quad v \,\mathcal{R}^*\, w \wedge f_e(v) = f_e(w) \iff f^{exp}_e(v) = f^{exp}_e(w). \qquad (5.7)$$

Proof: Sufficient condition (the "if" part). Let $f^{exp}_e$ be a mapping such that $\forall u, v \in W_e: f^{exp}_e(u) = f^{exp}_e(v) \iff u \,\mathcal{R}^*\, v \wedge f_e(u) = f_e(v)$. By definition, $f^{exp}_e$ is a static expansion, and $f^{exp}_e$ is finer than $f_e$. Let us show that $f^{exp}_e$ is maximal. Suppose that, for $u, v \in W_e$, $f^{exp}_e(u) = f^{exp}_e(v)$. Then (5.7) implies $u \,\mathcal{R}^*\, v$ and $f_e(u) = f_e(v)$. Thus, from (5.5), any other static expansion $f'_e$ satisfies $f'_e(u) = f'_e(v)$ too. Hence $f^{exp}_e(u) = f^{exp}_e(v) \implies f'_e(u) = f'_e(v)$, so $f^{exp}_e$ is maximal.

Necessary condition (the "only if" part). Let $f^{exp}_e$ be a maximal static expansion finer than $f_e$. Because $f^{exp}_e$ is a static expansion, we only have to prove that
$$\forall u, v \in W_e:\quad f^{exp}_e(u) = f^{exp}_e(v) \implies u \,\mathcal{R}^*\, v \wedge f_e(u) = f_e(v).$$
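On the one hand, $f^{exp}_e(u) = f^{exp}_e(v) \implies f_e(u) = f_e(v)$, because $f^{exp}_e$ is finer than $f_e$. On the other hand, for some $u$ and $v$ in $W_e$, assume $f^{exp}_e(u) = f^{exp}_e(v)$ and $\neg(u \,\mathcal{R}^*\, v)$. We show that this contradicts the maximality of $f^{exp}_e$. For any $w$ in $W_e$, let $f'_e(w) = f^{exp}_e(w)$ when $\neg(u \,\mathcal{R}^*\, w)$, and $f'_e(w) = c$ when $u \,\mathcal{R}^*\, w$, for some $c \neq f^{exp}_e(u)$. Then $f'_e$ is a static expansion: by construction, $f'_e(u') = f'_e(v')$ for any $u'$ and $v'$ such that $u' \,\mathcal{R}^*\, v'$ and $f_e(u') = f_e(v')$. The contradiction comes from the fact that $f'_e(u) \neq f'_e(v)$. □

Results above make use of a general memory expansion $f^{exp}_e$. However, constructing it from scratch is another issue. To see why, consider a memory location $c$ and two accesses $v$ and $w$ writing into $c$. Assume that $v \,\mathcal{R}^*\, w$: these accesses must assign the same memory location in the expanded program. Now assume the contrary: if $\neg(v \,\mathcal{R}^*\, w)$, then the expansion should make them assign two distinct memory locations.

We are thus strongly encouraged to choose an expansion $f^{exp}_e$ of the form $(f_e, \nu)$, where function $\nu$ is constructed by the analysis and must be constant on equivalence classes of $\mathcal{R}^*$. Notation $(f_e, \nu)$ is merely abstract. A concrete method for code generation involves adding dimensions to arrays and extending array subscripts with $\nu$, see the algorithm presented below.

Now, a storage mapping $f^{exp}_e = (f_e, \nu)$ is finer than $f_e$ by construction, and it is a maximal static expansion if and only if function $\nu$ satisfies the following equation:
$$\forall e \in E,\ \forall v, w \in W_e,\ f_e(v) = f_e(w):\quad v \,\mathcal{R}^*\, w \iff \nu(v) = \nu(w).$$

In practice, $f_e(v) = f_e(w)$ can only be decided when $f_e$ is affine. In general, we have to approximate $f_e$ with the conflict relation $\kappa$, and derive two constraints from the previous equation:

Expansion must be static:
$$\forall v, w \in W:\quad v \,\kappa\, w \wedge v \,\mathcal{R}^*\, w \implies \nu(v) = \nu(w); \qquad (5.8)$$
Expansion must be maximal:
$$\forall v, w \in W:\quad v \,\kappa\, w \wedge \neg(v \,\mathcal{R}^*\, w) \implies \nu(v) \neq \nu(w). \qquad (5.9)$$

First, notice that changing $\kappa$ into its transitive closure has no impact on (5.8), and that the transformed equation yields an equivalence class enumeration problem. Second, (5.9) is a graph coloring problem: it says that two writes cannot "share the same color" if they are related. Direct methods exist to address these two problems simultaneously (see [Coh99b] or Section 5.4), but they seem much too complicated for our purpose.

Now, the only purpose of relation $\kappa$ is to avoid unnecessary memory allocation, and using a conservative approximation harms neither the maximality nor the static property of the expansion. Actually, we found that relation $\kappa$ differs from $\kappa^*$ (meaning that $\kappa$ is not transitive) only in contrived examples, e.g. with tricky combinations of affine and non-affine array subscripts. Therefore, consider the following maximal static expansion criterion:
$$\forall v, w \in W,\ v \,\kappa^*\, w:\quad v \,\mathcal{R}^*\, w \iff \nu(v) = \nu(w) \qquad (5.10)$$

Now, given an equivalence class of $\kappa^*$, classes of $\mathcal{R}^*$ are exactly the sets where storage mapping $f^{exp}_e$ is constant:

Theorem 5.1 A storage mapping $f^{exp}_e = (f_e, \nu)$ is a maximal static expansion for all executions $e \in E$ if and only if, for each equivalence class $C \in W/\kappa^*$, $\nu$ is constant on each class in $C/\mathcal{R}^*$ and takes distinct values between different classes: $\forall v, w \in C:\ v \,\mathcal{R}^*\, w \iff \nu(v) = \nu(w)$.

Proof: $C \in W/\kappa^*$ denotes a set of writes which may assign the same memory cell, and $C/\mathcal{R}^*$ is the set of equivalence classes for relation $\mathcal{R}^*$ on writes in $C$. A straightforward application of (5.10) concludes the proof. □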
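Theorem 5.1 reduces maximal static expansion to a per-class labeling condition, which is easy to check on finite data. In this sketch, each write is described by hypothetical integer identifiers of its $\kappa^*$ class and of its $\mathcal{R}^*$ class, plus the label $\nu$ chosen for it; the function tests criterion (5.10) pair by pair. The array encoding and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Checks criterion (5.10) on n writes:
   for all v, w with v kappa* w:  v R* w  <=>  nu(v) == nu(w).
   conflict[k] and rclass[k] identify the kappa* and R* classes of
   write k; nu[k] is the label chosen for write k. */
bool is_maximal_static_labeling(int n, const int *conflict,
                                const int *rclass, const int *nu) {
    assert(n >= 0);
    for (int v = 0; v < n; v++)
        for (int w = 0; w < n; w++) {
            if (conflict[v] != conflict[w])
                continue;                /* never the same cell: no constraint */
            bool related = rclass[v] == rclass[w];
            bool same_label = nu[v] == nu[w];
            if (related != same_label)
                return false;            /* violates (5.8) or (5.9) */
        }
    return true;
}
```

For example, labels may be reused across different $\kappa^*$ classes, since writes in distinct classes never share a memory cell; this is exactly the freedom noted right after the theorem.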
Notice that $\nu$ is only required to take different values between classes in the same $C$: if $C_1, C_2 \in W/\kappa^*$ with $C_1 \neq C_2$, $u_1 \in C_1$ and $u_2 \in C_2$, nothing prevents $\nu(u_1) = \nu(u_2)$.

As a consequence, two maximal static expansions $f^{exp}_e$ and $f'_e$ are identical on a class of $W/\kappa^*$, up to a one-to-one mapping between constant values. An interesting result follows:

Lemma 5.1 The expansion factor for each memory location assigned by writes in $C$ is $\mathrm{Card}(C/\mathcal{R}^*)$.

Let $C$ be an equivalence class in $W/\kappa^*$ (statement instances that may hit the same memory location). Suppose we have a function $\rho$ mapping each write $u$ in $C$ to a representative of its equivalence class in $C/\mathcal{R}^*$ (see Section 5.2.4 for details). One may label each class in $C/\mathcal{R}^*$, or equivalently, label each element of $\rho(C)$. Such a labeling scheme is obviously arbitrary, but all programs transformed using our method are equivalent up to a permutation of these labels. Labeling boils down to scanning exactly once all the integer points in the set of representatives $\rho(C)$, see Section 5.2.5 for details. Now, remember that function $f^{exp}_e$ is of the form $(f_e, \nu)$. From Theorem 5.1, we can take for $\nu(u)$ the label chosen for $\rho(u)$; then storage mapping $f^{exp}_e$ is a maximal static expansion for our program.

Eventually, one has to generate code for the expanded program, using storage mapping $f^{exp}_e$. This is done in the following section.

Algorithm

The maximal static expansion scheme given above works for any imperative program. More precisely, you may expand any imperative program using maximal static expansion, provided that a reaching definition analysis technique can handle it (at the instance level), and that transitive closure computation, relation composition, intersection and union are feasible in your framework.

In the sequel, since we use FADA (see [BCF97, Bar98] and Section 2.4.3) as the reaching definition analysis, we inherit its syntactical restrictions: data structures are scalars and arrays; pointers are not allowed. Loops, conditionals and array subscripts are unrestricted.
Therefore, Maximal-Static-Expansion and MSE-Convert-Quast are based on the classical single-assignment algorithms for loop nests, see Section 5.1. They rely on Omega [KPRS96] and PIP [Fea88b] for symbolic computations. Additional algorithms and technical points are studied in Section 5.2.5. In Maximal-Static-Expansion, the function $\rho$ mapping instances to their representative is encoded as an affine relation between iteration vectors (augmented with the statement label), and the labeling function $\nu$ is encoded as an affine relation between the same iteration vectors and a "compressed" vector space found by Enumerate-Representatives.

¹⁰ Recall that in single-assignment form, statements assign disjoint (renamed) data structures.

An interesting but technical remark is that, by construction of function $\nu$ seen as a parameterized vector, a few components may take a finite (and hopefully small) number of values. Indeed, such components may represent the "statement part" of an instance. In such a case, splitting array A into several (renamed) data structures¹⁰ should improve performance and decrease memory usage (avoiding convex hulls of disjoint polyhedra). Consider for instance the MSE of the second example: expanding A into A1 and A2 would require $6N-2$ array elements instead of $8N-2$ in Figure 5.11. Other techniques reducing
the number of useless memory locations allocated by our algorithm are not described in this paper.

Maximal-Static-Expansion (program, κ, σ)
  program: an intermediate representation of the program
  κ: the conflict relation
  σ: the reaching definition relation, seen as a function
  returns an intermediate representation of the expanded program
  ℛ* ← Transitive-Closure(σ ∘ σ⁻¹)
  κ* ← Transitive-Closure(κ)
  ρ ← Compute-Representatives(κ* ∩ ℛ*)
  ν ← Enumerate-Representatives(κ*, ρ)
  for each array A in program
    do νA ← component-wise maximum of ν(u) for all write accesses u to A
       declaration A[shape] is replaced by Aexp[shape, νA]
       for each statement S assigning A in program
         do left-hand side A[subscript] of S is replaced by Aexp[subscript, ν(CurIns)]
       for each read reference ref to A in program
         do σ/ref ← restriction of σ to accesses of the form (ι, ref)
            quast ← Make-Quast(σ/ref)
            map ← MSE-Convert-Quast(quast, ref)
            ref ← map(CurIns)
  return program

MSE-Convert-Quast (quast, ref)
  quast: the quast representation of the reaching definition function
  ref: the original reference
  returns the implementation of quast as a value retrieval code for reference ref
  switch
    case quast = {⊥}:
      return ref
    case quast = {ι}:
      A ← Array(ι)
      S ← Stmt(ι)
      x ← Iter(ι)
      subscript ← original array subscript in ref
      return Aexp[subscript, ν(⟨S, x⟩)]
    case quast = {ι1, ι2, …}:
      error "this case should never happen with static expansion!"
    case quast = if predicate then quast1 else quast2:
      return if predicate MSE-Convert-Quast(quast1, ref)
             else MSE-Convert-Quast(quast2, ref)

Detailed Review of the Algorithm

A few technical points and computational issues are raised by the previous algorithm. This section is devoted to their analysis and resolution.
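Before the individual points, the case analysis of MSE-Convert-Quast can be sketched as a recursive C function over a quast tree that prints the value retrieval expression. The `Quast` type, the string encoding of predicates and labels, and the use of C's conditional operator for the generated `if … else` are assumptions of this sketch; in the thesis, quasts are produced by PIP and the generated code is in the document's pseudo-C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimal quast for one read reference; the four kinds follow the four
   cases of MSE-Convert-Quast (hypothetical encoding). */
typedef enum { Q_BOTTOM, Q_SINGLE, Q_MANY, Q_IF } QuastKind;

typedef struct Quast {
    QuastKind kind;
    const char *nu;                        /* Q_SINGLE: label nu(iota) as C text */
    const char *predicate;                 /* Q_IF: condition as C text */
    const struct Quast *then_q, *else_q;   /* Q_IF: sub-quasts */
} Quast;

/* Writes into buf the retrieval code for reference array[subscript].
   Returns 0 on success, -1 if a leaf holds several reaching definitions
   (impossible for a static expansion) or if buf is too small. */
int mse_convert_quast(const Quast *q, const char *array, const char *subscript,
                      char *buf, size_t size) {
    char t[256], e[256];
    int n;
    assert(q != NULL && buf != NULL);
    switch (q->kind) {
    case Q_BOTTOM:      /* value defined outside the nest: original reference */
        n = snprintf(buf, size, "%s[%s]", array, subscript);
        break;
    case Q_SINGLE:      /* one reaching definition: expanded array, label nu */
        n = snprintf(buf, size, "%sexp[%s, %s]", array, subscript, q->nu);
        break;
    case Q_MANY:        /* would require a phi function */
        return -1;
    case Q_IF:
        if (mse_convert_quast(q->then_q, array, subscript, t, sizeof t) != 0 ||
            mse_convert_quast(q->else_q, array, subscript, e, sizeof e) != 0)
            return -1;
        n = snprintf(buf, size, "(%s) ? %s : %s", q->predicate, t, e);
        break;
    default:
        return -1;
    }
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For instance, a quast that selects label 2 when j > N and label 1 otherwise, applied to reference A[i-j+2*N], prints `(j > N) ? Aexp[i-j+2*N, 2] : Aexp[i-j+2*N, 1]`.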
Finding Representatives for Equivalence Classes

Finding a "good" canonical representative in a set is not a simple matter. We choose the lexicographic minimum, because it can be computed using classical techniques and our first experiments gave good results.

Notice also that representatives must be described by a function on write instances. Therefore, the good "parametric" properties of lexicographical minimum computations [Fea91, Pug92] are well suited to our purpose.

A general technique to compute the lexicographical minimum follows. Let $\sim$ be an equivalence relation, and $C$ an equivalence class for $\sim$. The lexicographical minimum of $C$ is:
$$\min_{<_{lex}}(C) = \{ v \in C : \nexists u \in C,\ u <_{lex} v \}.$$
Since $<_{lex}$ is a relation, we can rewrite the definition using algebraic operations:
$$\min_{<_{lex}}(C) = C \setminus (<_{lex})(C). \qquad (5.11)$$
This is applied in our framework to classes of $\mathcal{R}^*$ and $\kappa^*$, with order $<_{seq}$.

Compute-Representatives (equivalence)
  equivalence: an affine equivalence relation over instances
  returns an affine function mapping instances to a canonical representative
  repres ← equivalence \ (<seq ∘ equivalence)
  return repres

Applying algorithm Compute-Representatives to relation $\mathcal{R}^*$ yields an affine function $\rho$, but this does not readily provide the labeling function. The last step consists in enumerating the image of $\rho$ inside classes of equivalence relation $\kappa^*$.

Computing a Dense Labeling

To label each memory location, we associate each location with an integer point in the affine polyhedron of representatives, i.e. the image of function $\rho$ whose range is restricted to a class of equivalence relation $\kappa^*$. Labeling boils down to scanning exactly once all the integer points in the set of representatives. This can be done using classical polyhedron scanning techniques [AI91, CFR95], or simply by considering a "part" of the representative function in one-to-one mapping with this set. It is thus easy to compute a labeling function from the shape of function $\rho$. But computing a "good" labeling function is much more difficult: a "good" labeling should be as dense as possible, meaning that the number of memory locations accessed by the program must be as close as possible to the number of memory locations allocated.
A possible idea would be to count the number of integer points in the image of function $\rho$, thanks to Ehrhart polynomials [Cla96], and to build a labeling (non-affine in general) from this computation. But this would be extremely costly in practice, and would sometimes generate very intricate subscripts; moreover, most compile-time properties of $\nu$ would be lost, due to its possibly non-affine form. As a result, the "dense labeling problem" is mostly open at the moment. We have found an interesting partial result by Wilde and Rajopadhye [WR93], but studying the applicability of their technique to our more general case is left for future work.
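The two finite steps discussed above can be sketched on explicit point sets: picking the representative of a class as its lexicographic minimum through the set difference (5.11), and turning one-dimensional representatives into labels by a global translation followed by a division by their common stride. Both functions, the fixed dimension `DIM = 2`, and their names are illustrative assumptions; when the values are not evenly spaced, the labels produced by the second function stay sparse, which is where the dense labeling problem mentioned above begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DIM 2   /* dimension of the instance vectors (illustrative) */

/* a <lex b on integer vectors of length DIM */
static bool lex_less(const int *a, const int *b) {
    for (int k = 0; k < DIM; k++)
        if (a[k] != b[k])
            return a[k] < b[k];
    return false;
}

/* Finite version of (5.11): min(C) = C \ (<lex)(C), where (<lex)(C) is the
   set of points greater than some point of C. Points must be pairwise
   distinct; returns the index of the minimum in pts[0..n-1]. */
int lex_min_by_difference(int pts[][DIM], int n) {
    int result = -1;
    assert(n >= 1);
    for (int v = 0; v < n; v++) {
        bool in_image = false;               /* is pts[v] in (<lex)(C)? */
        for (int u = 0; u < n && !in_image; u++)
            in_image = lex_less(pts[u], pts[v]);
        if (!in_image) {
            assert(result == -1);            /* exactly one survivor */
            result = v;
        }
    }
    return result;
}

static int gcd(int a, int b) {
    a = abs(a);
    b = abs(b);
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Labels for one-dimensional representatives: translation by the minimum,
   then division by the common stride. The labels are exactly 0..n-1 when
   the values form an arithmetic progression. */
void dense_labels_1d(const int *rep, int n, int *label) {
    int lo, stride = 0;
    assert(n >= 1);
    lo = rep[0];
    for (int k = 1; k < n; k++)
        if (rep[k] < lo)
            lo = rep[k];
    for (int k = 0; k < n; k++)
        stride = gcd(stride, rep[k] - lo);
    if (stride == 0)
        stride = 1;                          /* all values are equal */
    for (int k = 0; k < n; k++)
        label[k] = (rep[k] - lo) / stride;
}
```

With representatives {11, 3, 7} the labels are {2, 0, 1}; with {4, 10, 6} they are {0, 3, 1}, leaving label 2 unused.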
Many simple transformations can be applied to $\nu$ to compress its image. Thanks to the regularity of the iteration spaces of practical loop nests, techniques such as global translation, division by an integer constant (when a constant stride is discovered) and projection gave excellent results on every example we studied. Algorithm Enumerate-Representatives implements these simple transformations to enumerate the image of a function whose range is restricted to a class of some equivalence relation.

Enumerate-Representatives (rel, fun)
  rel: equivalence relation whose classes define enumeration domains
  fun: the affine function whose image should be enumerated
  returns a dense labeling of the image of fun restricted to a class of rel
  repres ← Compute-Representatives(rel)
  enum ← Symbolic-Vector-Subtract(fun, repres ∘ fun)
  apply appropriate translations, divisions and projections to iteration vectors in enum
  return enum

What about Complexity and Practical Use?

For each array in the source program, the algorithm proceeds as follows:

- Compute the reciprocal relation $\sigma^{-1}$ of $\sigma$. This is different from computing the inverse of a function, and consists only in a swap of the two arguments of $\sigma$.
- Composing two relations $\rho$ and $\rho'$ boils down to eliminating $y$ in $x \,\rho\, y \wedge y \,\rho'\, z$.
- Computing the exact transitive closure of $\mathcal{R}$ or $\kappa$ is impossible in general: Presburger arithmetic is not closed under transitive closure. However, very precise conservative approximations (if not exact results) can be computed. Kelly et al. [KPRS96] do not give a formal bound on the complexity of their algorithm, but their implementation in the Omega toolkit proved to be efficient, if not concise. A short review of their algorithm is presented in Section 3.1.2. Notice again that the exact transitive closure is not necessary for our expansion scheme to be correct.
  Moreover, $\mathcal{R}$ and $\kappa$ happen to be transitive in most practical cases. In our implementation, the Transitive-Closure algorithm first checks whether the difference $(\mathcal{R} \circ \mathcal{R}) \setminus \mathcal{R}$ is empty, before triggering the computation. In all three examples, both relations $\mathcal{R}$ and $\kappa$ are already transitive.
- In the algorithm above, $\rho$ is a lexicographical minimum. The expansion scheme just needs a way to pick one element per equivalence class. Computing the lexicographical minimum is expensive a priori, but it was easy to implement.
- Finally, numbering classes becomes costly only when we have to scan a polyhedral set of representatives in dimension greater than 1. In practice, we only had intervals on our benchmark examples.

Is our Result Maximal?

Our expansion scheme depends on the transitive closure calculator, and of course on the accuracy of the input information: instancewise reaching definitions and the approximation $\kappa$ of the original program storage mapping. We would like to stress the fact that the
expansion produced is static and maximal with respect to the results yielded by these parts, whatever their accuracy:

- The exact transitive closure may not be available (for computability or complexity reasons), and may therefore be over-approximated. The expansion factor of a memory location $c$ is then lower than $\mathrm{Card}(\{u \in W : f_e(u) = c\}/\mathcal{R}^*)$. However, the expansion remains static, and is maximal with respect to the transitive closure given to the algorithm.
- Relation $\kappa$, approximating the storage mapping of the original program, may be more or less precise, but we required it to be pessimistic (a.k.a. conservative). This point does not interfere with the staticity or maximality of the expansion; but the more accurate the relation, the less unused memory is allocated by the expanded program.

Application to Real Codes

Despite good performance results on small kernels (see following sections), it is obvious that reaching definition analysis and MSE would become unacceptably expensive on larger codes. When addressing real programs, it is therefore necessary to apply the MSE algorithm independently to several loop nests. A parallelizing compiler (or a profiler) can isolate loop nests that are critical program parts, and where spending time in powerful optimization techniques is valuable. Such techniques have been investigated by Berthou in [Ber93], and also in the Polaris [BEF+96] and SUIF [H+96] projects.

However, some values may be initialized outside of the analyzed code. When the set of possible reaching definitions for some read accesses is not a singleton and includes ⊥, it is necessary to perform some copy-in at the beginning of the code. Each array holding values that may be read by such accesses must be copied into the appropriate expanded arrays. In practice, this is expensive when expanded arrays hold many copies of original values. However, the process is fully parallel, and can hopefully cost no more than the loop nest itself.

There is a simple way to avoid copy-in, at the cost of some loss in the expansion degree. It consists in adding "virtual write accesses" for every memory location, and replacing the ⊥s in the reaching definition relation by the appropriate virtual access (accesses, indeed, when the memory location accessed is unknown). Since all ⊥s have been removed, computing the maximal static expansion from this modified reaching definition relation requires no copy-in; but additional constraints due to the "virtual accesses" may forbid some array expansions. This technique is especially useful when many temporary arrays are involved in a loop nest. But its application to the second motivating example (Figure 5.9) would forbid all expansion, since almost all reads may access values defined outside the nest.

Moreover, the data structures created by MSE on each loop nest may be different, and the accesses to the same original array may now be inconsistent. Consider for instance the original pseudo-code in Figure 5.13.a. We assume the first nest was processed separately by MSE, and the second nest by any technique. The code appears in Figure 5.13.b. Clearly, references to A may be inconsistent: a read reference in the second nest does not know which $\nu_1$ to read from.

A simple solution is then to insert, between the two loop nests, a copy-out code in which the original structure is restored (see Figure 5.13). Doing this only requires adding, at the end of the first nest, "virtual accesses" that read every memory location written
in the nest. The reaching definitions within the nest give the identity of the memory location to read from. Notice that no φ functions are necessary in the copy code: the opposite would lead to a non-static expansion. More precisely, if we call $v(c)$ the "virtual access" to memory location $c$ after the loop nest, we can compute the maximal static expansion for the nest and the additional virtual accesses, and the value to copy back into $c$ is located in $(c, \nu(\sigma(v(c))))$.

    for i
      A[f1(i)] = …
    endfor
    for i
      … = A[f2(i)] …
    endfor

Figure 5.13.a. Original code

    for i
      A1[f1(i), ν1(i)] = …
    endfor
    for i
      … = A1[f2(i), /* unknown */] …
    endfor

Figure 5.13.b. MSE version

    for i
      A1[f1(i), ν1(i)] = …
    endfor
    for c   // copy-out code
      A[c] = A1[c, ν1(σ(v(c)))]
    endfor
    for i
      … = A[f2(i)] …
    endfor

Figure 5.13.c. MSE with copy-out

Figure 5.13. Inserting copy-out code

Fortunately, with some knowledge of the program-wide flow of data, several optimizations can remove the copy-out code.¹¹ The simplest optimization is to remove the copy-out code for some data structure when no read access executing after the nest uses a value produced inside this nest. The copy-out code can also be removed when no φ functions are needed in read accesses executing after the nest. Eventually, it is always possible to remove the copy-out code by performing a forward substitution of $(c, \nu(\sigma(v(c))))$ into read accesses to a memory location $c$ in the following nests.

¹¹ Let us notice that, if MSE is used in codesign, the intermediate copy code and the associated data structures would correspond to additional logic and buffers, respectively. Both should be minimized in complexity and/or size.

Back to the Examples

This section applies our algorithm to the motivating examples, using the Omega Calculator [Pug92] as a tool to manipulate affine relations.
183 CHAPTER 5. PARALLELIZATION VIA MEMORY EXPANSION
First Example
Consider again the program in Figure 5.6 page 169. Using the Omega Calculator text-based interface, we describe a step-by-step execution of the expansion algorithm. We have to code instances as integer-valued vectors. An instance ⟨s, i⟩ is denoted by vector [i,..,s], where [..] possibly pads the vector with zeroes. We number T, S, R with 1, 2, 3 in this order, so ⟨T,i⟩, ⟨S,i,w⟩ and ⟨R,i⟩ are written [i,0,1], [i,w,2] and [i,0,3], respectively.

From (5.1) and (5.2), we construct the relation S of reaching definitions:

S := {[i,1,2]->[i,0,1] : 1<=i<=N}
     union {[i,w,2]->[i,w-1,2] : 1<=i<=N && 2<=w}
     union {[i,0,3]->[i,0,1] : 1<=i<=N}
     union {[i,0,3]->[i,w,2] : 1<=i<=N && 1<=w};

Since we have only one memory location, relation κ tells us that all instances are related together, and it can be omitted.

Computing R is straightforward:

S' := inverse S;
R := S(S');
R;

{[i,0,1]->[i,0,1] : 1<=i<=N} union
{[i,w,2]->[i,0,1] : 1<=i<=N && 1<=w} union
{[i,0,1]->[i,w',2] : 1<=i<=N && 1<=w'} union
{[i,w,2]->[i,w',2] : 1<=i<=N && 1<=w' && 1<=w}

In mathematical terms, we get:

$$\langle S,i,w\rangle \mathrel{R} \langle S,i,w'\rangle \iff 1\le i\le N \wedge w\ge 1 \wedge w'\ge 1$$
$$\langle S,i,w\rangle \mathrel{R} \langle T,i\rangle \iff 1\le i\le N \wedge w\ge 1$$
$$\langle T,i\rangle \mathrel{R} \langle S,i,w'\rangle \iff 1\le i\le N \wedge w'\ge 1$$
$$\langle T,i\rangle \mathrel{R} \langle T,i\rangle \iff 1\le i\le N$$

Relation R is already transitive; no closure computation is necessary:

$$R^* = R \tag{5.12}$$

There is only one equivalence class for κ.

Let us choose ρ(u) as the first executed instance in the equivalence class of u for R* (the least instance according to the sequential order): ρ(u) = min<seq({u' : u' R* u}). We may compute this expression using (5.11):

$$\forall i,w,\ 1\le i\le N,\ w\ge 1:\quad \rho(\langle T,i\rangle)=\langle T,i\rangle,\quad \rho(\langle S,i,w\rangle)=\langle T,i\rangle.$$

Computing ρ(W) yields N instances of the form ⟨T,i⟩. Maximal static expansion of accesses to variable x requires N memory locations. Here, i is an obvious label:

$$\forall i,w,\ 1\le i\le N,\ w\ge 1:\quad \nu(\langle S,i,w\rangle)=\nu(\langle T,i\rangle)=i. \tag{5.13}$$

All left-hand side references to x are transformed into x[i]; all references to x in the right-hand side are transformed into x[i] too, since their reaching definitions are instances of S or T for the same i. The expanded code is thus exactly the one found intuitively in Figure 5.8. The size declaration of the new array is x[1..N].
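The Omega session above can be mimicked by brute force on a bounded instance set, which may help in checking the results by hand. The C sketch below is only an illustration under stated assumptions: it fixes hypothetical bounds N = 3 and M = 4 on the outer loop and on the number of while-loop iterations (the real count is unknown at compile time), numbers write instances in sequential order, builds R from the reaching definitions S, closes it with union-find to obtain the classes of R*, and picks ρ(u) as the <seq-minimum of each class. All function and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bounds for enumeration: N outer iterations and at most M
   iterations of the while loop (the real count is unknown statically). */
#define N 3
#define M 4
#define NW (N * (M + 1)) /* writes: <T,i> and <S,i,1..M> */
#define NR (N * (M + 1)) /* reads:  <S,i,1..M> and <R,i> */

/* Writes are numbered in sequential execution order, so a smaller index
   means an earlier instance according to <seq. */
static int wT(int i) { return (i - 1) * (M + 1); }
static int wS(int i, int w) { return (i - 1) * (M + 1) + w; }
static int rS(int i, int w) { return (i - 1) * (M + 1) + w - 1; }
static int rR(int i) { return (i - 1) * (M + 1) + M; }

static int find(int parent[], int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]]; /* path halving */
        x = parent[x];
    }
    return x;
}

/* Merges two classes, keeping the smaller index (the <seq-minimum) as root. */
static void unite(int parent[], int a, int b) {
    a = find(parent, a);
    b = find(parent, b);
    if (a < b) parent[b] = a;
    else if (b < a) parent[a] = b;
}

/* Computes rho(v) for every write v as the <seq-minimum of its R* class,
   stores nu(v) (the outer iteration of rho(v)) in nu[], and returns the
   number of distinct representatives, i.e. the size of rho(W). */
int mse_first_example(int nu[NW]) {
    bool sigma[NR][NW] = {{false}}; /* sigma[u][v]: v may reach read u */
    int parent[NW];

    for (int i = 1; i <= N; i++) {
        sigma[rS(i, 1)][wT(i)] = true;            /* <S,i,1> <- <T,i>     */
        for (int w = 2; w <= M; w++)
            sigma[rS(i, w)][wS(i, w - 1)] = true; /* <S,i,w> <- <S,i,w-1> */
        sigma[rR(i)][wT(i)] = true;               /* <R,i> <- <T,i> ...   */
        for (int w = 1; w <= M; w++)
            sigma[rR(i)][wS(i, w)] = true;        /* ... or any <S,i,w>   */
    }
    for (int v = 0; v < NW; v++) parent[v] = v;

    /* v R v' iff both may reach a common read (R := S(S') in Omega).
       Union-find yields the classes of the transitive closure R*; with a
       single memory location, kappa does not split them further. */
    for (int u = 0; u < NR; u++)
        for (int a = 0; a < NW; a++)
            for (int b = a + 1; b < NW; b++)
                if (sigma[u][a] && sigma[u][b]) unite(parent, a, b);

    int reps = 0;
    for (int v = 0; v < NW; v++) {
        int rho = find(parent, v);
        assert(rho <= v);             /* rho(v) never executes after v */
        nu[v] = rho / (M + 1) + 1;    /* outer iteration i of rho(v)   */
        if (rho == v) reps++;
    }
    return reps;
}
```

With these bounds, mse_first_example(nu) returns N, and every ⟨S,i,w⟩ receives the label i of ⟨T,i⟩, in agreement with (5.13).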
184
Second Example
We now consider the program in Figure 5.9. Instances ⟨S,i,j⟩ and ⟨T,i,N⟩ are denoted by [i,j,1] and [i,N,2], respectively.

From (5.3), the relation S of reaching definitions is defined as:

S := {[i,j,1]->[i',j',1] : 1<=i,i'<=2N && 1<=j'<j<=N && i'-j'=i-j}
     union {[i,j,1]->[i',j',1] : 1<=i,i'<=2N && N<j'<j<=2N && i'-j'=i-j}
     union {[i,j,1]->[i',N,2] : 1<=i,i'<=2N && N<j<=2N && i'=i-j+N};

It is easy to compute relation κ, since all array subscripts are affine: two instances of S or T, whose iteration vectors are (i,j) and (i',j'), write in the same memory location iff i − j = i' − j'. This relation is transitive, hence κ = κ*. We call it May in Omega's syntax:

May := {[i,j,s]->[i',j',s'] : 1<=i,j,i',j'<=2N && i-j=i'-j'
        && (s=1 || (s=2 && j=N)) && (s'=1 || (s'=2 && j'=N))};

As in the first example, we compute relation R using Omega:

S' := inverse S;
R := S(S');
R;

{[i,j,1]->[i',j-i+i',1] : 1<=i<=2N-1 && 1<=j<N && 1<=i'<=2N-1
    && i<j+i' && j+i'<N+i} union
{[i,j,1]->[i',j-i+i',1] : N<j<=2N-1 && 1<=i<=2N-1 && 1<=i'<=2N-1
    && N+i<j+i' && j+i'<2N+i} union
{[i,j,1]->[N+i-j,N,2] : N<j<=2N-1 && i<=2N-1 && j<N+i} union
{[i,N,2]->[i',N-i+i',1] : 1<=i<i'<=2N-1 && i'<N+i} union
{[i,N,2]->[i,N,2] : 1<=i<=2N-1}

That is:

$$\langle S,i,j\rangle \mathrel{R} \langle S,i',j'\rangle \iff (1\le i,i'\le 2N-1)\wedge(i-j=i'-j')\wedge(1\le j,j'<N \vee N<j,j'\le 2N-1)$$
$$\langle S,i,j\rangle \mathrel{R} \langle T,N+i-j,N\rangle \iff (1\le i\le 2N-1)\wedge(N<j\le 2N-1)\wedge(j<N+i)$$
$$\langle T,i,N\rangle \mathrel{R} \langle S,i',N-i+i'\rangle \iff 1\le i<i'\le 2N-1 \wedge i'<N+i$$
$$\langle T,i,N\rangle \mathrel{R} \langle T,i,N\rangle \iff 1\le i\le 2N-1$$

Relation R is already transitive: R* = R. Figure 5.10.a shows the equivalence classes of R.

Let C be an equivalence class for relation κ. There is an integer k such that C = {⟨S,i,j⟩ : i − j = k} ∪ {⟨T,k+N,N⟩}. Now, for u ∈ C, ρ(u) = min<seq({u' ∈ W : u' κ u ∧ u' R* u}).

Then, we compute ρ(u) using Omega:

$$1-N\le i-j\le N-1 \wedge j\ge N:\quad \rho(\langle S,i,j\rangle)=\langle T,N+i-j,N\rangle$$
$$1-N\le i-j\le N-1 \wedge j<N:\quad \rho(\langle S,i,j\rangle)=\langle S,i-j+1,1\rangle$$
$$1-2N\le i-j\le -N:\quad \rho(\langle S,i,j\rangle)=\langle S,1,1-i+j\rangle$$
$$N\le i-j\le 2N-1:\quad \rho(\langle S,i,j\rangle)=\langle S,i-j+1,1\rangle$$
$$1\le i\le 2N-1:\quad \rho(\langle T,i,N\rangle)=\langle T,i,N\rangle$$
185
The result shows three intervals of constant cardinality of C/R; they are described in Figure 5.10.b. A labeling can be found mechanically. If i − j ≤ −N or i − j ≥ N, there is only one representative, thus ν(⟨S,i,j⟩) = 1. If 1 − N ≤ i − j ≤ N − 1, there are two representatives; then we define ν(⟨S,i,j⟩) = 1 if j ≤ N, ν(⟨S,i,j⟩) = 2 if j > N, and ν(⟨T,i,N⟩) = 2.

The static expansion code appears in Figure 5.11. As hinted in Section 5.2.4, conditionals in ν have been taken out of array subscripts. Array A is allocated as A[4*N, 2]. Note that some memory could have been spared by defining two different arrays: A1 standing for A[·, 0] and holding 4N − 1 elements, and A2 standing for A[·, 1] and holding only 2N − 1 elements. This idea was pointed out in Section …

Third Example: Non-Affine Array Subscripts
We come back to the program in Figure 5.12.a. Instances ⟨T,i,j⟩, ⟨S,i⟩ and ⟨R,i⟩ are written [i,j,1], [i,0,2] and [i,0,3].

From (5.4), we build the relation of reaching definitions:

S := {[i,0,3]->[i,j,1] : 1<=i,j<=N}
     union {[i,0,3]->[i,0,2] : 1<=i<=N};

Since some subscripts are non-affine, we cannot compute at compile time the exact relation between instances writing in some location A[x]. We can only make the following pessimistic approximation of κ: all instances are related together (because they may assign the same memory location).

S' := inverse S;
R := S(S');
R;

{[i,j,1]->[i,j',1] : 1<=i<=N && 1<=j<=N && 1<=j'<=N} union
{[i,j,1]->[i,0,2] : 1<=i<=N && 1<=j<=N} union
{[i,0,2]->[i,j',1] : 1<=i<=N && 1<=j'<=N} union
{[i,0,2]->[i,0,2] : 1<=i<=N}

R is already transitive: R* = R. There is only one equivalence class for κ. We compute ρ(u) using Omega:

$$\forall i,j,\ 1\le i\le N,\ 1\le j\le N:\quad \rho(\langle T,i,j\rangle)=\langle T,i,1\rangle$$
$$\forall i,\ 1\le i\le N:\quad \rho(\langle S,i\rangle)=\langle T,i,1\rangle$$

Note that every ⟨T,i,j⟩ instance is in relation with ⟨T,i,1⟩. Computing ρ(W) yields N instances of the form ⟨T,i,1⟩. Maximal static expansion of accesses to A thus requires N copies of each memory location of A. We can use i to label these representatives; the resulting labeling function is thus:

$$\nu(\langle S,i\rangle)=\nu(\langle T,i,j\rangle)=i.$$
186
Using this labeling, all left-hand side references to A[...] become A[..., i] in the expanded code. Since the source of ⟨R,i⟩ is an instance of S or T at the same iteration i, the right-hand side of R is expanded the same way. Expanding the code thus leads to the intuitive result given in Figure 5.12.b. The size declaration of A is now A[N+1, N+1].

Experiments
We ran a few experiments on an SGI Origin 2000, using the mp library. Implementation issues are discussed in Section …

Performance Results for the First Example
For the first example, the parallel SA and MSE programs are given in Figure 5.14. Remember that w is an artificial counter of the while loop, and M is the maximum number of iterations of this loop. We have seen that a φ function is necessary for SA form, but it can be computed at low cost: it represents the last iteration of the inner loop.

double xT[N], xS[N, M];
parallel for (i=1; i<=N; i++) {
T   xT[i] = ...;
    w = 1;
    while (...) {
S     xS[i][w] = if (w==1) xT[i];
                 else xS[i, w-1];
      w++;
    }
R   ... = if (w==1) xT[i];
          else xS[i, w-1];
    // the last two lines implement
    // φ({⟨T,i⟩} ∪ {⟨S,i,w⟩ : 1≤w≤M})
}
Figure 5.14.a. Single-assignment

double x[N+1];
parallel for (i=1; i<=N; i++) {
T   x[i] = ...;
    while (...)
S     x[i] = x[i] ...;
R   ... = x[i] ...;
}
Figure 5.14.b. Maximal static expansion

Figure 5.14. Parallelization of the first example

The table in Figure 5.15 first describes speed-ups for the maximal static expansion relative to the original sequential program, then speed-ups for the MSE version relative to the single-assignment form. As expected, MSE shows better scaling, and the relative speed-up quickly goes over 2. Moreover, for larger memory sizes, the SA program may swap or fail for lack of memory.

Implementation
The maximal static expansion is implemented in C++ on top of the Omega library. Figure 5.16 summarizes the computation times for our examples (on a 32 MB Sun SPARCstation 5). These results do not include the computation times for reaching definition analysis and code generation.
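The cost argument around Figure 5.14 can be illustrated with a small sequential C sketch of both versions. It is not the code used in the experiments: the elided computations and the while-loop test are replaced by hypothetical stand-ins (t_value, s_step, keep_going), and the parallel loop simply runs sequentially. It shows that the φ function of the SA version only needs the last iteration number, and that both versions compute the same values while the MSE version uses N + 1 cells instead of (N + 1)(M + 2).

```c
#include <assert.h>

#define N 8
#define M 3 /* maximum number of while-loop iterations, as in Figure 5.14 */

/* Hypothetical stand-ins for the elided parts "..." of the kernel. */
static double t_value(int i) { return 10.0 * i; }   /* T computes x */
static double s_step(double x) { return x + 1.0; }  /* S updates x  */
/* Hypothetical while test: iteration i runs i % (M+1) times (at most M). */
static int keep_going(int i, int w) { return w <= i % (M + 1); }

/* Single-assignment version (Figure 5.14.a), run sequentially here:
   every instance of S writes its own cell xS[i][w]. */
void run_sa(double result[N + 1]) {
    double xT[N + 1], xS[N + 1][M + 1];
    for (int i = 1; i <= N; i++) {   /* "parallel for" in the figure */
        xT[i] = t_value(i);
        int w = 1;
        while (keep_going(i, w)) {
            assert(w <= M);
            xS[i][w] = s_step(w == 1 ? xT[i] : xS[i][w - 1]);
            w++;
        }
        /* phi: the reaching definition is <T,i> if the loop did not run,
           otherwise the last iteration <S,i,w-1>. */
        result[i] = (w == 1) ? xT[i] : xS[i][w - 1];
    }
}

/* Maximal static expansion (Figure 5.14.b): one cell x[i] per outer
   iteration, and no phi function is needed. The counter w only drives
   the hypothetical loop test. */
void run_mse(double result[N + 1]) {
    double x[N + 1];
    for (int i = 1; i <= N; i++) {
        x[i] = t_value(i);
        int w = 1;
        while (keep_going(i, w)) {
            x[i] = s_step(x[i]);
            w++;
        }
        result[i] = x[i];
    }
}
```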
187
Figure 5.15. Experimental results for the first example (columns: configuration M × N; speed-ups for MSE versus the original program on 16 and 32 processors; speed-ups for MSE versus SA).

Figure 5.16. Computation times, in milliseconds, for the 1st, 2nd and 3rd examples (rows: transitive closure (check), picking the representatives (function ρ), other, total).

Computing the class representatives is relatively fast; this validates our choice to compute function ρ (mapping instances to their representatives) using a lexicographical minimum. The intuition behind these results is that the computation time mainly depends on the number of affine constraints in the data-flow analysis relation. Our only concern, so far, would be to find a way to approximate the expressions of transitive closures when they become large.

5.3 Storage Mapping Optimization
Memory expansion techniques have two main drawbacks: high memory usage and run-time overhead. Parallelization via memory expansion thus requires both moderation in the expansion degree and efficiency in the run-time computation of data-flow restoration code.

Moderation in the expansion degree can be addressed in two ways: either with "hard constraints", such as the one presented in Section 5.2, or with optimization techniques that do not interfere with parallelism extraction. This section addresses such optimization
188 5.3. STORAGE MAPPING OPTIMIZATION
techniques, and presents the main results of a collaboration with Vincent Lefebvre. It can be seen as an extension of a work by Feautrier and Lefebvre [LF98], and also by Strout et al. [SCFS98].

Our contributions are the following: we formalize the correctness of a storage mapping, according to a given parallel execution order, for any nest of loops with unrestricted conditional expressions and array subscripts; we show that the schedule-independent storage mappings defined in [SCFS98] correspond to correct storage mappings according to the data-flow execution order; and we present an algorithm for storage mapping optimization, applicable to any nest of loops and to all parallelization techniques based on polyhedral dependence graphs (i.e., captured by Presburger arithmetic).

Motivation
First Example: Dynamic Control Flow
We first study the kernel in Figure 5.17.a, which was already the first motivating example in Section 5.2. Parts denoted by ... have no side effect. Each loop iteration spawns instances of the statements included in the loop body.

double x;
for (i=1; i<=N; i++) {
T   x = ...;
    while (...)
S     x = x ...;
R   ... = x ...;
}
Figure 5.17.a. Original program

double xT[N+1], xS[N+1, M+1];
parallel for (i=1; i<=N; i++) {
T   xT[i] = ...;
    w = 1;
    while (...) {
S     xS[i][w] = if (w==1) xT[i];
                 else xS[i, w-1];
      w++;
    }
R   ... = if (w==1) xT[i];
          else xS[i, w-1];
    // the last two lines implement
    // φ({⟨T,i⟩} ∪ {⟨S,i,w⟩ : 1≤w≤M})
}
Figure 5.17.b. Single-assignment

double xTS[N+1];
parallel for (i=1; i<=N; i++) {
T   xTS[i] = ...;
    while (...)
S     xTS[i] = xTS[i] ...;
R   ... = xTS[i] ...;
}
Figure 5.17.c. Partial expansion

Figure 5.17. Convolution example

Any instancewise reaching definition analysis is suitable for our purpose, but FADA
189
[BCF97] is preferred, since it handles any loop nest and achieves today's best precision. Value-based dependence analysis [Won95] is also a good choice. In the following, the results for references x in the right-hand sides of R and S are nested conditionals:

$$\sigma(\langle S,i,w,x\rangle) = \textbf{if } w=1 \textbf{ then } \{\langle T,i\rangle\} \textbf{ else } \{\langle S,i,w-1\rangle\}$$
$$\sigma(\langle R,i,x\rangle) = \{\langle S,i,w\rangle : 1\le w\}$$

Here, memory-based dependences hamper direct parallelization via scheduling or tiling. We need to expand scalar x and remove as many output, flow and anti-dependences as possible. Reaching definition analysis is at the core of single-assignment (SA) algorithms, since it records the location of values in expanded data structures. However, when the flow of data is unknown at compile time, φ functions are introduced for run-time restoration of values [CFR+91, Col98]. Figure 5.17.b shows our program converted to SA form, with the outer loop marked parallel (M is the maximum number of iterations of the inner loop). A φ function is necessary, but it can be computed at low cost since it represents the last iteration of the inner loop.

SA programs suffer from high memory requirements: S now assigns a huge N × M array. Optimizing memory usage is thus a critical point when applying memory expansion techniques to parallelization.

Figure 5.17.c shows the parallel program after partial expansion. Since T executes before the inner loop in the parallel version, S and T may assign the same array. Moreover, a one-dimensional array is sufficient, since the inner loop is not parallel. As a side effect, no φ function is needed anymore. The storage requirement is N, to be compared with NM + N in the SA version, and with 1 in the original program (which allows no legal parallel reordering).

This partial expansion has been designed for a particular parallel execution order. However, it is easy to show that it is also compatible with all other execution orders, since the inner loop cannot be parallelized. We have thus built a schedule-independent (a.k.a. universal) storage mapping, in the sense of [SCFS98]. On many programs, a more memory-economical technique consists in computing a legal storage mapping according to a given parallel execution order, instead of finding a schedule-independent storage mapping compatible with any legal execution order. This is done in [LF98] for affine loop nests only.

Second Example: a More Complex Parallelization
We now consider the program in Figure 5.18, which solves the well-known knapsack problem (KP). This kernel naturally models several optimization problems [MT90]. Intuitively, M is the number of objects, C is the "knapsack" capacity, and W[k] (resp. P[k]) is the weight (resp. profit) of object number k; the problem is to maximize the profit without exceeding the capacity. Instances of S are denoted by ⟨S,k,W[k]⟩, ..., ⟨S,k,C⟩, for 1 ≤ k ≤ M.

int A[C+1], W[M+1], P[M+1];
for (k=1; k<=M; k++)
  for (j=W[k]; j<=C; j++)
S   A[j] = max (A[j], P[k] + A[j-W[k]]);
Figure 5.18. Knapsack program
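A runnable C sketch of KP may help in following the rest of this example. It runs Figure 5.18 sequentially, then replays the same computation with the schedule "execute ⟨S,k,j⟩ at step k + j" and the cyclic folding AS[k%(K+1), j] derived in the following pages, so that the two results can be compared. The bounds MAXC and MAXK, the use of row 0 for the initial contents of A, and the copy AS[k][j] = AS[k−1][j] for j < W[k] (which keeps every row complete) are assumptions of this sketch, not part of the thesis figures.

```c
#include <assert.h>

#define MAXC 64 /* hypothetical bound on the capacity C      */
#define MAXK 8  /* hypothetical bound on the weight bound K  */

static int imax(int a, int b) { return a > b ? a : b; }

/* Figure 5.18 run sequentially; W and P are 1-based (index 0 unused),
   A0 holds the initial contents of A, and the final A is written to out. */
void kp_sequential(int M, int C, const int W[], const int P[],
                   const int A0[], int out[]) {
    for (int j = 0; j <= C; j++) out[j] = A0[j];
    for (int k = 1; k <= M; k++)
        for (int j = W[k]; j <= C; j++)
            out[j] = imax(out[j], P[k] + out[j - W[k]]);
}

/* Same computation with the schedule "instance <S,k,j> at step k+j" and the
   cyclically folded array AS[k % (K+1)][j]; row 0 holds A0. Instances of one
   step are independent and could run in parallel; here they run in any
   order. Assumes 1 <= W[k] <= K for every k. */
void kp_wavefront_folded(int M, int C, int K, const int W[], const int P[],
                         const int A0[], int out[]) {
    int AS[MAXK + 1][MAXC + 1];
    assert(K >= 1 && K <= MAXK && C <= MAXC);
    for (int j = 0; j <= C; j++) AS[0][j] = A0[j];
    for (int t = 1; t <= M + C; t++) {          /* sequential time steps */
        int kmin = t - C > 1 ? t - C : 1;
        int kmax = t < M ? t : M;
        for (int k = kmin; k <= kmax; k++) {    /* one wavefront */
            int j = t - k;
            assert(W[k] >= 1 && W[k] <= K);
            int value = AS[(k - 1) % (K + 1)][j];           /* <S,k-1,j>    */
            if (j >= W[k])                                  /* <S,k,j-W[k]> */
                value = imax(value, P[k] + AS[k % (K + 1)][j - W[k]]);
            AS[k % (K + 1)][j] = value;
        }
    }
    for (int j = 0; j <= C; j++) out[j] = AS[M % (K + 1)][j];
}
```

Folding is only safe because a value produced at step t is last read at step t + K at the latest, while its cell is reused at step t + K + 1; with M larger than K + 1, the second test below exercises the wrap-around.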
190
We suppose (from additional static analyses) that W[k] is always positive and less than or equal to an integer K. The results for references A[j] and A[j-W[k]] in the right-hand side of S are conditionals:

$$\sigma(\langle S,k,j,\texttt{A[j]}\rangle) = \textbf{if } k=1 \textbf{ then } \{\bot\} \textbf{ else } \{\langle S,k-1,j\rangle\}$$
$$\sigma(\langle S,k,j,\texttt{A[j-W[k]]}\rangle) = \{\langle S,k',j'\rangle : 1\le k'\le k \wedge \max(0,j-K)<j'<j-1\}$$

First notice that program KP does not have any parallel loops, and that memory-based dependences hamper direct parallelization. Therefore, parallelizing KP requires the application of preliminary program transformations.

Using the reaching definition information, Figure 5.19 shows program KP converted to SA form. The unique φ function implements a run-time choice between the values produced by {⟨S,k',j'⟩ : 1 ≤ k' ≤ k ∧ max(0, j−K) < j' < j−1}, for some read access ⟨S,k,j,A[j-W[k]]⟩.

int A[C+1], W[M+1], P[M+1]
int AS[M+1, C+1]
for (k=1; k<=M; k++)
  for (j=W[k]; j<=C; j++)
S   AS[k,j] = if (k==1)
                max (A[j], P[1] + A[j-W[1]]);
              else
                max (AS[k-1,j],
                     P[k] + φ({⟨S,k',j'⟩ : 1≤k'≤k ∧ max(0,j−K)<j'<j−1}));
Figure 5.19. KP in single-assignment form

Finally, in this particular case, the φ function is really easy to compute: the value of A[j-W[k]] has been "moved" by the SA-form transformation "to" AS[k, j-W[k]]. Then φ({⟨S,k',j'⟩ : 1 ≤ k' ≤ k ∧ max(0, j−K) < j' < j−1}) is equal to AS[k, j-W[k]]. This optimization avoids the use of temporary arrays. It can be performed automatically, along with other interesting optimizations; see Section …

The good thing with SA-transformed programs is that the only remaining dependences are true dependences between a reaching definition instance and its use instances. Thus a legal parallel schedule for program KP is: "execute instance ⟨S,k,j⟩ at step k + j", see Figure 5.20 (see Section 2.5.2 for schedule computation).

Since KP is a perfectly nested loop, it is also possible to apply tiling techniques to single-assignment KP, based on instancewise reaching definition information. Tiling techniques improve data locality and reduce communications by grouping together computations affecting the same part of a data structure (see Section 2.5.2). Rectangular m × c tiles seem appropriate in our case; the height m and width c can be tuned thanks to theoretical models [IT88, CFH95, BDRR94] or profiling techniques. The knapsack problem has been much studied, and very efficient parallelizations have been crafted by Andonov and Rajopadhye [AR94]; see also [BBA98] for additional information on tiling
191
the knapsack algorithm.

Figure 5.20. Instancewise reaching definitions, schedule, and tiling for KP (three panels over the (k, j) iteration space).

The third graph in Figure 5.20 represents 2 × 2 tiles, but larger sizes are used in practice; see Section …

Consider the dependences in Figure 5.20. The value produced by instance ⟨S,k,j⟩ may be used by ⟨S,k,j+1⟩, ..., ⟨S,k,min(C, j+K)⟩ or by ⟨S,k+1,j⟩. Using the schedule or the tiling proposed in Figure 5.20, we can prove that some value produced during the execution stops being useful after a given delay: if 1 ≤ k, k' ≤ M and 1 ≤ j, j' ≤ C are such that k + j + K < k' + j', the value produced by ⟨S,k,j⟩ is not used by ⟨S,k',j'⟩. This allows a cyclic folding of the storage mapping: every access of the form AS[k,j] can be safely replaced by AS[k%(K+1),j]. The result is shown in Figure 5.21.

int A[C+1], W[M+1], P[M+1]
int AS[K+2, C+1]
for (k=1; k<=M; k++)
  for (j=W[k]; j<=C; j++)
S   AS[k%(K+1),j] = if (k==1)
                      max (A[j], P[1] + A[j-W[1]]);
                    else
                      max (AS[(k-1)%(K+1),j],
                           P[k] + φ({⟨S,k',j'⟩ : 1≤k'≤k ∧ max(0,j−K)<j'<j−1}));
Figure 5.21. Partial expansion for KP

The storage requirement for array AS is (K+1)C, to be compared with MC in the SA version, and with C in the original program (where no legal parallel reordering was possible). This suggests two observations. First, the gain is only significant when K is much smaller than M, which may not be the case in practice. Second, the expanded subscript in the left-hand side is not affine anymore, since K is a symbolic constant.

In general, when the cyclic folding is based on a symbolic constant (like K), it becomes difficult both to measure the effectiveness of the optimization and to reuse the generated
192
code in subsequent analyses. In [Lef98], Lefebvre proposed to forbid such symbolic foldings, but we believe they can still be useful when some compile-time information on the symbolic bounds (like K) is available.

Finally, this partial expansion is not schedule-independent, because it highly depends on the "parallel front" direction associated with the proposed schedule and tiling.

Problem Statement and Formal Solution
Given an original program (<seq, f_e), we suppose that an instancewise reaching definition analysis has already been performed, yielding relation σ, and that a parallel execution order <par has been computed using some suitable technique (see Section 2.5.2). Our problem is to compute a new storage mapping f^exp_e such that (<par, f^exp_e) preserves the original semantics of (<seq, f_e).

Given a parallel execution order <par, we have to characterize correct expansions, which allow parallel execution to preserve the program semantics. In addition to the conflict relation κ_e, we use the no-conflict relation κ̸_e, which is the complement of κ_e. As in Section 2.4.1, we build a conservative approximation κ̸ of this relation:

$$\forall e\in E,\ \forall v,w\in A_e:\quad f_e(v)\neq f_e(w) \implies v \mathrel{\not\kappa} w.$$

Since both approximations κ and κ̸ are conservative, we must keep in mind that they are not complementary in general. Indeed, κ_e and κ̸_e are complementary for the same execution e ∈ E, but κ is defined as a "may conflict" approximation for all executions, whereas κ̸ is the negation of the "must conflict" approximation.
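Our first task is to formalize the memory reuse constraints enforced by the partial order <par. We introduce σ'_e: the exact reaching definition function for a given execution e of the parallelized program (<par, f^exp_e).12 The expansion is correct iff, for every program execution, the source of every access is the same in the sequential and in the parallel program:

$$\forall e\in E,\ \forall u\in R_e,\ \forall v\in W_e:\quad v=\sigma_e(u) \implies v=\sigma'_e(u). \tag{5.14}$$

We are looking for a correctness criterion telling whether two writes may use the same memory location or not. To do this, we return to the definition of σ'_e:

$$\forall e\in E:\quad v=\sigma'_e(u) \iff v<_{par}u \wedge f^{exp}_e(u)=f^{exp}_e(v) \wedge \big(\forall w\in W_e:\ u<_{par}w \vee w<_{par}v \vee f^{exp}_e(v)\neq f^{exp}_e(w)\big). \tag{5.15}$$

Plugging (5.15) into (5.14), we get:

$$\forall e\in E,\ \forall u\in R_e,\ \forall v,w\in W_e:\quad v=\sigma_e(u)\wedge u\not<_{par}w\wedge w\not<_{par}v \implies v<_{par}u\wedge f^{exp}_e(u)=f^{exp}_e(v)\wedge f^{exp}_e(v)\neq f^{exp}_e(w).$$

We may simplify this result, since the constraints v <par u and f^exp_e(u) = f^exp_e(v) are already implied by v = σ_e(u) (through (5.14)) and do not bring any information relating f^exp_e(v) and f^exp_e(w):

$$\forall e\in E,\ \forall u\in R_e,\ \forall v,w\in W_e:\quad v=\sigma_e(u)\wedge u\not<_{par}w\wedge w\not<_{par}v \implies f^{exp}_e(v)\neq f^{exp}_e(w). \tag{5.16}$$

12 The fact that <par is not a total order makes no difference for reaching definitions.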
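Condition (5.16) quantifies over a single execution at a time, so for one fully known execution it can be checked by brute force. The C sketch below does this; the struct, its fields and the function name are hypothetical. Writes w equal to v are skipped, since the condition is about another write that might overwrite the value produced by v. As a usage example, take the three statements T: x = ...; S: x = ...; R: ... = x; executed with S ∥ T, both before R, and σ_e(R) = S: mapping S and T to the same cell violates (5.16), while giving T its own cell satisfies it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXI 8 /* hypothetical bound on the number of instances */

/* One execution e, given explicitly (hypothetical representation). */
struct execution {
    int n;                       /* instances are numbered 0..n-1         */
    bool is_read[MAXI];          /* u in R_e                              */
    bool is_write[MAXI];         /* v in W_e                              */
    int sigma_e[MAXI];           /* exact reaching definition of a read,
                                    or -1 for bottom (value from outside) */
    bool before_par[MAXI][MAXI]; /* before_par[a][b] iff a <par b         */
    int f_exp[MAXI];             /* cell written by each write            */
};

/* Returns true iff the expanded storage mapping satisfies (5.16) for e. */
bool satisfies_5_16(const struct execution *e) {
    assert(e->n <= MAXI);
    for (int u = 0; u < e->n; u++) {
        if (!e->is_read[u] || e->sigma_e[u] < 0) continue;
        int v = e->sigma_e[u];
        for (int w = 0; w < e->n; w++) {
            if (!e->is_write[w] || w == v) continue;
            /* u not <par w and w not <par v: w may run between v and u. */
            bool may_intervene = !e->before_par[u][w] && !e->before_par[w][v];
            if (may_intervene && e->f_exp[v] == e->f_exp[w]) return false;
        }
    }
    return true;
}
```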
193
It means that we cannot reuse memory (i.e., we must expand) when both v = σ_e(u) and w ⊀par v ∧ u ⊀par w are true. Starting from this dynamic correctness condition, we would like to deduce a correctness criterion based on static knowledge only. This criterion must be valid for all executions; in other terms, it should be stronger than condition (5.16).

We can now expose the expansion correctness criterion. It requires the reaching definition v of a read u and another write w to assign different memory locations when w may execute between v and u in the parallel program, and either w does not execute between v and u in the original program, or w may assign a different memory location from v in it (v κ̸ w); see Figure 5.22. Here is the precise formulation of the correctness criterion:

Theorem 5.2 (correctness of storage mappings) If the following condition holds, then the expansion is correct, i.e., it allows parallel execution to preserve the program semantics.

$$\forall e\in E,\ \forall v,w\in W:\quad \big(\exists u\in R:\ v\mathrel{\sigma}u\wedge w\not<_{par}v\wedge u\not<_{par}w\wedge(u<_{seq}w\vee w<_{seq}v\vee v\mathrel{\not\kappa}w)\big) \implies f^{exp}_e(v)\neq f^{exp}_e(w). \tag{5.17}$$

Proof: We first rewrite the definition of v being the reaching definition of u:

$$\forall e\in E,\ \forall u\in R_e,\ \forall v\in W_e:\quad v=\sigma_e(u) \implies v<_{seq}u\wedge f_e(u)=f_e(v)\wedge\big(\forall w\in W_e:\ u<_{seq}w\vee w<_{seq}v\vee f_e(v)\neq f_e(w)\big).$$

As a consequence,

$$\forall e\in E,\ \forall u\in R_e,\ \forall v\in W_e:\quad v=\sigma_e(u) \implies \forall w\in W_e:\ u<_{seq}w\vee w<_{seq}v\vee f_e(v)\neq f_e(w). \tag{5.18}$$

The right-hand side of (5.18) can be inserted into (5.16) as an additional constraint: (5.16) is equivalent to

$$\forall e\in E,\ \forall u\in R_e,\ \forall v,w\in W_e:\quad v=\sigma_e(u)\wedge w\not<_{par}v\wedge u\not<_{par}w\wedge(u<_{seq}w\vee w<_{seq}v\vee f_e(v)\neq f_e(w)) \implies f^{exp}_e(v)\neq f^{exp}_e(w). \tag{5.19}$$

Let us now replace σ_e with its approximation σ in (5.19), using v = σ_e(u) ⟹ v σ u; the following condition implies (5.19):

$$\forall e\in E,\ \forall u\in R_e,\ \forall v,w\in W_e:\quad v\mathrel{\sigma}u\wedge(u<_{seq}w\vee w<_{seq}v\vee f_e(v)\neq f_e(w))\wedge w\not<_{par}v\wedge u\not<_{par}w \implies f^{exp}_e(v)\neq f^{exp}_e(w).$$
194
Finally, we approximate f_e over all executions thanks to relation κ̸, using f_e(v) ≠ f_e(w) ⟹ v κ̸ w:

$$\forall e\in E,\ \forall v,w\in W:\quad \big(\exists u\in R:\ v\mathrel{\sigma}u\wedge w\not<_{par}v\wedge u\not<_{par}w\wedge(u<_{seq}w\vee w<_{seq}v\vee v\mathrel{\not\kappa}w)\big) \implies f^{exp}_e(v)\neq f^{exp}_e(w).$$

This proves that (5.17) is stronger than (5.19), itself equivalent to (5.16).

Notice that we returned to the definition of σ_e at the beginning of the proof. Indeed, some information on the storage mapping may be available, and we do not want to lose it:13 the right-hand side of (5.18) gathers information on w which would have been lost by approximating σ_e with σ in (5.16). Without this information on w, we would have computed the following correctness criterion:

$$\forall e\in E,\ \forall v,w\in W:\quad \big(\exists u\in R:\ v\mathrel{\sigma}u\wedge u\not<_{par}w\wedge w\not<_{par}v\big) \implies f^{exp}_e(v)\neq f^{exp}_e(w). \tag{5.20}$$

Sadly, this choice is not satisfying here.14 Indeed, consider the motivating example: two instances ⟨S,i,w⟩ and ⟨S,i,w'⟩ would satisfy the left-hand side of (5.20) as long as w ≠ w'. Therefore, they should assign different memory locations in any correct expanded program. This leads to the single-assignment version of the program... but we showed in Section 5.3.1 that a more memory-economical solution was available: see Figure 5.17.c.

A precise look at (5.16) explains why replacing σ_e with σ in (5.16) is too conservative: it "forgets" that w is not executed after the reaching definition σ_e(u). Indeed, w ⊀par v in the left-hand side of (5.20) is much stronger: it states that w is not executed after any possible reaching definition of u, which includes many instances executing before the reaching definition σ_e(u).

In the following, we introduce a new notation for the expansion correctness criterion: the interference relation ⋈ is defined as the symmetric closure of the left-hand side of (5.17):

$$\forall v,w\in W:\quad v\bowtie w \overset{\text{def}}{\iff} \big(\exists u\in R:\ v\mathrel{\sigma}u\wedge w\not<_{par}v\wedge u\not<_{par}w\wedge(u<_{seq}w\vee w<_{seq}v\vee v\mathrel{\not\kappa}w)\big) \vee \big(\exists u\in R:\ w\mathrel{\sigma}u\wedge v\not<_{par}w\wedge u\not<_{par}v\wedge(u<_{seq}v\vee v<_{seq}w\vee w\mathrel{\not\kappa}v)\big). \tag{5.21}$$

We take the symmetric closure because v and w play symmetric roles in (5.17). Using a tool like Omega [Pug92], it is much easier to handle set and relation operations than

13 Such information may be more precise than that derived from the approximate reaching definition relation.
14 This criterion was enough for Lefebvre and Feautrier in [LF98], since they only considered affine loop nests and exact reaching definition relations.
195
logic formulas with quantifiers. We thus rewrite the previous definition using algebraic operations:15

$$\bowtie \;=\; \Big(\big((\sigma(R)\times W)\cap\not>_{par}\cap(>_{seq}\cup\not\kappa)\big)\cup\big(\not>_{par}\cap(\sigma\circ(\not<_{par}\cap<_{seq}))\big)\Big)\;\cup\;\Big(\big((W\times\sigma(R))\cap\not<_{par}\cap(<_{seq}\cup\not\kappa)\big)\cup\big(\not<_{par}\cap(\sigma\circ(\not<_{par}\cap<_{seq}))^{-1}\big)\Big) \tag{5.22}$$

where σ ∘ X denotes the relation {(v, w) : ∃u, v σ u ∧ u X w}.

Rewriting (5.17) with this new syntax, v and w must assign distinct memory locations when v ⋈ w; one may say that "v interferes with w":

$$\forall e\in E,\ \forall v,w\in W:\quad v\bowtie w\implies f^{exp}_e(v)\neq f^{exp}_e(w). \tag{5.23}$$

An algorithm to compute f^exp_e from Theorem 5.2 is presented in Section 5.3.4. Notice that we compute an exact storage mapping f^exp_e, which depends on the execution.

Figure 5.22. Cases of f^exp_e(v) ≠ f^exp_e(w) in (5.17): sequential and parallel executions of v ∈ σ(u), u and w, with w <seq v, u <seq w or v κ̸ w on the sequential side, and w ⊀par v, u ⊀par w on the parallel side.

Optimality of the Expansion Correctness Criterion
We start with three examples showing the usefulness of each constraint in the definition of ⋈; see Figure 5.23.

We now present the following optimality result:16

Proposition 5.2 Let <par be a parallel execution order. Consider two writes v and w such that v ⋈ w (defined in (5.22) page 194), and a storage mapping f^exp_e such that f^exp_e(v) = f^exp_e(w), that is, f^exp_e does not satisfy the expansion correctness criterion defined by Theorem 5.2. Then, executing program (<par, f^exp_e) violates the original program semantics, according to approximations σ and κ̸.

Proof: Suppose v σ u ∧ w ⊀par v ∧ u ⊀par w ∧ (u <seq w ∨ w <seq v ∨ v κ̸ w) is satisfied for a read u and two writes v and w. One may distinguish three cases regarding the execution of w relative to u and v; see Figure 5.22.

15 Each line of (5.21) is rewritten independently, then predicates depending on u are separated from the others. The existential quantification on u is captured by composition with σ. Because the write playing the role of v in (5.17) must be a possible reaching definition of some read access, intersection with σ(R) × W (resp. W × σ(R) for the symmetric line) is necessary in the first disjunct of each line.
16 See Section 2.4.4 for a general remark about optimality.
196
T   x = ...;
S   x = ...;
R   ... = x;
Executing S ∥ T before R is legal but requires renaming: this is enforced by T <seq S, i.e., w <seq v (and T ⊀par S, i.e., w ⊀par v, and R ⊀par T, i.e., u ⊀par w).
Figure 5.23.a. Constraints w <seq v and w ⊀par v, u ⊀par w

S   x = ...;
R   ... = x;
T   x = ...;
Executing S, then T, then R is legal but requires renaming: this is enforced by R <seq T, i.e., u <seq w.
Figure 5.23.b. Constraints w ⊀par v, u ⊀par w and u <seq w

S   A[1] = ...;
T   A[foo] = ...;
R   ... = A[1];
Executing S ∥ T before R is legal but requires renaming: this is enforced by S κ̸ T, i.e., v κ̸ w, since S may assign a different memory location than T.
Figure 5.23.c. Constraints w ⊀par v, u ⊀par w and v κ̸ w

Figure 5.23. Motivating examples for each constraint in the definition of the interference relation

The first two cases are (1) u executes before w in the sequential program, i.e., u <seq w, or (2) w executes before v in the sequential program, i.e., w <seq v: then w must assign a different memory location than v, otherwise the value produced by v would never reach u as in the sequential program.

When w executes neither before v nor after u in the sequential program, one may keep v and w assigning the same memory location if it was the case in the sequential program. However, if it might not be the case, i.e., if v κ̸ w, then w must assign a different memory location than v, otherwise the value produced by v would never reach u as in the sequential program.

Algorithm
The formalism presented in the previous section is general enough to handle any imperative program. However, as a compromise between expressivity and computability, and because our preferred reaching definition analysis is FADA [BCF97], we choose affine relations as an abstraction. Tools like Omega [Pug92] and PIP [Fea91] can thus be used for symbolic computations, but our program model is now restricted to loop nests operating on arrays, with unrestricted conditionals, loop bounds and array subscripts.

Finding the minimal amount of memory to store the values produced by the program is a graph coloring problem where vertices are instances of writes and edges represent interferences between instances: there is an edge between v and w iff they cannot share the same memory location, i.e., when v ⋈ w. Since classic coloring algorithms only apply to finite graphs, Feautrier and Lefebvre designed a new algorithm [LF98], which we extend to general loop nests.

The more general application of our technique starts with instancewise reaching definition analysis, then applies a parallelization algorithm using σ as dependence graph, thus
197
avoiding constraints due to spurious memory-based dependences, describes the result as a partial order <par, and finally applies the following partial expansion algorithm.

Partial Expansion Algorithm
Storage-Mapping-Optimization and SMO-Convert-Quast are simple extensions of the classical single-assignment algorithms for loop nests, see Section 5.1. The input is the sequential program, the results σ and κ̸ of an instancewise analysis, and the parallel execution order <par (not used for simple SA-form conversion). The big difference with SA form is the computation of an expansion vector E_S of integers or symbolic constants: its purpose is to reduce the memory usage of each expanded array, as with a "cyclic folding" of memory references; see Build-Expansion-Vector in Section 5.3.5. To reduce the number of expanded arrays, partial renaming is called at the end of the process to coalesce data structures using a classical graph coloring heuristic; see Partial-Renaming in Section …

Storage-Mapping-Optimization (program, σ, κ̸, <par)
program: an intermediate representation of the program
σ: the reaching definition relation, seen as a function
κ̸: the no-conflict relation
<par: the parallel execution order
returns an intermediate representation of the expanded program
⋈ ← ((σ(R)×W) ∩ ⊁par ∩ (>seq ∪ κ̸)) ∪ (⊁par ∩ (σ ∘ (⊀par ∩ <seq)))
      ∪ ((W×σ(R)) ∩ ⊀par ∩ (<seq ∪ κ̸)) ∪ (⊀par ∩ (σ ∘ (⊀par ∩ <seq))⁻¹)
for each array A in program
  do for each statement S assigning A in program
       do E_S ← Build-Expansion-Vector(S, ⋈)
          declare an array A_S
          left-hand side of S ← A_S[Iter(CurIns) % E_S]
     for each reference ref to A in program
       do σ/ref ← σ ∩ (I × ref)
          quast ← Make-Quast(σ/ref)
          map ← SMO-Convert-Quast(quast, ref)
          ref ← map(CurIns)
program ← Partial-Renaming(program, ⋈)
return program

This algorithm outputs an expanded program whose data layout is well suited for the parallel execution order <par: we are assured that the original program semantics will be preserved in the parallel version.

Two technical issues have been pointed out. How is the expansion vector E_S built for each statement S? How is partial renaming performed? This is the purpose of Section 5.3.5.

Array Reshaping and Renaming
Building an Expansion Vector
For each statement S, the expansion vector must ensure that two instances v and w assign different memory locations when v ⋈ w. Moreover, it should introduce memory
198
reuse between instances of S as often as possible.

SMO-Convert-Quast (quast, ref)
quast: the quast representation of the reaching definition function
ref: the original reference, used when ⊥ is encountered
returns the implementation of quast as a value retrieval code for reference ref
switch
  case quast = {⊥}:
    return ref
  case quast = {ι}:
    A ← Array(ι)
    S ← Stmt(ι)
    x ← Iter(ι)
    return A_S[x % E_S]
  case quast = {ι1, ι2, ...}:
    return φ({ι1, ι2, ...})
  case quast = if predicate then quast1 else quast2:
    return if predicate SMO-Convert-Quast(quast1, ref)
           else SMO-Convert-Quast(quast2, ref)

Building an expanded program with memory reuse on S introduces output dependences between some instances of this statement (there is an output dependence between two instances v and w in the expanded code if v ∈ W, w ∈ W and f^exp_e(v) = f^exp_e(w)). An output dependence between v and w is valid in the expanded program iff the left-hand side of the expansion correctness criterion is false for v and w, i.e., iff v and w are not related by ⋈. Such an output dependence is called a neutral output dependence [LF98]. The aim is to elaborate an expansion vector which gives A_S an optimized but sufficient shape, only authorizing neutral output dependences on S.

The dimension of E_S is equal to the number of loops surrounding S, written N_S. Each element E_S[p+1] is the expansion degree of S at depth p (the depth of the loop considered), with p ∈ {0, ..., N_S − 1}, and gives the size of dimension (p+1) of A_S. Each dimension of A_S must have a sufficient size to forbid any non-neutral output dependence.

For a given access v, the set of instances which may not write in the same location as v can be deduced from the expansion correctness criterion (5.17); call it W_S^p(v). It holds all instances w such that w is an instance of S (Stmt(w) = S), Iter(v)[1..p] = Iter(w)[1..p] and Iter(v)[p+1] < Iter(w)[p+1], and v ⋈ w.

Let w_S^p(v) be the lexicographic maximum of W_S^p(v). For all w in W_S^p(v), we have the following relations:

Iter(v)[1..p] = Iter(w)[1..p] = Iter(w_S^p(v))[1..p]
Iter(v)[p+1] < Iter(w)[p+1] ≤ Iter(w_S^p(v))[p+1]

If E_S[p+1] is equal to (Iter(w_S^p(v))[p+1] − Iter(v)[p+1] + 1), and knowing that the index function will be A_S[Iter(v) % E_S], we ensure that no non-neutral output dependence appears between v and any instance of W_S^p(v). But this property must be
199
verified for each instance of S, and E_S[p+1] should be set to the maximum of (Iter(w_S^p(v))[p+1] − Iter(v)[p+1] + 1) over all instances v of S. This proves that the following definition of E_S forbids any output dependence between instances of S in relation with ⋈:

$$E_S[p+1] = \max\big\{\mathrm{Iter}(w_S^p(v))[p+1]-\mathrm{Iter}(v)[p+1]+1 \;:\; v\in W\wedge \mathrm{Stmt}(v)=S\big\} \tag{5.24}$$

Computing this for each dimension of E_S ensures that A_S has a sufficient size for the expansion to preserve the sequential program semantics. This is the purpose of Build-Expansion-Vector: working is relation (v, W_S^p(v)) and maxv is relation (v, w_S^p(v)). For a detailed proof, an intuitive introduction and related works, see [LF98, Lef98].

For the Build-Expansion-Vector algorithm, the simplest optimality concept is defined by the number of integer-valued components in E_S, i.e., the number of "projected" dimensions, as proposed by Quilleré and Rajopadhye in [QR99]. But even with this simple definition, optimality is still an open problem. Since the algorithm proposed in [QR99] has been proven optimal, we should try to combine both techniques to yield better results, but this is left for future work.

Build-Expansion-Vector (S, ⋈)
S: the current statement
⋈: the interference relation
returns expansion vector E_S (a vector of integers or symbolic constants)
N_S ← number of loops surrounding S
for p = 0 to N_S − 1
  do working ← {(v, w) : ⟨S, v⟩ ∈ W ∧ ⟨S, w⟩ ∈ W
                  ∧ v[1..p] = w[1..p] ∧ v[1..p+1] <lex w[1..p+1]
                  ∧ ⟨S, v⟩ ⋈ ⟨S, w⟩}
     maxv ← {(v, max<lex {w : (v, w) ∈ working})}
     vector[p+1] ← max {w[p+1] − v[p+1] + 1 : (v, w) ∈ maxv}
return vector

Now, a component of E_S computed by Build-Expansion-Vector can be a symbolic constant. When this constant can be proven "much smaller" than the associated dimension of the iteration space of S, it is useful for reducing memory usage; but if such a result cannot be shown with the available compile-time information, the component is set to +∞, meaning that no modulo computation should appear in the generated code (for this particular dimension). The interpretation of "much smaller" depends on the application: Lefebvre considered in [Lef98] that only integer constants were allowed in E_S, but we believe that this requirement is too strong, as shown in the knapsack example (a modulo K+1 is needed).
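Formula (5.24) can be evaluated by brute force when the instances of S and the interference relation are given explicitly, which may help in checking an expansion vector by hand. The sketch below assumes a statement surrounded by two loops whose counters range over 0..5 (DEPTH and EXT are hypothetical), uses 0-based depths, takes ⋈ as a C predicate, and checks the resulting modulo mapping A_S[Iter(v) % E_S] against (5.23). It is not the symbolic, Omega-based Build-Expansion-Vector, which handles parametric iteration domains and symbolic components such as K + 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical setting: S is surrounded by DEPTH loops, and each loop
   counter ranges over 0..EXT-1. */
#define DEPTH 2
#define EXT 6

typedef bool (*interfere_fn)(const int v[DEPTH], const int w[DEPTH]);

static int box_size(void) {
    int total = 1;
    for (int d = 0; d < DEPTH; d++) total *= EXT;
    return total;
}

/* Turns an integer code into an iteration vector (row-major order). */
static void decode(int code, int it[DEPTH]) {
    for (int d = DEPTH - 1; d >= 0; d--) {
        it[d] = code % EXT;
        code /= EXT;
    }
}

/* Brute-force reading of (5.24): ES[p] (E_S[p+1] in the text) is the largest
   Iter(w)[p] - Iter(v)[p] + 1 over pairs such that w agrees with v on the
   first p counters, has a larger p-th counter, and interferes with v.
   Taking the largest w[p] amounts to taking the lexicographic maximum
   w_S^p(v), since the prefix is fixed. With no such pair, ES[p] = 1 and the
   dimension is folded completely. */
void build_expansion_vector(interfere_fn interferes, int ES[DEPTH]) {
    int total = box_size();
    for (int p = 0; p < DEPTH; p++) {
        ES[p] = 1;
        for (int a = 0; a < total; a++)
            for (int b = 0; b < total; b++) {
                int v[DEPTH], w[DEPTH];
                decode(a, v);
                decode(b, w);
                bool same_prefix = true;
                for (int d = 0; d < p; d++)
                    if (v[d] != w[d]) same_prefix = false;
                if (same_prefix && v[p] < w[p] && interferes(v, w)
                    && w[p] - v[p] + 1 > ES[p])
                    ES[p] = w[p] - v[p] + 1;
            }
        assert(ES[p] >= 1 && ES[p] <= EXT);
    }
}

/* Checks (5.23) for the modulo mapping A_S[Iter(v) % ES]: two distinct
   interfering instances must never share a cell. */
bool modulo_mapping_is_correct(interfere_fn interferes, const int ES[DEPTH]) {
    int total = box_size();
    for (int a = 0; a < total; a++)
        for (int b = 0; b < total; b++) {
            int v[DEPTH], w[DEPTH];
            if (a == b) continue;
            decode(a, v);
            decode(b, w);
            if (!interferes(v, w)) continue;
            bool same_cell = true;
            for (int d = 0; d < DEPTH; d++)
                if (v[d] % ES[d] != w[d] % ES[d]) same_cell = false;
            if (same_cell) return false;
        }
    return true;
}

/* Two hypothetical, symmetric interference relations used as inputs. */
bool near_in_same_row(const int v[DEPTH], const int w[DEPTH]) {
    int dj = v[1] - w[1];
    return v[0] == w[0] && dj >= -2 && dj <= 2;
}
bool same_column_adjacent_rows(const int v[DEPTH], const int w[DEPTH]) {
    int di = v[0] - w[0];
    return v[1] == w[1] && (di == 1 || di == -1);
}
```

With near_in_same_row (same outer iteration, inner counters at most 2 apart), the sketch yields E_S = (1, 3): the outer dimension is folded entirely and the inner one modulo 3, and a smaller modulus such as 2 would violate (5.23).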
Partial Renaming
Now that every array A_S has been built, one can perform an additional storage reduction on the generated code. Indeed, for two statements S and T, partial expansion builds two structures A_S and A_T which can have different shapes. If, at the end of the renaming process, S and T are authorized to share the same array, this array has to be the rectangular hull of A_S and A_T: A_ST. It is clear that these two statements can share the same data iff this sharing does not contradict the expansion correctness criterion
for instances of $S$ and $T$. One must verify, for every instance $u$ of $S$ and $v$ of $T$, that the value produced by $u$ (resp. $v$) cannot be killed by $v$ (resp. $u$) before it stops being useful.

Finding the minimal renaming is NP-complete. Our method consists in building a graph similar to an interference graph as used in the classic register allocation process. In this graph, each vertex represents a statement of the program. There is an edge between two vertices $S$ and $T$ iff it has been shown that they cannot share the same data structure in their left-hand sides. Then one applies a greedy coloring algorithm on this graph. Finally, it is clear that vertices that have the same color can have the same data structure. This partial renaming algorithm is sketched in Partial-Renaming (the Greedy-Coloring algorithm returns a function mapping each statement to a color).

Partial-Renaming(program, ⋈)
  program: the program where partial renaming is required
  ⋈: the interference relation
  returns the program with coalesced data structures
  1  for each array A in program
  2    do interfere ← ∅
  3       for each pair of statements S and T assigning A in program
  4         do if ∃⟨S, v⟩, ⟨T, w⟩ ∈ W : ⟨S, v⟩ ⋈ ⟨T, w⟩
  5              then interfere ← interfere ∪ {(S, T)}
  6       coloring ← Greedy-Coloring(interfere)
  7       for each statement S assigning A in program
  8         do left-hand side A[subscript] of S ← A_coloring(S)[subscript]
  9  return program

Dealing with Tiled Parallel Programs

The partial expansion algorithm often yields poor results, especially on tiled programs. The reason is that subscripts of expanded arrays are of the form A_S[subscript % E_S], and the block regularity of tiled programs does not really fit this cyclic pattern. Figure 5.24 shows an example of what we would like to achieve on some block-regular expansions. No cyclic folding would be possible on such an example, since the two outer loops are parallel.

The design of an improved graph coloring algorithm able to consider both block and cyclic patterns is still an open problem, because it requires non-affine constraints to be optimized. We only propose a work-around, which works when some a priori knowledge of the tile shape is available. The technique consists in dividing each dimension by the associated tile size. Expanded array subscripts are thus of the form A_S[i_1/shape_1, …, i_N/shape_N], where $(i_1, \ldots, i_N)$ is the iteration vector associated with CurIns (defined in Section 5.1), and where shape_i is either 1 or the size of the i-th dimension of the tile. Sometimes the resulting storage mapping will be compatible with the required parallel execution, and sometimes not: the decision is made with Theorem 5.2.

It is possible to improve this technique by combining divisions and modulo operations, but the expansion scheme is somewhat different: see Section …
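The division-based work-around fits in a few lines of Python. This sketch assumes iteration vectors are tuples of non-negative integers and that `shape` holds, per dimension, either 1 or the tile size; the function names are hypothetical. With 16×16 tiles it reproduces the mapping of Figure 5.24.c, where each tile writes a single element.

```python
def block_cell(iteration, shape):
    """Subscript A_S[i_1/shape_1, ..., i_N/shape_N] of one instance.

    shape[k] is 1 (no folding along dimension k) or the tile size along k.
    Integer division plays the role of '/' in the C-like subscripts.
    """
    return tuple(i // s for i, s in zip(iteration, shape))


def same_tile(u, v, tile):
    """True when iteration vectors u and v fall into the same tile."""
    return all(a // t == b // t for a, b, t in zip(u, v, tile))


if __name__ == "__main__":
    n, tile = 64, (16, 16)
    its = [(i, j) for i in range(n) for j in range(n)]
    cells = {block_cell(u, tile) for u in its}
    print(len(cells))  # 16 cells, one per tile, as in xS[N/16, N/16]
```

Whether this mapping is acceptable still has to be checked against the parallel order (Theorem 5.2): here it is, because all iterations sharing a cell belong to one sequentially executed tile.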
Figure 5.24.a. Original program

    int x;
    for (i=0; i<N; i++)
      for (j=0; j<N; j++) {
    S   x = …;
    R   … = x;
      }

Figure 5.24.b. Single-assignment program

    int xS[N,N];
    for (i=0; i<N; i++)
      for (j=0; j<N; j++) {
    S   xS[i,j] = …;
    R   … = xS[i,j];
      }

Figure 5.24.c. Partially expanded tiled program

    int xS[N/16,N/16];
    parallel for (i=0; i<N; i+=16)
      parallel for (j=0; j<N; j+=16)
        for (ii=0; ii<16; ii++)
          for (jj=0; jj<16; jj++) {
    S       xS[i/16,j/16] = …;
    R       … = xS[i/16,j/16];
          }

Figure 5.24. An example of block-regular storage mapping

Schedule-Independent Storage Mappings

The technique presented in Section 5.3.4 yields the best results, but involves an external parallelization technique, such as scheduling or tiling. It is well suited to parallelizing compilers.

A schedule-independent (a.k.a. universal) storage mapping [SCFS98] is useful whenever no parallel execution scheme is enforced. The aim is to preserve the "portability" of SA form, at a much lower cost in memory usage.

From the definition of $\bowtie$, the interference relation in (5.21), and considering two parallel execution orders $<^1_{par}$ and $<^2_{par}$ whose associated interference relations are $\bowtie_1$ and $\bowtie_2$, we have:

$$<^1_{par} \subseteq\, <^2_{par} \;\Longrightarrow\; \bowtie_2 \subseteq \bowtie_1.$$

Now, a schedule-independent storage mapping $f^{exp}_e$ must be compatible with any possible parallel execution $<_{par}$ of the program. The partial order $<_{par}$ used in the Storage-Mapping-Optimization algorithm should thus be included in any correct execution order. By definition of correct execution orders (Theorem 2.2 page 81), this condition is satisfied by the data-flow execution order, which is the transitive closure of the reaching definition relation: $\sigma^+$.

Section 3.1.2 describes a way to compute the transitive closure of $\sigma$ (useful remarks and an experimental study are also presented in Section 5.2.5). In general, no exact result can be hoped for the data-flow execution order $\sigma^+$, because Presburger arithmetic is not closed under transitive closure. Hence, we need to compute an approximate relation. Because the approximation must be included in all possible correct execution orders, we want it to be a sub-order of the exact data-flow order (i.e. the opposite of a conservative approximation). Such an approximation can be computed with Omega [Pug92].
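On an explicit finite relation, the data-flow order $\sigma^+$ is obtained by composing the relation with itself until a fixpoint is reached. The Python sketch below only illustrates the definition: it enumerates a bounded instance of the first example's reaching-definition relation S (the bounds N = 1 and at most three iterations of the inner loop are assumptions), whereas the symbolic case needs the under-approximation discussed above.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the finite set `pairs`."""
    closure = set(pairs)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


def sigma_first_example(n, max_w):
    """Bounded enumeration of relation S of the first example (Omega syntax):
    each instance [i,w,s] is mapped to its possible reaching definitions."""
    rel = set()
    for i in range(1, n + 1):
        rel.add(((i, 0, 2), (i, 0, 1)))
        rel.add(((i, 0, 3), (i, 0, 1)))
        for w in range(0, max_w + 1):
            rel.add(((i, 0, 3), (i, w, 2)))
            if w >= 1:
                rel.add(((i, w, 2), (i, w - 1, 2)))
    return rel


if __name__ == "__main__":
    closure = transitive_closure(sigma_first_example(1, 2))
    print(((1, 2, 2), (1, 0, 1)) in closure)  # True: [1,2,2] depends on T
```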
Dynamic Restoration of the Data-Flow

Implementing φ functions for a partially expanded program is not very different from what we have seen in Section 5.1.3. Indeed, algorithm Loop-Nests-Implement-Phi applies without modification. But doing this, no storage mapping optimization is performed on @-arrays. Now, remember that @-arrays are supposed to be in one-to-one mapping with expanded data structures. Single-assignment @-arrays are not necessary to preserve the semantics of the original program, since the same dependences will be shared by expanded arrays and @-arrays.

The resulting code generation algorithm is very similar to Loop-Nests-Implement-Phi. The first step consists in replacing every reference to @A_S[x] with its "folded" counterpart @A_S[x % E_S]. In a second step, one merges @-arrays together using the result of algorithm Partial-Renaming.

Eventually, for a given φ function, the set of possible reaching definitions should be reconsidered: values produced by a few instances may now be overwritten, according to the new storage mapping. As in the motivating example, the φ function can even disappear, see Figure 5.17. A good technique to achieve this automatically is not to perform a new reaching definition analysis. One should rather update the available sets of reaching definitions: a φ(set) reference should be replaced by

$$\phi\left(\left\{ v \in set \;:\; \nexists w \in set,\; v <_{seq} w \wedge f^{exp}_e(v) = f^{exp}_e(w) \right\}\right).$$

Moreover, if coloring is the result of the greedy graph coloring algorithm in Partial-Renaming, $f^{exp}_e(\langle S, x\rangle) = f^{exp}_e(\langle S', x'\rangle)$ is equivalent to

$$\mathrm{coloring}(S) = \mathrm{coloring}(S') \wedge (x \bmod E_S = x' \bmod E_{S'}).$$

Back to the Examples

First Example

Using the Omega Calculator text-based interface, we describe a step-by-step execution of the expansion algorithm. We have to code instances as integer-valued vectors. An instance ⟨s, i⟩ is denoted by vector [i, …, s], where […] possibly pads the vector with zeroes. We number T, S, R with 1, 2, 3 in this order, so ⟨T, i⟩, ⟨S, i, j⟩ and ⟨R, i⟩ are written [i,0,1], [i,j,2] and [i,0,3], respectively.

Schedule-dependent storage mapping. We first apply the partial expansion algorithm according to the parallel execution order proposed in Figure 5.17. The result of instancewise reaching definition analysis is written in Omega's syntax:

    S := {[i,0,2]->[i,0,1] : 1<=i<=N}
         union {[i,w,2]->[i,w-1,2] : 1<=i<=N && 1<=w}
         union {[i,0,3]->[i,0,1] : 1<=i<=N}
         union {[i,0,3]->[i,w,2] : 1<=i<=N && 0<=w};

The no-conflict relation is trivial here, since the only data structure is a scalar variable:

    NCon := {[i,w,s]->[i',w',s'] : 1=2};  # 1=2 means FALSE!
We consider that the outer loop is parallel. It gives the following execution order:

    Par := {[i,w,2]->[i,w',2] : 1<=i<=N && 0<=w<w'} union
           {[i,0,1]->[i,w',2] : 1<=i<=N && 0<=w'} union
           {[i,0,1]->[i,0,3] : 1<=i<=N} union
           {[i,w,2]->[i,0,3] : 1<=i<=N && 0<=w};

We have to compute relation $\bowtie$ in the left-hand side of the expansion correctness criterion; call it Int.

    # The "full" relation
    Full := {[i,w,s]->[i',w',s'] : 1<=s<=3 && (s=2 || w=w'=0)
             && 1<=i,i'<=N && 0<=w,w'};
    # The sequential execution order
    Lex := {[i,w,2]->[i',w',2] : 1<=i<=i'<=N && 0<=w,w' && (i<i' || w<w')}
           union {[i,0,1]->[i',0,1] : 1<=i<i'<=N}
           union {[i,0,3]->[i',0,3] : 1<=i<i'<=N}
           union {[i,0,1]->[i',w',2] : 1<=i<=i'<=N && 0<=w'}
           union {[i,w,2]->[i',0,1] : 1<=i,i'<=N && 0<=w && i<i'}
           union {[i,0,1]->[i',0,3] : 1<=i<=i'<=N}
           union {[i,0,3]->[i',0,1] : 1<=i<i'<=N}
           union {[i,w,2]->[i',0,3] : 1<=i<=i'<=N && 0<=w}
           union {[i,0,3]->[i',w',2] : 1<=i<i'<=N && 0<=w'};
    NPar := Full - Par;
    INPar := inverse NPar;
    ILex := inverse Lex;
    Int := (INPar intersection (ILex union NCon))
           union (INPar intersection S(NPar intersection Lex));
    Int := Int union (inverse Int);
    Int;

The result is:

    {[i,w,2]->[i',w',2] : 1<=i'<i<=N && 1<=w<=w'} union
    {[i,0,2]->[i',w',2] : 1<=i'<i<=N && 0<=w'} union
    {[i,w,2]->[i',w-1,2] : 1<=i'<i<=N && 1<=w} union
    {[i,w,2]->[i',w',2] : 1<=i'<i<=N && 0<=w'<=w-2} union
    {[i,0,1]->[i',0,1] : 1<=i'<i<=N} union
    {[i,0,2]->[i',0,1] : 1<=i'<i<=N} union
    {[i,0,1]->[i',w',2] : 1<=i'<i<=N && 0<=w'} union
    {[i,0,3]->[i',0,1] : 1<=i'<i<=N} union
    {[i,0,3]->[i',w',2] : 1<=i'<i<=N && 0<=w'} union
    {[i,w,2]->[i',0,3] : 1<=i<i'<=N && 0<=w} union
    {[i,0,1]->[i',0,3] : 1<=i<i'<=N} union
    {[i,w,2]->[i',0,1] : 1<=i<i'<=N && 0<=w} union
    {[i,0,1]->[i',0,2] : 1<=i<i'<=N} union
    {[i,0,1]->[i',0,1] : 1<=i<i'<=N} union
    {[i,w,2]->[i',w',2] : 1<=i<i'<=N && 0<=w<=w'-2} union
    {[i,w,2]->[i',0,2] : 1<=i<i'<=N && 0<=w} union
    {[i,w,2]->[i',w+1,2] : 1<=i<i'<=N && 0<=w} union
    {[i,w,2]->[i',w',2] : 1<=i<i'<=N && 1<=w'<=w}

A quick verification shows that

    Int intersection {[i,w,s]->[i,w',s']}

is empty, meaning that neither expansion nor renaming must be done inside an iteration of the outer loop. In particular, $E_S[2]$ should be set to 0. However, computing the set $W^0_S(v)$ (i.e. for the outer loop) yields all accesses $w$ executing after $v$ (for the same $i$). Then $E_S[1]$ should be set to N. We have automatically found the partially expanded program.

Schedule-independent storage mapping. We now apply the expansion algorithm according to the "data-flow" execution order. The parallel execution order is defined as follows:

    Par := S+;

Once again

    Int intersection {[i,w,s]->[i,w',s']}

is empty. The schedule-independent storage mapping is thus the same as the previous, parallelization-dependent, one.

The resulting program for both techniques is the same as the hand-crafted one in Figure 5.17.

Second Example

We now consider the knapsack program in Figure 5.18. It is easy to show that a schedule-independent storage mapping would give no better result than single-assignment form. More precisely, it is impossible to find any schedule such that a "cyclic folding" (a storage mapping with subscripts of the form A_S[CurIns % E_S]) would be more economical than single-assignment form.

We are thus looking for a schedule-dependent storage mapping. An efficient parallelization of program KP requires tiling of the iteration space. This can be done using classical techniques since the loop nest is perfectly nested. Section 5.3.10 has shown good performance for 16×32 tiles, but we consider 2×1 tiles for the sake of simplicity. The parallel execution order considered is the same as the one presented in Section 5.3.1: tiles are scheduled in fronts of constant k+j, and the inner-tile order is the original sequential execution order.

The result of instancewise reaching definition analysis is written in Omega's syntax:

    S := {[k,j]->[k-1,j] : 2<=k<=M && 1<=j<=C} union
         {[k,j]->[k,j'] : 1<=k<=M && 1<=j'<j<=C && j'-k<=j};

Instances which may not assign the same memory location are defined by the following relation:

    NCon := {[k,j]->[k',j'] : 1<=k,k'<=M && 1<=j,j'<=C && j!=j'};
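The 2×1 tiled execution order described above (tiles scheduled in fronts, inner-tile order unchanged) can also be restated as plain predicates. This Python sketch mirrors the InnerTile, InterTile and Par relations of the Omega script used for this example, taking kq = k div 2 and kr = k mod 2 as implied by k = 2kq + kr with 0 ≤ kr < 2; the function names are ours, and a pair (v, w) in `par` means that v must execute before w.

```python
def inner_tile(v, w):
    """InnerTile: same column j, same tile kq, and kr < kr'."""
    (k, j), (k2, j2) = v, w
    return j == j2 and k // 2 == k2 // 2 and k % 2 < k2 % 2


def inter_tile(v, w):
    """InterTile: the tile of v belongs to an earlier front kq + j."""
    (k, j), (k2, j2) = v, w
    return k // 2 + j < k2 // 2 + j2


def lex(v, w):
    """Sequential order of the perfectly nested (k, j) loops."""
    return v < w


def par(v, w):
    """Par := Lex intersection (InnerTile union InterTile)."""
    return lex(v, w) and (inner_tile(v, w) or inter_tile(v, w))


if __name__ == "__main__":
    print(par((2, 1), (3, 1)))  # True: same tile, executed sequentially
    print(par((2, 2), (4, 1)))  # False: two tiles of the same front
```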
Considering the 2×1 tiling, it is easy to compute $<_{par}$:

    InnerTile := {[k,j]->[k',j] : (exists kq,kr,kr' : k=2kq+kr
                   && k'=2kq+kr' && 0<=kr<kr'<2)};
    InterTile := {[k,j]->[k',j'] : (exists kq,kr,kq',kr' : k=2kq+kr
                   && k'=2kq'+kr' && 0<=kr,kr'<2 && kq+j<kq'+j')};
    Par := Lex intersection (InnerTile union InterTile);

We have to compute relation $\bowtie$ in the left-hand side of the expansion correctness criterion; call it Int.

    # The "full" relation
    Full := {[k,j]->[k',j'] : 1<=k,k'<=M && 1<=j,j'<=C};
    # The sequential execution order
    Lex := Full intersection {[k,j]->[k',j'] : k<k' || (k=k' && j<j')};
    NPar := Full - Par;
    INPar := inverse NPar;
    ILex := inverse Lex;
    Int := (INPar intersection (ILex union NCon))
           union (INPar intersection S(NPar intersection Lex));
    Int := Int union (inverse Int);
    Int;

The result is:

    {[k,j]->[k',j'] : 1<=k<=k'<=M && 1<=j<j'<=C} union
    {[k,j]->[k',j'] : 1<=k<k'<=M && 1<=j'<j<=C} union
    {[k,j]->[k',j'] : exists (alpha : 1, 2alpha+2<=k<k'<M
        && j<=C && 1<=j' && k'+2j'<=2+2j+2alpha)} union
    {[k,j]->[k',j'] : exists (alpha : 1, 2alpha+2<=k'<k<M
        && j'<=C && 1<=j && k+2j<=2+2j'+2alpha)} union
    {[k,j]->[k',j'] : 1<=j<j'<=C && 1<=k'<k<=M} union
    {[k,j]->[k',j'] : 1<=k'<=k<=M && 1<=j'<j<=C}

A quick verification shows that

    Int intersection {[k,j]->[k+K+1,j']}

is empty, meaning that $E_S[1]$ should be set to $K+1$.

Experiments

Partial expansion has been implemented for Cray-Fortran affine loop nests [LF98]. Semi-automatic storage mapping optimization has also been performed on general loop nests, using FADA, Omega, and PIP. Figure 5.25 summarizes expansion and parallelization results for several programs. The three affine loop nest examples have already been studied by Lefebvre in [LF98, Lef98]: matrix-vector product, Cholesky factorization and Gaussian elimination. A few experiments have been made on an SGI Origin 2000, using the mp library (but not PCA, the built-in automatic parallelizer). As one would expect, results for the convolution program are excellent even for small values of N. Execution times for program KP appear in Figure 5.26. The first graph compares the execution time of the parallel program and of the original (not expanded) one; the second one shows the speed-up. We got very good results for medium array sizes,¹⁷ both in terms of speed-up and relative to the original program.

Figure 5.25. Time and space optimization [table: for the MV product, Cholesky, Gaussian, knapsack and convolution programs, sequential complexity and size, parallel complexity, parallel size (SA and optimized) and run-time overhead (SA and optimized)]

Figure 5.26. Performance results for the knapsack program [plots: execution time (ms) vs. processors, sequential vs. parallel; speed-up vs. processors, optimal vs. effective]

¹⁷Here C = 2048, M = 1024 and K = 16, with 16×32 tiles (scheduled similarly to Figure 5.18).

5.4 Constrained Storage Mapping Optimization

Sections 5.2 and 5.3 addressed two techniques to optimize parallelization via memory expansion. We show here that combining the two techniques in a more general expansion framework is possible and brings significant improvements. Optimization is achieved from two complementary directions:

- Adding constraints to limit memory expansion, like static expansion avoiding φ functions [BCC98], privatization [TP93, MAL93], or array static single assignment [KS98]. All these techniques allow partial removal of memory-based dependences, but may extract less parallelism than conversion to single assignment form.
- Applying storage mapping optimization techniques [CL99]. Some of these are either schedule-independent [SCFS98] or schedule-dependent [LF98] (yielding better optimizations), depending on whether they require the prior computation of a parallel execution order (scheduling, tiling, etc.) or not.

We try here to get the best of both directions and show the benefit of combining them into a unified framework for memory expansion. The motivation for such a framework is the following: because of the increased complexity of dealing with irregular codes, and given the wide range of parameters which can be tuned when parallelizing such programs, a broad range of expansion techniques have been or will be designed for optimizing one or a few of these parameters. The two preceding sections are some of the best examples of this trend. We believe that our constrained expansion framework greatly reduces the complexity of the optimization problem, by reducing the number of parameters and helping the automation process.

With the help of a motivating example we introduce the general concepts, before we formally define correct constrained storage mappings. Then, we present an intra-procedural algorithm which handles any imperative program and most loop nest parallelization techniques.

Motivation

We study the pseudo-code in Figure 5.27.a. Such nested loops with conditionals appear in many kernels, but most parallelization techniques fail to generate efficient code for these programs. Instances of T are denoted by ⟨T, i, j⟩, instances of S by ⟨S, i, j, k⟩, and instances of R by ⟨R, i⟩, for 1 ≤ i, j ≤ M and 1 ≤ k ≤ N. ("P(i,j)" is a boolean function of i and j.)

Figure 5.27.a. Original program

    double x;
    for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i,j)) {
    T     x = 0;
          for (k=1; k<=N; k++)
    S       x = x …;
        }
    R   … = x …;
    }

Figure 5.27.b. Single assignment form

    double xT[M+1,M+1], xS[M+1,M+1,N+1];
    for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i,j)) {
    T     xT[i,j] = 0;
          for (k=1; k<=N; k++)
    S       xS[i,j,k] = if (k==1) xT[i,j] …
                        else xS[i,j,k-1] …;
        }
    R   … = φ({⟨S,i,1,N⟩, …, ⟨S,i,M,N⟩}) …;
    }

Figure 5.27. Motivating example

On this example, assume N is positive and predicate "P(i,j)" evaluates to true at least one time for each iteration of the outer loop. A precise instancewise reaching definition analysis tells us that the reaching definition of the read access ⟨S, i, j, k⟩ to x is ⟨T, i, j⟩ when k = 1 and ⟨S, i, j, k−1⟩ when k > 1. We only get an approximate result for definitions that may reach ⟨R, i⟩: those are {⟨S, i, 1, N⟩, …, ⟨S, i, M, N⟩}. In fact, the
value of x may only come from S (since N > 0), for the same i (since T executes at least one time for each iteration of the outer loop), and for k = N.

Obviously, memory-based dependences on x hamper parallelization. Our intent is to expand scalar x so as to get rid of as many dependences as possible. Figure 5.27.b shows our program converted to SA form. The unique φ function implements a run-time choice between values produced by ⟨S, i, 1, N⟩, …, ⟨S, i, M, N⟩. In the parallel SA program of Figure 5.28.a, array @x records the value of j at statement S when x was assigned. This information allows value recovery in R, see the third method in Section 5.1.4 for details.

But this parallel program is not usable on any real architecture. The main reason is memory usage: variable x has been replaced by a huge three-dimensional array, plus two smaller arrays. This code is approximately five times slower than the original program on a single processor (when arrays can be accommodated in memory).

Figure 5.28.a. Parallel SA

    double xT[M+1,M+1], xS[M+1,M+1,N+1];
    int @x[M+1];
    parallel for (i=1; i<=M; i++) {
      parallel for (j=1; j<=M; j++)
        if (P(i,j)) {
    T     xT[i,j] = 0;
          for (k=1; k<=N; k++)
    S       xS[i,j,k] = if (k==1) xT[i,j] …
                        else xS[i,j,k-1] …;
          @x[i] = max(@x[i], j);
        }
    R   … = xS[i,@x[i],N] …;
    }

Figure 5.28.b. Parallel SMO

    double x[M+1,M+1];
    int @x[M+1];
    parallel for (i=1; i<=M; i++) {
      parallel for (j=1; j<=M; j++)
        if (P(i,j)) {
    T     x[i,j] = 0;
          for (k=1; k<=N; k++)
    S       x[i,j] = x[i,j] …;
          @x[i] = max(@x[i], j);
        }
    R   … = x[i,@x[i]] …;
    }

Figure 5.28. Parallelization of the motivating example

This shows the need for a memory usage optimization technique. Storage mapping optimization (SMO) [CL99, LF98, SCFS98] consists in reducing memory usage as much as possible as soon as a parallel execution order has been crafted, see Section 5.3. A single two-dimensional array can be used, while keeping the two outer loops parallel, see Figure 5.28.b. Run-time computation of the φ function with array @x seems very cheap at first glance, but execution of @x[i] = max(@x[i], j) hides synchronizations behind the computation of the maximum! As usual, it results in a very bad scaling: good accelerations are obtained for a very small number of processors, then speed-up drops dramatically because of synchronizations. Figure 5.29 gives execution time and speed-up for the parallel program, compared to the original (not expanded) one. We used the mp library on an SGI Origin 2000, with M = 64 and N = 2048, and simple expressions for the "…" parts.
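To see why array @x restores the data flow, and where the hidden synchronization lies, here is a sequential Python simulation of Figure 5.27.a and of the SMO program of Figure 5.28.b. The predicate `P` and the update `f` standing for the elided "…" parts are hypothetical, chosen so that P(i, j) holds at least once per i as the text assumes; arrays are indexed from 1 as in the figures.

```python
def original(m, n, P, f):
    """Figure 5.27.a: one scalar x; returns the values read by R."""
    reads, x = [], None
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            if P(i, j):
                x = 0                              # T
                for k in range(1, n + 1):
                    x = f(x, i, j, k)              # S
        reads.append(x)                            # R
    return reads


def smo(m, n, P, f):
    """Figure 5.28.b: x[i,j] plus @x[i], the last j that assigned x."""
    x = [[0] * (m + 1) for _ in range(m + 1)]
    at_x = [0] * (m + 1)
    reads = []
    for i in range(1, m + 1):                      # parallel loop in the figure
        for j in range(1, m + 1):                  # parallel too: hence a max-reduction
            if P(i, j):
                x[i][j] = 0                        # T
                for k in range(1, n + 1):
                    x[i][j] = f(x[i][j], i, j, k)  # S
                at_x[i] = max(at_x[i], j)          # needs synchronization when j is parallel
        reads.append(x[i][at_x[i]])                # R reads x[i, @x[i]]
    return reads


if __name__ == "__main__":
    P = lambda i, j: j == 1 or (i + j) % 3 == 0
    f = lambda x, i, j, k: x + i * j + k
    print(original(6, 4, P, f) == smo(6, 4, P, f))  # True
```

Sequentially the two versions read the same values; once the j loop runs in parallel, the update of `at_x[i]` becomes a shared max-reduction, which is the synchronization that ruins scaling.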
This bad result shows the need for a finer parallelization scheme. The question is to
find a good tradeoff between expansion overhead and parallelism extraction. If we target widely-used parallel computers, the number of processors is likely to be less than 100, but SA form extracted two parallel loops involving $M^2$ processors! The intuition is that we wasted memory and run-time overhead.

Figure 5.29. Performance results for storage mapping optimization [plots: execution time (ms) vs. processors, sequential vs. SMO; speed-up (parallel / original) vs. processors, optimal vs. SMO]

One would prefer a pragmatic expansion scheme, such as maximal static expansion (MSE) [BCC98], or privatization [TP93, MAL93]. Choosing static expansion has the benefit that no φ function is necessary any more: x can be safely expanded along the outermost and innermost loops, but expansion along j is forbidden (it requires a φ function and thus violates the static constraint, see Section 5.2). Now, only the outer loop is parallel, and we get much better scaling, see Figure 5.30. However, on a single processor the program still runs two times slower than the original one: scalar x (probably promoted to a register in the original program) has been replaced by a two-dimensional array.

    double x[M+1,N+1];
    parallel for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i,j)) {
    T     x[i,0] = 0;
          for (k=1; k<=N; k++)
    S       x[i,k] = x[i,k-1] …;
        }
    R   … = x[i,N] …;
    }

Figure 5.30. Maximal static expansion [plot: speed-up (parallel / original) vs. processors, optimal vs. MSE]

Maximal static expansion expanded x along the innermost loop, but this was of no interest regarding parallelism extraction. Combining it with storage mapping optimization solves the problem, see Figure 5.31. Scaling is excellent and parallelization overhead is very low: the parallel program runs 31.5 times faster than the original one on 32 processors (for M = 64 and N = 2048).
This example shows the benefit of combining constrained expansions, such as privatization and static expansion, with storage mapping optimization techniques, to improve the parallelization of general loop nests (with unrestricted conditionals and array subscripts).

    double x[M+1];
    parallel for (i=1; i<=M; i++) {
      for (j=1; j<=M; j++)
        if (P(i,j)) {
    T     x[i] = 0;
          for (k=1; k<=N; k++)
    S       x[i] = x[i] …;
        }
    R   … = x[i] …;
    }

Figure 5.31. Maximal static expansion combined with storage mapping optimization [plot: speed-up (parallel / original) vs. processors, optimal vs. MSE + SMO]

In the following, we present an algorithm useful for automatic parallelization of imperative programs. Although this algorithm cannot itself choose the "best" parallelization, it aims at the simultaneous optimization of expansion and parallelization constraints.

Problem Statement

Because our framework is based on maximal static expansion and storage mapping optimization, we inherit their program model and mathematical abstraction: we only consider nests of loops operating on arrays and abstract these programs with affine relations.

Introducing Constrained Expansion

The motivating example shows the benefits of putting an a priori limit to expansion. Static expansion [BCC98] is a good example of constrained expansion. What about other expansion schemes? The goal of constrained expansion is to design pragmatic techniques that do not expand variables when the incurred overhead is "too high". To generalize static expansion, we suppose that some equivalence relation $\equiv$ on writes is available from previous compilation stages, possibly with user interaction. It is called the constraint relation. A storage mapping constrained by $\equiv$ is any mapping $f^{exp}_e$ such that

$$\forall e \in E, \forall v, w \in W:\; v \equiv w \wedge f_e(v) = f_e(w) \Longrightarrow f^{exp}_e(v) = f^{exp}_e(w). \qquad (5.25)$$

It is difficult to decide whether to forbid expansion of some variable or not. A short survey of this problem is presented in Section 5.4.5, along with a discussion about building constraint relation $\equiv$ from a "syntactical" or "semantical" constraint. Moreover, we leave for Section 5.4.8 all discussions about picking the right parallel execution order.

Now, the two problems are part of the same two-criteria optimization problem: tuning expansion and parallelism for performance. We do not present here a solution to this complex problem. The algorithm described in the next sections should be seen as an integrated tool for parallelization, as soon as the "strategy" has been chosen (what expansion constraints, what kind of schedule, tiling, etc.). Most of these strategies have already been shown useful and practical for some programs; our main contribution is their integration in an automatic optimization process. The summary of our optimization framework is presented in Figure 5.32.

Figure 5.32. What we want to achieve [diagram: starting from the sequential program ($<_{seq}$, original storage mapping $f_e$), the expansion axis leads through an expansion constrained by $\equiv$ towards single-assignment form, and the parallelism axis leads through a correct parallel execution order $<_{par}$ (scheduling, tiling, etc.) towards the data-flow execution order; the goal is a correct optimized expansion $f'_e = (f_e, \nu)$ (storage mapping optimization)]

We first define correct parallelizations, then state our optimization problem.

Formal Solution

What is a Correct Parallel Execution Order?

Memory expansion partially removes dependences due to memory reuse. Recall from Section 2.5 that relation $\delta^{exp}$ approximates the dependence relation of $(<_{seq}, f^{exp}_e)$, the expanded program with sequential execution order ($\delta^{exp}$ equals $\sigma$ when the program is converted to SA form). Thanks to Theorem 2.2 page 81, we want any parallel execution order $<_{par}$ to satisfy the following condition:

$$\forall (\imath_1, r_1), (\imath_2, r_2) \in A:\; (\imath_1, r_1)\,\delta^{exp}\,(\imath_2, r_2) \Longrightarrow \imath_1 <_{par} \imath_2 \qquad (5.26)$$

Computation of the approximate dependence relation $\delta^{exp}$ from storage mapping $f^{exp}_e$ is presented in Section …

What is a Correct Expansion?

Given parallel order $<_{par}$, we are looking for correct expansions allowing parallel execution to preserve the original semantics. Our task is to formalize the memory reuse constraints enforced by $<_{par}$. Using interference relation $\bowtie$ defined in Section 5.3.2, we have proven in Theorem 5.2 that the expansion is correct if the following condition holds:

$$\forall e \in E, \forall v, w \in W:\; v \bowtie w \Longrightarrow f^{exp}_e(v) \neq f^{exp}_e(w) \qquad (5.27)$$
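Conditions (5.25) and (5.27) are easy to check by brute force on a finite set of writes. The Python sketch below does so for the motivating example restricted to a few writes ⟨S, i, j, N⟩, encoded as pairs (i, j). The encodings, the "same i" constraint, and the simplified interference (writes of different outer iterations interfere when the outer loop is parallel) are illustrative assumptions, not the relations computed by the framework.

```python
def is_constrained(writes, f, f_exp, equiv):
    """(5.25): v ≡ w and f(v) = f(w) imply f_exp(v) = f_exp(w)."""
    return all(f_exp(v) == f_exp(w)
               for v in writes for w in writes
               if equiv(v, w) and f(v) == f(w))


def is_correct(writes, f_exp, interferes):
    """(5.27): interfering writes never share a memory location."""
    return all(f_exp(v) != f_exp(w)
               for v in writes for w in writes if interferes(v, w))


writes = [(i, j) for i in range(1, 4) for j in range(1, 4)]
f = lambda v: "x"                          # original program: scalar x
same_i = lambda v, w: v[0] == w[0]         # static-expansion-like constraint
interferes = lambda v, w: v[0] != w[0]     # outer loop run in parallel

mse_smo = lambda v: ("x", v[0])            # x[i], as in Figure 5.31
smo_only = lambda v: ("x", v[0], v[1])     # x[i,j], as in Figure 5.28.b

if __name__ == "__main__":
    print(is_constrained(writes, f, mse_smo, same_i),
          is_correct(writes, mse_smo, interferes))      # True True
    print(is_constrained(writes, f, smo_only, same_i))  # False: expands along j
```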
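Computing Parallel Execution Orders and Expansions

We formalized the parallelization correctness with an expansion constraint (5.25) and two correctness criteria (5.26) and (5.27). Let us show how solving these equations simultaneously yields a suitable parallel program $(<_{par}, f^{exp}_e)$.

Following the lines of Section 5.2.3, we are interested in removing as many dependences as possible, without violating the expansion constraint. We can prove (like Proposition 5.1 in Section 5.2.3) that a constrained expansion is maximal (i.e. assigns the largest number of memory locations while verifying (5.25)) iff

$$\forall e \in E, \forall v, w \in W_e:\; v \equiv w \wedge f_e(v) = f_e(w) \iff f^{exp}_e(v) = f^{exp}_e(w).$$

Still following Section 5.2.3, we assume that $f^{exp}_e = (f_e, \nu)$, where $\nu$ is constant on equivalence classes of $\equiv$. Indeed, if $f_e(v) = f_e(w)$, condition $f^{exp}_e(v) = f^{exp}_e(w)$ becomes equivalent to $\nu(v) = \nu(w)$. Because we need to approximate over all possible executions, we use conflict relation $\kappa$, and our maximal constrained expansion criterion becomes

$$\forall v, w \in W,\; v\,\kappa\,w:\; v \equiv w \iff \nu(v) = \nu(w) \qquad (5.28)$$

Computing $\nu$ is done by enumerating equivalence classes of $\equiv$. For any access $v$ in a class of $\kappa$ (instances that "may" hit the same memory location), $\nu(v)$ can be defined via a representative of the equivalence class of $v$ for relation $\equiv$. Computing the lexicographical minimum is a simple way to find representatives, see Section …

It is time to compute the dependences $\delta^{exp}$ of program $(<_{seq}, f^{exp}_e)$: an access $w$ depends on $v$ if they hit the same memory location, $v$ executes before $w$, and at least one is a write. The full computation is done in Section 5.4.8 and uses (5.28); the result is

$$\begin{aligned}
\forall v \in W, w \in R &:\; v\,\delta^{exp}\,w \iff \exists u \in W:\; u\,\sigma\,w \wedge v\,\kappa\,u \wedge v \equiv u \wedge v <_{seq} w\\
\forall v \in R, w \in W &:\; v\,\delta^{exp}\,w \iff \exists u \in W:\; u\,\sigma\,v \wedge u\,\kappa\,w \wedge u \equiv w \wedge v <_{seq} w\\
\forall v, w \in W &:\; v\,\delta^{exp}\,w \iff v\,\kappa\,w \wedge v \equiv w \wedge v <_{seq} w
\end{aligned} \qquad (5.29)$$

We rely on classical algorithms to compute $<_{par}$ from $\delta^{exp}$ [Fea92, DV97, IT88, CFH95]. Knowing $(<_{par}, f^{exp}_e)$, we could stop and say we have successfully parallelized our program; but nothing ensures that $f^{exp}_e$ is an "economical" storage mapping (remember the motivating example). We must build a new expansion from $<_{par}$ that minimizes memory usage while satisfying (5.27).

For constrained expansion purposes, $f^{exp}_e$ has been chosen of the form $(f_e, \nu)$. This has some consequences on the expansion correctness criterion: when $f_e(v) \neq f_e(w)$, it is not necessary to set $\nu(v) \neq \nu(w)$ to enforce $f^{exp}_e(v) \neq f^{exp}_e(w)$. As a consequence, the $v\,\kappa\,w$ clause in (5.22) is not necessary anymore (see page 194), and we may rewrite the expansion correctness criterion thanks to a simplified definition of the interference relation. Let $\natural$ be the interference relation for constrained expansion:

$$\begin{aligned}
v \natural w \overset{def}{\iff}\; & \exists u \in R:\; v\,\sigma\,u \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge (u <_{seq} w \vee w <_{seq} v)\\
\vee\; & \exists u \in R:\; w\,\sigma\,u \wedge v \not<_{par} w \wedge u \not<_{par} v \wedge (u <_{seq} v \vee v <_{seq} w)
\end{aligned} \qquad (5.30)$$

We can rewrite this definition using algebraic operations, where $v \not>_{par} w$ stands for $w \not<_{par} v$, $v >_{seq} w$ stands for $w <_{seq} v$, and $X \cdot Y$ relates $a$ to $c$ when $a\,X\,b$ and $b\,Y\,c$ for some $b$:

$$\natural = D \cup D^{-1}, \qquad D = \not>_{par} \cap\, \Big( \big(\sigma \cdot (\not<_{par} \cap <_{seq})\big) \cup \big(>_{seq} \cap\, (\sigma \cdot \not<_{par})\big) \Big) \qquad (5.31)$$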
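Definition (5.30) can be evaluated literally on explicit finite orders. In this Python sketch, `sigma(x, u)` means that write x reaches read u, and `par` and `seq` are strict orders given as predicates; the three-instance program (write v1, read r1 reached by v1, then write v2 to the same location) is a hypothetical example.

```python
def constrained_interference(v, w, reads, sigma, par, seq):
    """v ♮ w as defined by (5.30)."""
    def half(a, b):
        # exists u in R: a σ u, b ⊀par a, u ⊀par b, (u <seq b or b <seq a)
        return any(sigma(a, u) and not par(b, a) and not par(u, b)
                   and (seq(u, b) or seq(b, a))
                   for u in reads)
    return half(v, w) or half(w, v)


order = ["v1", "r1", "v2"]                 # sequential program order
seq = lambda a, b: order.index(a) < order.index(b)
sigma = lambda x, u: (x, u) == ("v1", "r1")
reads = ["r1"]

if __name__ == "__main__":
    no_order = lambda a, b: False          # everything may run in parallel
    print(constrained_interference("v1", "v2", reads, sigma, no_order, seq))  # True
    print(constrained_interference("v1", "v2", reads, sigma, seq, seq))       # False
```

When everything may run in parallel, v2 can overwrite the value of v1 before r1 reads it, so the two writes interfere; under the sequential order they may share the location.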
Theorem 5.3 (correctness of constrained storage mappings) If a storage mapping $f^{exp}_e$ is of the form $(f_e, \nu)$ and the following condition holds, then $f^{exp}_e$ is a correct expansion of $f_e$, i.e. $f^{exp}_e$ allows parallel execution to preserve the program semantics:

$$\forall v, w \in W,\; v\,\kappa\,w:\; v \natural w \Longrightarrow \nu(v) \neq \nu(w) \qquad (5.32)$$

Proving Theorem 5.3 is a straightforward rewriting of the proof of Theorem 5.2, and the optimality result of Proposition 5.2 also holds: the only difference is that the $v\,\kappa\,w$ clause now appears in the left-hand side of (5.32) instead of inside the interference relation.

Building a function $\nu$ satisfying (5.32) is almost what the partial expansion algorithm presented in Section 5.3.5 has been crafted for. Instead of generating code, one can redesign this algorithm to compute an equivalence relation $\sim$ over writes: the coloring relation. Its only requirement is to assign different colors to interfering writes,

$$\forall v, w \in W:\; v \natural w \Longrightarrow \neg(v \sim w), \qquad (5.33)$$

but we are also interested in minimizing the number of colors. When $v \sim w$, it says that it is correct to have $f^{exp}_e(v) = f^{exp}_e(w)$. The new graph coloring algorithm is presented in Section …

By construction of relation $\sim$, a function $\nu$ defined by

$$\forall v, w \in W,\; v\,\kappa\,w:\; v \sim w \iff \nu(v) = \nu(w)$$

satisfies expansion correctness (5.32), but annoyingly, nothing ensures that expansion constraint (5.25) is still satisfied: for all $v, w \in W$ such that $v\,\kappa\,w$, we have $v \natural w \Rightarrow \nu(v) \neq \nu(w)$, but not necessarily $v \equiv w \Rightarrow \nu(v) = \nu(w)$. Indeed, $\sim$ defines a minimal expansion allowing the parallel execution order to preserve the original semantics, but it does not enforce that this expansion satisfies the constraint.

The first problem is to check the compatibility of $\natural$ and $\equiv$. This is ensured by the following result.¹⁸

Proposition 5.3 For all writes $v$ and $w$, it is not possible that $v \natural w$ and $v \equiv w$ at the same time.¹⁹

Proof: Suppose $v\,\kappa\,w$, $v \natural w$, $v \equiv w$ and $v <_{seq} w$. The third line of (5.29) shows that $v\,\delta^{exp}\,w$, hence $v <_{par} w$ from (5.26). This proves that the $v \not<_{par} w$ conjunct in the second line of (5.30) does not hold. Now, since $v \natural w$, one may consider a read instance $u \in R$ such that the first line of (5.30) is satisfied: $v\,\sigma\,u \wedge w \not<_{par} v \wedge u \not<_{par} w \wedge u <_{seq} w$. Exchanging the roles of $u$ and $v$ in the second line of (5.29) shows that $u\,\delta^{exp}\,w$, hence $u <_{par} w$ from (5.26); this contradicts $u \not<_{par} w$. Likewise, the case $w <_{seq} v$ yields a contradiction with $u \not<_{par} v$ in the second line of (5.30). This terminates the proof.
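A coloring relation satisfying (5.33) can be derived from any proper coloring of the graph whose edges are the interfering pairs: $v \sim w$ iff $v$ and $w$ receive the same color. The Python sketch below uses the textbook first-fit greedy heuristic, which stands in here for the new graph coloring algorithm referred to above (an assumption of this sketch); like any greedy scheme, it does not guarantee the minimal number of colors.

```python
def greedy_coloring(vertices, edges):
    """First-fit coloring: adjacent vertices (interfering writes) differ."""
    adjacent = {v: set() for v in vertices}
    for a, b in edges:
        if a != b:                     # the interference relation is irreflexive
            adjacent[a].add(b)
            adjacent[b].add(a)
    color = {}
    for v in vertices:                 # the visiting order drives the result
        used = {color[u] for u in adjacent[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def same_color(color):
    """The coloring relation ∼ induced by a coloring."""
    return lambda v, w: color[v] == color[w]


if __name__ == "__main__":
    col = greedy_coloring(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
    print(col)  # {'a': 0, 'b': 1, 'c': 0, 'd': 0}
```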
We now have to define $\nu$ from a new equivalence relation, considering both $\equiv$ and $\sim$. Figure 5.33 shows that $\equiv \cup \sim$ is not sufficient: consider three writes $u$, $v$ and $w$ such that $f_e(u) = f_e(v) = f_e(w)$, $u \equiv v$ and $v \sim w$. (5.28) enforces $f^{exp}_e(u) = f^{exp}_e(v)$ since $u \equiv v$. Moreover, to spare memory, we should use coloring relation $\sim$ and set $f^{exp}_e(v) = f^{exp}_e(w)$. Then, no expansion is done and parallel order $<_{par}$ may be violated.

¹⁸The proof of this strong result is rather technical, but helps in understanding the role of each conjunct in equations (5.29), (5.26) and (5.30).

¹⁹A non-optimal definition of relation $\natural$ would not yield such a compatibility result.
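Figure 5.33. Strange interplay of constraint and coloring relations [three code panels: the original program, where $\sigma(r_w) = \{w\}$ and $\sigma(r_{uv}) = \{u, v\}$; a wrong expansion when moving $u$ to the top, where $r_w$ may read the value produced by $u$; a correct expansion when assigning y in $u$ and $v$ and moving $u$ to the top]

To avoid this pitfall, coloring relation $\sim$ must be used with care: one may safely set $f^{exp}_e(u) = f^{exp}_e(v)$ when for all $u' \equiv u$, $v' \equiv v$: $u' \sim v'$ (i.e. $u'$ and $v'$ share the same color). We thus build a new relation over writes, from $\equiv$ and $\sim$. It is called the constraint coloring relation, and is defined by

$$\forall v, w \in W:\; v \asymp w \overset{def}{\iff} v \equiv w \;\vee\; \big(\forall v', w':\; v' \equiv v \wedge w' \equiv w \Longrightarrow v' \sim w'\big). \qquad (5.34)$$

We can rewrite this definition using algebraic operations, where $X \cdot Y$ relates $a$ to $c$ when $a\,X\,b$ and $b\,Y\,c$ for some $b$:

$$\asymp \;=\; \equiv \;\cup\; \Big(\sim \setminus \big(\equiv \cdot ((W \times W) \setminus \sim) \cdot \equiv\big)\Big). \qquad (5.35)$$

The good thing is that relation $\asymp$ is an equivalence: the proof is simple since both $\equiv$ and $\sim$ are equivalence relations. Moreover, choosing $\nu(v) = \nu(w)$ when $v \asymp w$ and $\nu(v) \neq \nu(w)$ when it is not the case ensures that $f^{exp}_e = (f_e, \nu)$ satisfies both the expansion constraint and the expansion correctness criterion.

The following result solves the constrained storage mapping optimization problem:²⁰

Theorem 5.4 The storage mapping $f^{exp}_e$ of the form $(f_e, \nu)$ such that

$$\forall v, w \in W,\; v\,\kappa\,w:\; v \asymp w \iff \nu(v) = \nu(w) \qquad (5.36)$$

is the minimal storage mapping (i.e. it accesses the fewest memory locations) which is constrained by $\equiv$ and allows the parallel execution order $<_{par}$ to preserve the program semantics, $\sim$ being the only information about permitting two instances to assign the same memory location.

Proof: From Proposition 5.3, we already know that $\natural$ and $\equiv$ have an empty intersection. Together with the inclusion of $\sim \setminus (\equiv \cdot ((W \times W) \setminus \sim) \cdot \equiv)$ into $\sim$, this proves the correctness of $f^{exp}_e = (f_e, \nu)$. The constraint is also enforced by $f^{exp}_e$ since $\equiv \subseteq \asymp$. To prove the optimality result, one first observes that $\asymp$ defines an equivalence relation on write instances, and second that $\asymp$ is the largest equivalence relation included in $\equiv \cup \sim$.

²⁰See Section 2.4.4 for a general remark about optimality.

Theorem 5.4 gives us an automatic method to minimize memory usage, according to a parallel execution order and a predefined expansion constraint. Figure 5.34 gives an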
intuitive presentation of this complex result: starting from the "maximal constrained expansion", we compute a parallel execution order, from which we compute a "minimal correct expansion", before combining the result with the constraint to get a "minimal correct constrained expansion".

Figure 5.34. How we achieve constrained storage mapping optimization [diagram on the same expansion and parallelism axes as Figure 5.32: sequential program ($<_{seq}$, original storage mapping), constrained expansion, data-flow execution order, correct parallel execution order $<_{par}$ (scheduling, tiling, etc.), single-assignment form, correct optimized expansion (storage mapping optimization)]

Algorithm

As a summary of the optimization problem, one may group the formal constraints exposed in Section 5.4.3 into the system:

$$\begin{cases}
\text{Constraints on } f^{exp}_e = (f_e, \nu): & \forall v, w \in W:\; v\,\kappa\,w \wedge v \equiv w \Longrightarrow \nu(v) = \nu(w)\\
& \forall v, w \in W:\; v\,\kappa\,w \wedge v \natural w \Longrightarrow \nu(v) \neq \nu(w)\\
\text{Constraints on } <_{par}: & \forall (\imath_1, r_1), (\imath_2, r_2) \in A:\; (\imath_1, r_1)\,\delta^{exp}\,(\imath_2, r_2) \Longrightarrow \imath_1 <_{par} \imath_2
\end{cases}$$

Figure 5.35 shows the acyclic graph allowing computation of the relations and mappings involved in this system.

The algorithm to solve this system is based on Theorem 5.4. It computes relation $\sim$ with an extension of the partial expansion algorithm presented in Section 5.3.4, rewritten to handle constrained expansion. Before applying Constrained-Storage-Mapping-Optimization, we suppose that parallel execution order $<_{par}$ has been computed from $<_{seq}$, $\kappa$, $\sigma$ and $\equiv$, by first computing dependence relation $\delta^{exp}$, then applying some appropriate parallel order computation algorithm (scheduling, tiling, etc.). Then, this parallel execution order is used to compute the expansion correctness criterion $\natural$. Algorithm Constrained-Storage-Mapping-Optimization reuses Compute-Representatives and Enumerate-Representatives from Section …

As in the last paragraph of Section 5.2.4, one may consider splitting expanded arrays into renamed data structures to improve performance and reduce memory usage.

Eventually, when the compiler or the user knows that the parallel execution order $<_{par}$ has been produced by a tiling technique, we have already pointed out in Section 5.3.6 that the cyclic graph coloring algorithm is not efficient enough. If the tile shape is known, one may build a vector of each dimension size, and use it as a "suggestion" for a block-cyclic storage mapping. This vector of block sizes is used when replacing the call to Cyclic-Coloring with a call to Near-Block-Cyclic-Coloring in Constrained-Storage-Mapping-Optimization.

Figure 5.35. Solving the constrained storage mapping optimization problem [graph: program ($<_{seq}$, $f_e$); program analysis; expansion scheme (Section 5.4.5); $\delta^{exp}$; $<_{par}$ (scheduling, etc.); coloration; enumeration of equivalence classes; $f'_e = (f_e, \nu)$ and code generation for $(<_{par}, f'_e)$]

Building Expansion Constraints

Our goal here is not to choose the right constraint suitable to expand a given program, but this does not mean leaving the user to compute relation $\equiv$!

As shown in Section 5.4.2, enforcing the expansion to be static corresponds to setting $\equiv \;=\; \mathcal{R}$. The constraint is thus built from instancewise reaching definition results (see Section 5.2).

Another example is privatization, seen as expansion along some surrounding loops, without renaming. Consider two accesses $u$ and $v$ writing into the same memory location. After privatization, $u$ and $v$ assign the same location if their iteration vectors coincide on the components associated with privatized loops:

$$u \equiv v \iff \mathrm{Iter}(u)[\text{privatized loops}] = \mathrm{Iter}(v)[\text{privatized loops}],$$

where $\mathrm{Iter}(u)[\text{privatized loops}]$ holds the counters of privatized loops for instance $u$.
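Relation $\asymp$ of (5.34), which the algorithm computes before enumerating representatives, can be sketched on finite sets as follows. The Python code replays the situation of Figure 5.33 with three writes u, v, w, where u ≡ v, and a coloring that puts v and w together but u elsewhere (u and w interfere); the encodings are assumptions of the sketch.

```python
def constraint_coloring(writes, equiv, same_color):
    """v ≍ w iff v ≡ w, or every v' ≡ v and w' ≡ w share a color (5.34)."""
    classes = {v: [u for u in writes if equiv(u, v)] for v in writes}

    def related(v, w):
        return equiv(v, w) or all(same_color(a, b)
                                  for a in classes[v] for b in classes[w])
    return related


writes = ["u", "v", "w"]
equiv = lambda a, b: a == b or {a, b} == {"u", "v"}   # u ≡ v
color = {"u": 1, "v": 0, "w": 0}                       # v ∼ w, u interferes with w
rel = constraint_coloring(writes, equiv, lambda a, b: color[a] == color[b])

if __name__ == "__main__":
    print(rel("u", "v"), rel("v", "w"))  # True False: w keeps its own location
```

Because v is tied to u by the constraint and u may not share with w, the sketch refuses to merge v and w, which is exactly the pitfall of Figure 5.33 being avoided.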
    Constrained-Storage-Mapping-Optimization (program, κ, σ, ≡, <par)
    program: an intermediate representation of the program
    κ: the conflict relation
    σ: the reaching definition relation, seen as a function
    ≡: the expansion constraint
    <par: the parallel execution order
    returns an intermediate representation of the expanded program
     1  ⋈ ← interference relation of the expansion correctness criterion,
     2      built from σ, W, <seq and <par for both orientations of <seq
     3  col ← Cyclic-Coloring(⋈ ∩ κ)
     4  ≡col ← ≡ ∪ (col \ ≡((W × W) \ col(≡)))      (relation (5.35))
     5  ρ ← Compute-Representatives(≡col ∩ κ)
     6  ν ← Enumerate-Representatives(≡col, ρ)
     7  foreach array A in program
     8  do shapeA ← component-wise maximum of ν(u) for all write accesses u to A
     9     declaration A[shape] ← Aexp[shape, shapeA]
    10     foreach statement s assigning A in program
    11     do left-hand side A[subscript] of s ← Aexp[subscript, ν(CurIns)]
    12     foreach reference ref to A in program
    13     do σ/ref ← σ ∩ (I_ref × W)
    14        quast ← Make-Quast(ν ∘ σ/ref)
    15        map_ref ← CSMO-Convert-Quast(quast, ref)
    16        ref ← map_ref(CurIns)
    17  return program

    CSMO-Convert-Quast (quast, ref)
    quast: the quast representation of the reaching definition function
    ref: the original reference
    returns the implementation of quast as a value retrieval code for reference ref
    switch
      case quast = {⊥}:
        return ref
      case quast = {ı}:
        A ← Array(ı)
        S ← Stmt(ı)
        x ← Iter(ı)
        subscript ← original array subscript in ref
        return Aexp[subscript, x]
      case quast = {ı1, ı2, ...}:
        return φ({ı1, ı2, ...})
      case quast = if predicate then quast1 else quast2:
        return if predicate CSMO-Convert-Quast(quast1, ref)
               else CSMO-Convert-Quast(quast2, ref)

Building the constraint for array SSA is even simpler. Instances of the same statement assigning the same memory location must still do so in the expanded program (only variable renaming is performed):

$$u \equiv v \iff \mathrm{Stmt}(u) = \mathrm{Stmt}(v).$$
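The case analysis of CSMO-Convert-Quast can be mimicked by a small recursive function that produces the retrieval expression as C-like text. The tuple encoding of quasts, the ternary `?:` rendering of conditionals and the `xexp`-style naming are assumptions of this sketch. Following the pseudocode, a single reaching definition is accessed with its iteration vector as an extra index.

```python
def convert_quast(quast, ref, subscript=""):
    """Turn a quast into an expression retrieving the reaching value for `ref`.

    Hypothetical quast encoding (not the thesis data structure):
      ("bottom",)                           no reaching definition: keep ref
      ("leaf", (array, stmt, iters))        exactly one reaching definition
      ("set", [(array, stmt, iters), ...])  several possible definitions
      ("if", predicate, q1, q2)             conditional quast
    """
    kind = quast[0]
    if kind == "bottom":
        return ref
    if kind == "leaf":
        array, _stmt, iters = quast[1]  # Array(i), Stmt(i), Iter(i)
        index = f"{subscript}, {iters}" if subscript else iters
        return f"{array}exp[{index}]"
    if kind == "set":
        args = ", ".join(f"<{stmt}, {iters}>" for _array, stmt, iters in quast[1])
        return f"phi({{{args}}})"  # run-time restoration of the data flow
    if kind == "if":
        _, predicate, q1, q2 = quast
        return (f"({predicate}) ? {convert_quast(q1, ref, subscript)}"
                f" : {convert_quast(q2, ref, subscript)}")
    raise ValueError(f"unknown quast node: {kind}")


q = ("if", "k > 1", ("leaf", ("x", "S", "i, j, k-1")), ("leaf", ("x", "T", "i, j")))
print(convert_quast(q, "x"))  # (k > 1) ? xexp[i, j, k-1] : xexp[i, j]
```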
Now, remember we have defined an extension of reaching definitions, called reaching definitions of memory locations. This definition can be used to weaken the static expansion constraint: if the aim of constrained expansion is to reduce the run-time overhead due to $\phi$ functions, then $\sigma^{ml}$ seems more appropriate than $\sigma$ to define the constraint. Indeed, if Loop-Nests-ML-SA is used to convert a program to SA form, we have seen that the $\phi$ functions generated by the classical algorithm have disappeared (see the second method in Section 5.1.4). It would thus be interesting to replace Make-Quast($\nu \circ \sigma/ref$) in line 14 of Constrained-Storage-Mapping-Optimization by Make-Quast($\nu \circ \sigma^{ml}/ref(u, f_e(u))$), and to consider the constraint defined by the transitive closure of relation $\equiv_w$:

$$\forall v, w \in W:\ v \equiv_w w \iff \exists u \in R, \exists c \in f(u):\ v, w \in \sigma^{ml}(u, c),$$

where $f$ is some conservative approximation of $f_e$. Maximal expansion according to constraint $\equiv_w^*$ is called weakened static expansion. Eventually, setting $\equiv = \equiv_w^*$ combines weakened static expansion with storage mapping optimization.

These practical examples give the insight that building an expansion strategy from the formal definition is not difficult. New expansion strategies should be designed and expressed as constraints: statement-by-statement, user-defined, knowledge-based, and especially architecture-dependent (number of processors, memory hierarchy, communication model) constraints.
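Computing the transitive closure of $\equiv_w$ on a finite sample amounts to merging, for every read and memory location, all writes that may reach it. A union-find structure is a standard way to do this. The sketch below assumes a hypothetical dictionary encoding of $\sigma^{ml}$ that maps (read, location) pairs to sets of writes.

```python
def weakened_static_classes(sigma_ml):
    """Equivalence classes of the transitive closure of ≡_w (finite sample).

    sigma_ml maps (read, location) pairs to the set of writes that may reach
    that read for that location.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for writes in sigma_ml.values():
        writes = list(writes)
        for w in writes:
            find(w)  # register every write, even if it is alone
        for a, b in zip(writes, writes[1:]):
            parent[find(a)] = find(b)  # union: writes of one set share a class
    return {w: find(w) for w in list(parent)}


sigma_ml = {("r1", "c0"): {"w1", "w2"}, ("r2", "c0"): {"w2", "w3"}, ("r3", "c1"): {"w4"}}
classes = weakened_static_classes(sigma_ml)
print(classes["w1"] == classes["w3"], classes["w4"] == classes["w1"])  # True False
```

Writes `w1` and `w3` never reach a common read, yet they end up in the same class through `w2`. This is exactly the effect of taking the transitive closure.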
Graph-Coloring Algorithm

Our graph-coloring problem is almost the same as the one studied by Feautrier and Lefebvre in [LF98], and the core of their solution has been recalled in Section 5.3. However, the formulation is slightly different now: it is no longer mixed up with code generation. An easy work-around would be to redesign the output of algorithm Storage-Mapping-Optimization, as proposed in [Coh99b]. Let $\mathrm{Stmt}(u)$ (resp. $\mathrm{Iter}(u)$) be the statement (resp. iteration vector) associated with access $u$, and let $\mathrm{NewArray}(s)$ be the name of the new array assigned by $s$ (after partial expansion). Then

$$\forall v, w \in W:\ v \sim w \overset{def}{\iff} \mathrm{NewArray}(\mathrm{Stmt}(v)) = \mathrm{NewArray}(\mathrm{Stmt}(w)) \wedge \mathrm{Iter}(v) \bmod E_{\mathrm{Stmt}(v)} = \mathrm{Iter}(w) \bmod E_{\mathrm{Stmt}(w)}.$$

This solution is simple but not practical. We thus present a full algorithm suitable for our storage mapping optimization purposes. Since the algorithm is general purpose, we consider an interference relation between vectors (of the same dimension). Using this algorithm for statement instances requires a preliminary encoding of the statement name inside the iteration vector, and a padding of short vectors with zeroes. We already used this technique when formatting instances to the Omega syntax: see Section 5.2.7 for a practical example.

Remember that Storage-Mapping-Optimization was based on two independent techniques: building an expansion vector and partial renaming. This decomposition came from the bounded number of statements, which allowed efficient greedy coloring techniques, and from the infinity of iteration vectors, which required a specific cyclic coloring.
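Both the zero-padding encoding and the [Coh99b]-style equivalence are easy to state in code. In this sketch, the statement numbers, the `new_array` dictionary (standing for NewArray) and the per-statement expansion vectors $E_s$ are illustrative inputs, not values taken from the thesis.

```python
def encode_instance(stmt_number, iteration, depth):
    """Pad the iteration vector with zeroes up to `depth`, then append the statement number."""
    return tuple(iteration) + (0,) * (depth - len(iteration)) + (stmt_number,)


def same_new_location(v, w, new_array, expansion):
    """v ~ w: same new array and equal iteration vectors modulo E_Stmt."""
    (stmt_v, iter_v), (stmt_w, iter_w) = v, w
    if new_array[stmt_v] != new_array[stmt_w]:
        return False
    mod_v = tuple(x % e for x, e in zip(iter_v, expansion[stmt_v]))
    mod_w = tuple(x % e for x, e in zip(iter_w, expansion[stmt_w]))
    return mod_v == mod_w


# <T, i, j> becomes [i, j, 0, 1] and <R, i> becomes [i, 0, 0, 3], as in the Omega examples.
print(encode_instance(1, (3, 4), 3), encode_instance(3, (5,), 3))  # (3, 4, 0, 1) (5, 0, 0, 3)
```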
Cyclic-Coloring proceeds in a very similar way, and the reasoning of Section 5.3.5 and [LF98, Lef98] is still applicable to prove its correctness. However, the decomposition into two coloring stages is extended here by considering all finite dimensions of the vectors. If the vectors related by an interference relation have some dimensions whose components may only take a finite number of values, it is interesting to apply a classical coloring algorithm to these finite dimensions. We then build an equivalence relation of vectors that share the same finite dimensions: it is called finite in the Cyclic-Coloring algorithm (the number of equivalence classes is obviously finite). When vectors encode statement instances, it is clear that the last dimension is finite, but some examples may present more finite dimensions, for example with small loops whose bounds are known at compile time. This extension may thus bring more efficient storage mappings than the Storage-Mapping-Optimization algorithm in Section 5.3.

    Cyclic-Coloring (⋈)
    ⋈: the affine interference graph
    returns a valid and economical cyclic coloration
    N ← dimension of vectors related by ⋈
    finite ← equivalence relation of vectors sharing the same finite components
    foreach class set in finite
    do for p = 1 to N
       do working ← {(v, w) : v ∈ set ∧ w ∈ set
                       ∧ v[1..p] = w[1..p] ∧ v[1..p+1] < w[1..p+1]
                       ∧ ⟨s, v⟩ ⋈ ⟨s, w⟩}
          maxv ← {(v, max<lex {w : (v, w) ∈ working})}
          vector[p+1] ← max<lex {w[p+1] − v[p+1] + 1 : (v, w) ∈ maxv}
       cyclic_set ← v mod vector
    interfere ← ∅
    foreach set, set' in finite
    do if (∃v ∈ set, v' ∈ set' : v ⋈ v')
       then interfere ← interfere ∪ {(set, set')}
    coloring ← Greedy-Coloring(interfere)
    col ← ∅
    foreach set in finite
    do col ← col ∪ (cyclic_set, coloring(set))
    return col
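The two stages of Cyclic-Coloring can be illustrated on a finite sample of interfering vectors. This is a simplified sketch, not the symbolic algorithm. It assumes a single class of `finite` and tuples with v <lex w. For each depth, it takes the largest distance w[p] - v[p] + 1 between interfering vectors that agree on the outer components; for a given v, the lexicographic maximum w has the largest component at that depth, so this matches the `maxv` step. A greedy coloring of the finite graph between classes is included as well.

```python
def expansion_vector(pairs, n):
    """Finite-sample version of the loop on p in Cyclic-Coloring.

    pairs: interfering vectors (v, w) of length n with v <lex w.
    """
    vector = [1] * n
    for p in range(n):
        for v, w in pairs:
            if v[:p] == w[:p] and v[p] < w[p]:  # first difference at depth p
                vector[p] = max(vector[p], w[p] - v[p] + 1)
    return vector


def cyclic_color(v, vector):
    """Cyclic color of a vector: its components modulo the expansion vector."""
    return tuple(x % e for x, e in zip(v, vector))


def greedy_coloring(nodes, edges):
    """Classical greedy coloring of the finite graph between equivalence classes."""
    color = {}
    for node in nodes:
        used = {color[m] for m in color if (node, m) in edges or (m, node) in edges}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


pairs = [((1, 0), (1, 2)), ((1, 0), (2, 0))]
vec = expansion_vector(pairs, 2)
print(vec)  # [2, 3]
```

The final color of a vector is then the pair made of its cyclic color and the greedy color of its class, as in the last loop of the algorithm. Two interfering vectors first differ at some depth by less than the corresponding vector component, so their residues differ there.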
The Near-Block-Cyclic-Coloring algorithm is an optimization of Cyclic-Coloring. It includes an improvement of the technique to efficiently handle graphs associated with tiled programs, as hinted in Section 5.3.6. In this particular case, we consider (as in most tiling techniques) a perfectly nested loop nest. Notice that the "/" symbol is used for symbolic integer division. The intuitive idea is that a block-cyclic coloring is preferred to the cyclic one of the classical algorithm.

The Near-Block-Cyclic-Coloring algorithm should be seen as a first attempt to compute optimized storage mappings for tiled programs. As shown in Section 5.3.6, the block-cyclic coloring problem is still open for affine interference relations.
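The intuition behind Near-Block-Cyclic-Coloring can be sketched on a finite set of interfering pairs: a dimension is divided by its suggested block size only when this keeps interfering vectors in distinct blocks. The pair-list input and the floor division standing for symbolic integer division are assumptions of this illustration.

```python
def near_block_sizes(pairs, shape):
    """Greedy choice of block sizes, one dimension at a time.

    A block size shape[p] is kept for dimension p only if no two
    interfering vectors fall into the same block.
    """
    sizes = [1] * len(shape)

    def quotient(x, s):
        return tuple(c // k for c, k in zip(x, s))  # symbolic division of the text

    for p, size in enumerate(shape):
        trial = sizes.copy()
        trial[p] = size
        if all(quotient(v, trial) != quotient(w, trial) for v, w in pairs):
            sizes = trial  # the quotient is still a valid starting point for coloring
    return sizes


print(near_block_sizes([((0, 0), (0, 1)), ((0, 0), (4, 0))], (4, 4)))  # [4, 1]
```

Here blocking the first dimension by 4 is safe. Blocking the second one would merge the interfering vectors (0, 0) and (0, 1).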
    Near-Block-Cyclic-Coloring (⋈, shape)
    ⋈: a symbolic interference graph
    shape: a vector of block sizes suggested by a tiling algorithm
    returns a valid and economical block-cyclic coloration
    N ← number of nested loops
    quotient ← {(x, x) : x ∈ Z^N}
    for p = 1 to N
    do quotient' ← {(x, y) : y[1] = x[1], ..., y[p] = x[p] / shape_p, ..., y[N] = x[N]} ∘ quotient
       if (∄z : z (quotient' ∘ ⋈ ∘ quotient'^-1) z)
       then quotient ← quotient'
    col ← Cyclic-Coloring(quotient ∘ ⋈ ∘ quotient^-1)
    return col ∘ quotient

Dynamic Restoration of the Data-Flow

As in Section 5.3.8, $\phi$-arrays should be chosen in one-to-one mapping with the expanded data structures, and the arguments of $\phi$ functions (i.e. sets of possible reaching definitions) should be updated according to the new storage mapping. The technique is essentially the same: function $f_e^{exp}$ is used to access $\phi$-arrays, then relation $\not\kappa$ and function $\nu$ are used to recompute the sets of possible reaching definitions.²¹ A $\phi(set)$ reference should be replaced by

$$\phi\big(\{v \in set : \nexists w \in set : v <_{seq} w \wedge \neg(v \not\kappa w) \wedge \nu(v) = \nu(w)\}\big).$$

Another optimization is based on the shape of $\phi$-arrays: since $f_e^{exp} = (f_e, \nu)$, the memory location written by a possible reaching definition can be deduced from the array subscript, and the boolean type is now preferred for $\phi$-array elements. This very simple optimization reduces both memory usage and run-time overhead. Algorithm CSMO-Implement-Phi summarizes these optimizations.²²

As hinted in Section 5.1.4, the goal is now to avoid redundancy in the run-time restoration of the data flow. Our technique extends ideas from the algorithms that efficiently place $\phi$ functions in the SSA framework [CFR+91, KS98]. However, code generation for the online computation of $\phi$ functions is rather different.

As in the SSA framework, $\phi$ functions should be placed at the joins of the control-flow graph [CFR+91]: there is a join at some program point when several control-flow paths merge together. Remember that the control-flow graph is not the control automaton defined in Section 2.3.1, and that a program point is an inter-statement location in the program text [ASU86]. Of course, the textual order $<_{txt}$ is extended to program points.
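The pruning of $\phi$ arguments and the boolean $\phi$-arrays described above can be simulated at run time as follows. This sketch makes several assumptions: instances are tuples whose lexicographic order coincides with $<_{seq}$, a hypothetical function `nu` gives the new location of each instance, and `may_share` stands for $\neg(v \not\kappa w)$. A Python set replaces the $\phi$-array initialized to false.

```python
class BooleanPhiArray:
    """Finite stand-in for a φ-array of booleans, initialized to false."""

    def __init__(self):
        self.written = set()

    def mark(self, location):
        # code inserted after each possible reaching definition: φAexp[...] = true
        self.written.add(location)

    def was_written(self, location):
        return location in self.written


def prune_phi_arguments(candidates, may_share, nu):
    """Drop v when a later candidate w is stored at the same new location."""
    return {v for v in candidates
            if not any(v < w and may_share(v, w) and nu(v) == nu(w) for w in candidates)}


def phi(candidates, phi_array, may_share, nu):
    """Location holding the value of the <seq-maximal executed candidate, or None."""
    executed = [v for v in prune_phi_arguments(candidates, may_share, nu)
                if phi_array.was_written(nu(v))]
    return nu(max(executed)) if executed else None


nu = lambda inst: inst[0]       # hypothetical storage mapping: one location per i
share = lambda v, w: True       # scalar-like case: every pair may share a location
marks = BooleanPhiArray()
marks.mark(nu((1, 0)))          # only instance (1, 0) has executed so far
print(phi({(1, 0), (1, 1), (2, 0)}, marks, share, nu))  # 1
```

A pruned instance shares its location with a later candidate. Returning the location rather than the instance therefore keeps the result correct even when that later candidate did not execute.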
Joins are efficiently computed with the dominance frontier technique, see [CFR+91] for details. Indeed, the only "interesting" joins are those located on a path from a write $w$ to a use whose set of possible reaching definitions is non-empty and holds $w$. If Points is the set of program points, the set of "interesting" joins for an array (or scalar) $A$ is

²¹ We use $\neg(v \not\kappa w)$ to approximate the relation between writes that must assign the same memory location.

²² For efficiency reasons, an expanded array Aexp is partitioned into several sub-arrays, as proposed in Section 5.4.4. To correctly handle this partitioning, some simple (but rather technical) modifications should be made to the algorithm.
    CSMO-Implement-Phi (expanded)
    expanded: an intermediate representation of the expanded program
    returns an intermediate representation with run-time restoration code
    foreach array Aexp[shape] in expanded
    do if there are φ functions accessing Aexp
       then declare an array φAexp[shape] initialized to false
            foreach read reference ref to Aexp whose expanded form is φ(set)
            do sub ← array subscript in ref
               short ← {v ∈ set : ∄w ∈ set : v <seq w ∧ ¬(v κ̸ w) ∧ ν(v) = ν(w)}
               foreach statement s involved in set
               do refs ← write reference in s
                  subs ← array subscript in refs
                  if not already done for s
                  then following s insert φAexp[subs, ν(CurIns, refs)] = true
               φ(set) ← Aexp[max<seq {ı ∈ short : φAexp[sub, ν(ı, ref)] = true}]
    return expanded

denoted by $\mathrm{Joins}_A$, and is formally defined by

$$\forall p \in \mathrm{Points}:\ p \in \mathrm{Joins}_A \iff \exists v, u \in I:\ v \in \sigma(u) \wedge \mathrm{Stmt}(v) <_{txt} p <_{txt} \mathrm{Stmt}(u) \wedge \mathrm{Array}(\mathrm{Stmt}(u)) = A. \qquad (5.37)$$

For each array (or scalar) $A$ in the original program, the idea is to insert at each join $j$ in $\mathrm{Joins}_A$ a pseudo-assignment statement

    P_j: A[] = A[];

which copies the entire structure into itself. Then, the reaching definition relation is extended to these pseudo-assignment statements, and the constrained storage mapping optimization process is performed on the modified program instead of the original one.²³

Application of Constrained-Storage-Mapping-Optimization and then CSMO-Implement-Phi (or an optimized version, see Section 5.1.4) generates an expanded program whose interesting property is the absence of any redundancy in $\phi$ functions. Indeed, the lexicographic maximum of two instances is never computed twice, since it is computed as early as possible in the $\phi$ function of some pseudo-assignment statement. This was not the case for a direct application of Constrained-Storage-Mapping-Optimization and CSMO-Implement-Phi.

However, the expanded program suffers from the overhead induced by array copying. Knobe and Sarkar encounter a similar problem with SSA for arrays [KS98] and propose several optimizations (mostly based on copy propagation and invariant code motion), but they provide no general method to remove array copies: it is the very nature of SSA to generate temporary variables. Nevertheless, there is such a general method. It is based on the observation that each pseudo-assignment statement in the expanded program is followed by a $\phi$-array assignment, by construction of pseudo-assignment statements and of the set $\mathrm{Joins}_A$. Consider the following code generation for a pseudo-assignment statement $P$:
    for (...) { // iterate through the whole array
    P:  Aexp[subscript] = Aexp[max(set)];
        φAexp[subscript] = true;
    }

²³ Extending the reaching definition relation does not require any other analysis: the sets of possible reaching definitions for pseudo-assignment accesses can be deduced from the original reaching definition relation.
Statement $P$ does not compute anything: it only gathers possible values coming from different control paths. The idea is thus to store instances instead of booleans, and to use the right-hand side of $P$ directly. The previous code fragment can thus safely be replaced by:

    for (...) { // iterate through the whole array
      φAexp[subscript] = max(set);
    }

This technique to remove spurious array copies is implemented in CSMO-Efficiently-Implement-Phi, the optimized code generation algorithm for $\phi$ functions. Remember that before calling this algorithm, Constrained-Storage-Mapping-Optimization should be applied to the original program extended with pseudo-assignment statements.²⁴

    CSMO-Efficiently-Implement-Phi (expanded)
    expanded: an intermediate representation of the expanded program
    returns an intermediate representation with run-time restoration code
    foreach array Aexp[shape] in expanded
    do if there are φ functions accessing Aexp
       then declare an array φAexp[shape] initialized to ⊥
            foreach read reference ref to Aexp whose expanded form is φ(set)
            do sub ← array subscript in ref
               short ← {v ∈ set : ∄w ∈ set : v <seq w ∧ ¬(v κ̸ w) ∧ ν(v) = ν(w)}
               foreach statement s involved in set
               do refs ← write reference in s
                  subs ← array subscript in refs
                  if not already done for s
                  then following s insert φAexp[subs, ν(CurIns, refs)] = CurIns
               φ(set) ← Aexp[max<seq {φAexp[sub, ν(ı, ref)] : ı ∈ short}]
            foreach pseudo-assignment P to Aexp with reference φ(set)
            do genmax ← max<seq {φAexp[sub, ν(ı, ref)] : ı ∈ short}
               right-hand side of the φ-array assignment following P ← genmax
               remove statement P
    return expanded

Eventually, computing the lexicographic maximum of a set defined in Presburger arithmetic is a well-known problem with very efficient parallel implementations [RF94], but it is easier and sometimes faster to perform an online computation. Let us denote by NextJoin the next instance of the nearest pseudo-assignment statement following CurIns. Computation of the lexicographic maximum in $\phi(set)$ can be performed online by replacing each assignment of the form
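($\nu$ is defined for instances of NextJoin: it is a pseudo-assignment to $A$.) Applying CSMO-Efficiently-Implement-Phi and this transformation to the motivating example yields the same result as the SA form in Figure 5.28.

Parallelization after Constrained Expansion

This section aims to characterize correct parallel execution orders for a program after maximal constrained expansion. The benefit of memory expansion is to remove spurious dependences due to memory reuse, but some memory-based dependences may remain after constrained expansion. We still denote by $\delta_e^{exp}$ (resp. $\delta^{exp}$) the exact (resp. approximate) dependence relation of the expanded program with sequential execution order $(<_{seq}, f_e^{exp})$. As announced in Section 5.4.3, we now give the full computation details for (5.29).

Dependences left by constrained expansion are, as usual, of three kinds.
1. Output dependences due to writes connected to each other by the constraint $\equiv$ (e.g. by $R$ in the case of MSE).
2. True dependences, from a definition to a read, where the definition either may reach the read or is related (by $\equiv$) to a definition that reaches the read.
3. Anti-dependences from a read to a definition where the definition, even if it executes after the read, is related (by $\equiv$) to a definition that reaches the read.

Formally, we thus define $\delta_e^{exp}$ for an execution $e \in E$ as follows:

$$\forall e \in E, \forall v, w \in A_e:\ v\, \delta_e^{exp}\, w \iff
\begin{aligned}[t]
 & (f_e(v) = f_e(w) \wedge v \equiv w \wedge v <_{seq} w)\\
 \vee\ & (f_e(v) = f_e(\sigma_e(w)) \wedge v \equiv \sigma_e(w) \wedge v <_{seq} w)\\
 \vee\ & (f_e(w) = f_e(\sigma_e(v)) \wedge \sigma_e(v) \equiv w \wedge v <_{seq} w)
\end{aligned}$$

Then, the following definition of $\delta^{exp}$ is the best pessimistic approximation of $\delta_e^{exp}$, supposing relation $\kappa$ is the best available approximation of function $f_e$ and $\sigma$ is the best available approximation of function $\sigma_e$:

$$\forall v, w \in A:\ v\, \delta^{exp}\, w \overset{def}{\iff}
\begin{aligned}[t]
 & w\, \sigma\, v && (5.38)\\
 \vee\ & (v\, \kappa\, w \wedge v \equiv w \wedge v <_{seq} w) && (5.39)\\
 \vee\ & (\exists u \in W:\ w\, \sigma\, u \wedge v\, \kappa\, u \wedge v \equiv u \wedge v <_{seq} w) && (5.40)\\
 \vee\ & (\exists u \in W:\ v\, \sigma\, u \wedge u\, \kappa\, w \wedge u \equiv w \wedge v <_{seq} w) && (5.41)
\end{aligned}$$

Now, since $\kappa$ and $\equiv$ are reflexive relations, we observe that (5.38) is already included in (5.40) (take $u = v$). We may simplify the definition of $\delta^{exp}$:

$$\begin{aligned}
\forall v, w \in W:\ & v\, \delta^{exp}\, w \iff v\, \kappa\, w \wedge v \equiv w \wedge v <_{seq} w\\
\forall v \in W, w \in R:\ & v\, \delta^{exp}\, w \iff \exists u \in W:\ w\, \sigma\, u \wedge v\, \kappa\, u \wedge v \equiv u \wedge v <_{seq} w\\
\forall v \in R, w \in W:\ & v\, \delta^{exp}\, w \iff \exists u \in W:\ v\, \sigma\, u \wedge u\, \kappa\, w \wedge u \equiv w \wedge v <_{seq} w
\end{aligned} \qquad (5.42)$$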
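Definition (5.42) translates directly into a finite computation. The sketch below assumes that instances are integers giving their position in the sequential execution (so `before` can be `<`), and that `sigma` maps each read to its set of possible reaching definitions. The names are illustrative only.

```python
def expanded_dependences(writes, reads, sigma, kappa, equiv, before):
    """Dependence relation after constrained expansion, following (5.42)."""
    deps = set()
    for v in writes:
        for w in writes:  # output dependences
            if kappa(v, w) and equiv(v, w) and before(v, w):
                deps.add((v, w))
        for w in reads:  # flow dependences, reaching definitions included
            if before(v, w) and any(kappa(v, u) and equiv(v, u) for u in sigma[w]):
                deps.add((v, w))
    for v in reads:  # anti-dependences
        for w in writes:
            if before(v, w) and any(kappa(u, w) and equiv(u, w) for u in sigma[v]):
                deps.add((v, w))
    return deps


# Scalar example: write 1, read 2, write 3, read 4; each read sees the previous write.
W, R, sigma = {1, 3}, {2, 4}, {2: {1}, 4: {3}}
full = lambda a, b: True
lt = lambda a, b: a < b
print(sorted(expanded_dependences(W, R, sigma, full, lambda a, b: a == b, lt)))  # [(1, 2), (3, 4)]
```

With the identity constraint (full expansion), only true dependences remain. The full constraint (no expansion) brings back output and anti-dependences.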
Eventually, we get an algebraic definition of the dependence relation after maximal constrained expansion:

$$\delta^{exp} = \big((\equiv \cap \kappa) \cup \sigma^{-1} \circ (\equiv \cap \kappa) \cup (\equiv \cap \kappa) \circ \sigma\big) \cap <_{seq}. \qquad (5.43)$$

The first term describes output dependences, the second one describes flow dependences (including reaching definitions), and the third one describes anti-dependences.

Using this definition, Theorem 2.2 (page 81) describes correct parallel execution orders $<_{par}$ after maximal constrained expansion. Practical computation of $<_{par}$ is done with scheduling or tiling techniques.

As an example, we parallelize the convolution program in Figure 5.6 (page 169). The constraint is the one of maximal static expansion. First, we define the sequential execution order $<_{seq}$ within Omega (with the conventions defined in Section 5.2.7):

    Lex := {[i,w,2]->[i',w',2] : 1<=i<=i'<=N && 1<=w,w' && (i<i' || w<w')}
      union {[i,0,1]->[i',w',2] : 1<=i<=i'<=N && 1<=w'}
      union {[i,w,2]->[i',0,1] : 1<=i,i'<=N && 1<=w && i<i'}
      union {[i,0,3]->[i',0,3] : 1<=i<i'<=N}
      union {[i,0,1]->[i',0,3] : 1<=i<=i'<=N}
      union {[i,0,3]->[i',0,1] : 1<=i<i'<=N}
      union {[i,0,1]->[i',0,1] : 1<=i<i'<=N}
      union {[i,w,2]->[i',0,3] : 1<=i<=i'<=N && 1<=w}
      union {[i,0,3]->[i',w',2] : 1<=i<i'<=N && 1<=w'};

Second, recall from Section 5.2.7 that all writes are in relation for $\kappa$ (since the data structure is a scalar variable), and that relation $R$ is defined by (5.12). We compute $\delta^{exp}$ from (5.43):

    D := (R union R(S) union S'(R)) intersection Lex;
    D;

    {[i,w,2]->[i,w',2] : 1<=i<=N && 1<=w<w'} union
    {[i,0,1]->[i,w',2] : 1<=i<=N && 1<=w'} union
    {[i,0,1]->[i,0,3] : 1<=i<=N} union
    {[i,w,2]->[i,0,3] : 1<=i<=N && 1<=w}

After MSE, the only remaining dependences are between instances sharing the same value of $i$. This makes the outer loop parallel (which was not the case without expansion of scalar x). The parallel program in maximal static expansion is given in Figure 5.14.b.

Back to the Motivating Example

Using the Omega Calculator text-based interface, we describe a step-by-step execution of the expansion algorithm. We have to code instances as integer-valued vectors. An instance $\langle s, i \rangle$ is denoted by vector [i,..,s], where [..] possibly pads the vector with
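zeroes. We number T, S, R with 1, 2, 3 in this order, so $\langle T, i, j \rangle$, $\langle S, i, j, k \rangle$ and $\langle R, i \rangle$ are written [i,j,0,1], [i,j,k,2] and [i,0,0,3], respectively.

The result of instancewise reaching definition analysis is written in Omega's syntax:

    S := {[i,0,0,3]->[i,j,k,2] : 1<=i,j<=M && 1<=k<=N}
      union {[i,j,1,2]->[i,j,0,1] : 1<=i,j<=M}
      union {[i,j,k,2]->[i,j,k-1,2] : 1<=i,j<=M && 2<=k<=N};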
The conflict and no-conflict relations are trivial here, since the only data structure is a scalar variable: $\kappa$ is the full relation and $\not\kappa$ is the empty one.

    Con := {[i,j,k,s]->[i',j',k',s'] : 1<=i,i',j,j'<=M && 1<=k,k'<=N
            && ((s=1 && k=0) || s=2 || (s=3 && j=k=0))
            && ((s'=1 && k'=0) || s'=2 || (s'=3 && j'=k'=0))};
    NCon := {[i,j,k,s]->[i',j',k',s'] : 1=2}; # 1=2 means FALSE!

As in Section 5.4.1, we choose static expansion as the constraint. Relation $\equiv$ is thus defined as R in Section 5.2.2:

    S' := inverse S;
    R := S(S');

No transitive closure computation is necessary since R is already transitive. Computing dependences is done according to (5.43), and relation Con is removed since it always holds:

    D := R union R(S) union S'(R);

In this case, a simple way to compute a parallel execution order is the transitive closure:

    Par := D+;

We can now compute relation $\bowtie$ in the left-hand side of the expansion correctness criterion; call it Int.

    # The "full" relation
    Full := {[i,j,k,s]->[i',j',k',s'] : 1<=i,i',j,j'<=M && 1<=k,k'<=N
             && ((s=1 && k=0) || s=2 || (s=3 && j=k=0))
             && ((s'=1 && k'=0) || s'=2 || (s'=3 && j'=k'=0))};
    # The sequential execution order
    Lex := {[i,j,0,1]->[i',j',0,1] : 1<=i<i'<=M && 1<=j,j'<=M}
      union {[i,j,0,1]->[i',j',k',2] : 1<=i<=i'<=M && 1<=j,j'<=M && 1<=k'<=N}
      union {[i,j,k,2]->[i',j',k',2] : 1<=i<=i'<=M && 1<=j,j'<=M
             && 1<=k,k'<=N && (i<i' || (j<=j' && (j<j' || k<k')))}
      union {[i,j,k,2]->[i',j',0,1] : 1<=i<i'<=M && 1<=j,j'<=M && 1<=k<=N}
      union {[i,j,k,2]->[i',0,0,3] : 1<=i<=i'<=M && 1<=j<=M && 1<=k<=N}
      union {[i,0,0,3]->[i',j',0,1] : 1<=i<i'<=M}
      union {[i,j,0,1]->[i',0,0,3] : 1<=i<=i'<=M}
      union {[i,0,0,3]->[i',j',k',2] : 1<=i<i'<=M && 1<=j'<=M && 1<=k'<=N}
      union {[i,0,0,3]->[i',0,0,3] : 1<=i<i'<=M};
    ILex := inverse Lex;
    NPar := Full - Par;
    INPar := inverse NPar;
    Int := (INPar intersection ILex)
           union (INPar intersection S(NPar intersection Lex));
    Int := Int union (inverse Int);
    Int;

The result is a union of about twenty affine relations, all of which relate instances [i,j,k,2] of statement S to instances of the same statement. Two of them, of the form {[i,j,1,2]->[i',j',1,2] : N=1 && ...}, only hold when N=1.
A quick verification shows that

    Int intersection {[i,j,k,2]->[i,j,k',2]};

and

    Int intersection {[i,j,0,1]->[i,j,k',2] : k'!=0};

are both empty. This means that $\langle T, i, j \rangle$ and $\langle S, i, j, k \rangle$ should share the same color for all $1 \le k \le N$ (R does not perform any write). However, the sets $W_T^0(v)$, $W_S^0(v)$ (for the $i$ loop) and $W_T^1(v)$, $W_S^1(v)$ (for the $j$ loop) hold all accesses $w$ executing after $v$. Then, a different $i$ or $j$ enforces different colors for $\langle T, i, j \rangle$ and $\langle S, i, j, k \rangle$. Application of the graph-coloring algorithm thus yields the following definition of the coloring relation:

    Col := {[i,j,0,1]->[i,j,k,2] : 1<=i,j<=M && 1<=k<=N}
      union {[i,j,k,2]->[i,j,k',2] : 1<=i,j<=M && 1<=k,k'<=N};

We now compute relation Eco, thanks to (5.35) (relation $\kappa$ always holds and has been removed):

    Eco := R union (Col - R(Full - Col(R)));

We choose the representative of each equivalence class as the lexicographic minimum:

    Rho := Eco - Lex(Eco);
    Rho;

The result is:

    {[i,j,0,1]->[i,j,0,1] : 1<=i<=M && 1<=j<=M} union
    {[i,j,k,2]->[i,j,0,1] : 1<=i<=M && 1<=j<=M && 1<=k<=N}

The labeling scheme is obvious: the last two dimensions are stripped off from Rho. The resulting function $\nu$ is thus

$$\nu(\langle T, i, j \rangle) = (i, j) \quad\text{and}\quad \nu(\langle S, i, j, k \rangle) = (i, j).$$

Following the lines of Constrained-Storage-Mapping-Optimization, we have computed the same storage mapping as in Figure 5.31.

5.5 Parallelization of Recursive Programs

The last contribution of this work is about automatic parallelization of recursive programs. This topic has received little interest from the compilation community, but the situation is evolving thanks to new powerful multi-threaded environments for efficient execution of programs with control parallelism. When dealing with shared-memory architectures and software-emulated shared-memory machines, tools like Cilk [MF98] provide a very suitable programming model for automatic or semi-automatic code generation [RR99].

Now, what programming model should we consider for parallel code generation? First, it is still an open problem to compute a schedule from a dependence relation described by a transducer. This is of course a strong argument against data parallelism as a model of choice for parallelization of recursive programs. Moreover, we have seen in Section 1.2
that the control-parallel paradigm was well suited to express parallel execution in recursive programs. In fact, this assertion is true when most iterative computations are implemented with recursive calls, but not when parallelism is located within iterations of a loop. Since loops can be rewritten as recursive procedure calls, we will stick to control parallelism in the following.

Notice that we have studied powerful expansion techniques for loop nests, but no practical algorithm for recursive structures has been proposed yet. We thus start with an investigation of specific aspects of expanding recursive programs and recursive data structures in Section 5.5.1. Then we present in Section 5.5.2 a simple algorithm for single-assignment form conversion of any code that fits into our program model: the algorithm can be seen as a practical realization of Abstract-SA, the abstract algorithm for SA-form conversion (page 157). Then, a privatization technique for recursive programs is proposed in Section 5.5.4, and some practical examples are studied in Section 5.5.5. We also give some perspectives about extending maximal static expansion or storage mapping optimization to this larger class of programs.

The rest of this section addresses generation of parallel recursive programs. Section 5.5.6 starts with a short state of the art on parallelization techniques for recursive programs, then motivates the design of a new algorithm based on instancewise data-flow information. In Section 5.5.7, we present an improvement of the statementwise algorithm which allows instancewise parallelization of recursive programs: whether some statements execute in parallel or in sequence can depend on the instance of these statements, but it is still decided at compile-time. This technique is also completely novel in parallelization of recursive programs.
Problems Specific to Recursive Structures

Before proposing a general solution for SA-form conversion of recursive programs, we investigate several issues which make the problem more difficult for recursive control and data structures. Recall that elements in data structures in single-assignment form are in one-to-one mapping with control words. Thus, the preferred layout of an expanded data structure is a tree. Expanded data structures can sometimes be implemented with arrays: this is the case when only loops and simple recursive procedures are involved, and when loops and recursive calls are not "interleaved"; program Queens is such an example. But automatic recognition of such programs and effective design of a specific expansion technique are left for future work. We will thus always consider that expanded data structures are trees whose edges are labeled by statement names.

Management of Recursive Data-Structures

Compared to arrays, lists and trees seem much less easy to access and traverse: they are indeed not random-access data structures. For example, the abstract algorithm Abstract-SA (page 157) for SA-form conversion uses the notation Dexp[CurIns] to refer to the element indexed by the current control word in a data structure Dexp. But when Dexp is a tree, what does it mean? How is it implemented? Is it efficient?

There is a quick answer to all these questions: the tree is traversed from its root using pointer dereferences along the letters of CurIns, and the result is of course very costly at run-time. A more clever analysis shows that CurIns is not a random word: it is the current control word. Its "evolution" during program execution is fully predictable: it can be seen
as a different local variable in each program statement, a new letter being added at each block entry.

The other problem with recursive data structures is memory allocation. Because they cannot be allocated at compile-time in general, a very efficient memory management technique should be used to reduce the run-time overhead. We thus suppose that an automatic scheme for grouping mallocs or news is implemented, possibly at the C-compiler or operating system level.

Eventually, both problems can be solved with a simple and efficient code generation algorithm. The idea is the following: suppose a recursive data structure indexed by CurIns must be generated by algorithm Abstract-SA. Each time a block is entered, a new element of the data structure is allocated, and the pointer to the last element (stored in a local variable) is dereferenced accordingly. This technique is implemented in Recursive-Programs-SA.

About Accuracy and Versatility

When trying to extend maximal static expansion and storage mapping optimization to recursive programs, two kinds of problems immediately arise:
- transductions are not as versatile as affine relations, because some critical algebraic operations are not decidable and require conservative approximations;
- the results of dependence and reaching definition analyses are not always as precise as one would expect, because of the lack of expressiveness of rational and one-counter transductions.

These two points are of course limiting the applicability of "evolved" expansion techniques, which intensively rely on algebraic operations on sets and relations.
In addition, a few critical operations useful to "evolved" expansion techniques are lacking, e.g., the class of left-synchronous relations is not closed under transitive closure. Conversely, the problem of enumerating equivalence classes seems rather easy because the lexicographical selection of a left-synchronous transduction is left-synchronous, see Section 3.4.3; a remaining problem would be to label the class representatives.

We are not aware of any result about coloring graphs of rational relations, but optimality should probably not be hoped for, even for recognizable relations. Graph-coloring algorithms for rational relations would of course be useful for storage mapping optimization; but recall from Section 5.3.2 that many algebraic operations are involved in the expansion correctness criterion, and most of these operations are undecidable for rational relations.

The last point is that we have not found enough codes that both fit into our program model and require expansion techniques more "evolved" than single-assignment form or privatization. But this problem has more to do with the program model restrictions than with the applicability of static expansion and storage mapping optimization.

Algorithm Recursive-Programs-SA is a first attempt to give a counterpart of algorithm Loop-Nests-SA for recursive programs. It works together with Recursive-Programs-Implement-Phi to generate the code for $\phi$ functions. Expanded data structures all have the same type, ControlType, which is basically a tree type associated with
the language $L_{ctrl}$ of control words. It can be implemented using recursive types and sub-types, or simply with as many pointer fields as statement labels in $\Sigma_{ctrl}$. An additional field in ControlType stores the element value. It has the same type as the original data structure elements, and it is called value.

    Recursive-Programs-SA (program, σ)
    program: an intermediate representation of the program
    σ: a reaching definition relation, seen as a function
    returns an intermediate representation of the expanded program
    define a tree type called ControlType whose elements are indexed in Lctrl
    foreach data structure D in program
    do define a data structure Dexp of type ControlType
       define a global pointer variable Dlocal = &Dexp
       foreach procedure in program
       do insert a new argument Dlocal in the first place
       foreach call to a procedure p in program
       do insert Dlocal->p = new ControlType() before the call
          insert a new argument Dlocal->p in the first place
       foreach non-procedure block b in program
       do insert Dlocal->b = new ControlType() at the top of b
          define a local pointer variable Dlocal = Dlocal->b
       foreach statement s assigning D in program
       do left-hand side of s ← Dlocal->value
       foreach reference ref to D in program
       do ref ← φ(σ(CurIns, ref))
    return program

A simple optimization to spare memory consists in removing all "useless" fields from ControlType, together with every pointer update code in the associated program blocks and statements. By useless, we mean statement labels which are not useful to distinguish between different memory locations, i.e. which cannot be replaced by another label and yield another instance of an assignment statement to the considered data structure. Applied to program Queens, only three labels can be considered to define the fields of ControlType: Q, a, and b; all other labels are unnecessary to enforce the single-assignment property. This optimization should of course be applied on a per-data-structure basis, to take benefit of the locality of data structure usage in programs.
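The allocation scheme of Recursive-Programs-SA allocates one new node at each block entry or call and passes the current node down as an extra argument. It can be mimicked with Python objects standing for heap-allocated C structures. The labels `s` and `P`, the `countdown` procedure and the dictionary of children (instead of one pointer field per label) are hypothetical choices for this sketch.

```python
class ControlType:
    """Tree node indexed by control-word letters; `value` holds the stored element."""

    def __init__(self):
        self.children = {}  # one entry per statement label actually used
        self.value = None

    def new_child(self, label):
        """Mimics `Dlocal->label = new ControlType()`."""
        node = ControlType()
        self.children[label] = node
        return node


def countdown(n, d_local):
    """Hypothetical recursive procedure: statement s writes D, call P recurses."""
    s_node = d_local.new_child("s")  # entering the block of statement s
    s_node.value = n                 # left-hand side of s becomes Dlocal->value
    if n > 0:
        countdown(n - 1, d_local.new_child("P"))  # new first argument Dlocal->P


root = ControlType()  # Dexp; the global Dlocal initially points to it
countdown(2, root)
print(root.children["P"].children["s"].value)  # 1
```

Each write lands in a distinct node reached by its own control word (s, Ps, PPs). This is the single-assignment property, obtained without ever walking the tree from its root.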
One should notice that every read reference requires a $\phi$ function! This is clearly a big problem for efficient code generation, but detecting exact results and computing reaching definitions at run-time is not as easy as in the case of loop nests. In fact, a part of the algorithm is even "abstract": we have not yet discussed how the argument of $\phi$ can be computed. To simplify the exposition, all these issues are addressed in the next section.

Of course, algorithm Recursive-Programs-Implement-Phi generates the code for $\phi$-structures $\phi$Dexp using the same techniques as the SA-form algorithm. These $\phi$-structures store addresses of memory locations, computed from the original write references in assignment statements. Each $\phi$ function requires a traversal of $\phi$-structures to compute the exact reaching definition at run-time: the maximum is computed recursively from the root of $\phi$Dexp, and the appropriate element value in Dexp is returned. This computation of the maximum can be done in parallel, as usual for reduction operations on trees.
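The run-time maximum of Recursive-Programs-Implement-Phi can be pictured as a simultaneous depth-first traversal of Dexp and $\phi$Dexp. Visiting a node before its children, and children in label order, enumerates control words in lexicographic order, so the last node whose $\phi$ field matches the reference is the maximum. The `Node` class, the label order and the string addresses are assumptions of this sketch.

```python
class Node:
    """Minimal tree node: a stored value plus children keyed by statement label."""

    def __init__(self, value=None, **children):
        self.value = value
        self.children = dict(children)


NOT_FOUND = object()  # sentinel distinguishing "no match" from a stored None


def phi_lookup(d_node, phi_node, ref, label_order):
    """Value of Dexp at the lexicographically last node whose φ field equals ref."""
    best = d_node.value if phi_node.value == ref else NOT_FOUND
    for label in label_order:  # children visited in the order of the labels
        if label in phi_node.children:
            found = phi_lookup(d_node.children[label], phi_node.children[label],
                               ref, label_order)
            if found is not NOT_FOUND:
                best = found  # a later control word overrides earlier matches
    return best


d = Node(None, a=Node(10, b=Node(15)), b=Node(20))
phi_tree = Node(None, a=Node("&x", b=Node("&x")), b=Node("&y"))
print(phi_lookup(d, phi_tree, "&x", ["a", "b"]))  # 15
```

The two recursive calls at each node are independent, which is why the text notes that this maximum can be computed in parallel like any tree reduction.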
    Recursive-Programs-Implement-Phi (expanded)
    expanded: an intermediate representation of the expanded program
    returns an intermediate representation with run-time restoration code
    foreach expanded data structure Dexp in expanded
    do if there are φ functions accessing Dexp
       then define a data structure φDexp of type ControlType
            define a global pointer variable φDlocal = &φDexp
            foreach procedure in program
            do insert a new argument φDlocal in the first place
            foreach call to a procedure p in program
            do insert φDlocal->p = new ControlType() before the call
               insert a new argument φDlocal->p in the first place
            foreach non-procedure block b in program
            do insert φDlocal->b = new ControlType() at the top of b
               define a local pointer variable φDlocal = φDlocal->b
               insert φDlocal->value = NULL
            foreach read reference ref to Dexp whose expanded form is φ(set)
            do foreach statement s involved in set
               do refs ← write reference in s
                  if not already done for s
                  then following s insert φDlocal->value = &refs
               φ(set) ← { traverse Dexp and φDexp in lexicographic order
                          using pointers Dlocal and φDlocal respectively
                            if (φDlocal->value == &ref) maxloc = Dlocal;
                          maxloc->value; }
    return expanded

Two problems remain with the implementation of $\phi$ functions.
- The tree traversal does not use the set argument of $\phi$ functions at all! Indeed, testing for membership in a rational language is not a constant-time problem, and it is not even linear in general for algebraic languages. This point is also related to run-time computation of sets of reaching definitions: it will be discussed in the next section.
- Several $\phi$ functions may induce many redundant computations, since the maximum must be computed on the whole structure every time, without taking benefit of previous results. This problem was solved for loop nests using a complex technique integrated with constrained storage mapping optimization (see Section 5.4.7), but no similar technique is available for recursive programs.

5.5.3 Generating Code for Read References

In the last section, all read accesses were implemented with $\phi$ functions. This solution ensures correctness of the expanded program, but it is obviously not the most efficient.
GeneratingCodeforReadReferences forloopnests(withthequastrepresentation).sadly,thisisnotaseasyingeneral:some rationalfunctionscannotbecomputedforagiveninputinlineartime,anditiseven exact),wecanhopeforanecientrun-timecomputationofitsvalue,asitisthecase Ifweknowthatthereachingdenitionrelationisapartialfunction(i.e.theresultis worseforalgebraicfunctions.
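To give a feel for what the instrumentation inserted by Recursive-Programs-Implement-Phi amounts to, here is a hand-instrumented C++ sketch on a hypothetical toy procedure (not from the thesis): `f` writes a global `x` (statement s), calls itself (call p), then reads `x` (read r). As in the previous sketch, the φ-structure and the expanded structure are merged into one node type, and the labels `S < P` are assumed integers in textual order. Each write stores its value and the address of its original write reference in the node of its instance; each read performs a full-tree φ traversal from the root.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

enum Label { S = 0, P = 1 };  // hypothetical labels, in textual order

struct Node {
  const void* addr = nullptr;  // address written by this instance, if any
  int value = 0;               // expanded storage for that write
  std::map<int, std::unique_ptr<Node>> child;
  Node* get(int l) {
    auto& slot = child[l];
    if (!slot) slot = std::make_unique<Node>();
    return slot.get();
  }
};

// Full-tree phi: lexicographically last instance that wrote `ref`.
const Node* phi(const Node* n, const void* ref) {
  const Node* best = (n->addr == ref) ? n : nullptr;
  for (const auto& e : n->child)
    if (const Node* m = phi(e.second.get(), ref)) best = m;
  return best;
}

int x;                   // original global, kept only for its address
Node* root = nullptr;    // plays the role of &Dexp
std::vector<int> reads;  // values observed by the read r

// Original: void f(int k) { x = k; if (k < 3) f(k + 1); reads.push_back(x); }
void f(Node* local, int k) {
  Node* s = local->get(S);  // node of this instance of s
  s->addr = &x;             // record the address of the write reference
  s->value = k;             // expanded write
  if (k < 3) f(local->get(P), k + 1);  // new node for the call
  const Node* w = phi(root, &x);       // the read of x becomes a phi
  assert(w != nullptr);
  reads.push_back(w->value);
}
```

Every read scans the whole tree built so far, although the exact reaching definition here is simply the deepest write: this is the redundancy discussed above.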
The class of sequential functions is interesting for this purpose, since it is decidable and allows efficient online computation, see Section 3.3.3. Because the output letter and the next state are known unambiguously for every state and input letter, we can compute sequential functions together with pointer updates for expanded data structures. This technique can easily be extended to a sub-sequential function (T, ρ), by adding the pointer updates associated with function ρ (from states to words, see Definition 3.10, page 100). The class of sub-sequential transductions is decidable in polynomial time among rational transductions and functions [BC99b]. This online computation technique is detailed in algorithm Recursive-Programs-Online-SA, for sub-sequential reaching definition transductions. An extension to online rational transductions would also be possible, without significantly increasing the run-time computation cost, but decidability is not known for this class.

Dealing with algebraic functions is less encouraging, because it is unlikely that one can decide whether an algebraic relation is a function, and the same holds for the class of online algebraic transductions. But supposing we are lucky enough to know that an algebraic transduction is online (hence a partial function), we can implement the run-time computation efficiently, with the same technique as before: the next state, output label, and stack operation are never ambiguous.
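A minimal C++ sketch of the online evaluation of a sub-sequential function (T, ρ): one transition lookup per input letter, with the final output ρ(q) appended at the end. The transducer `toy_sigma` is a hypothetical reaching-definition function chosen only for illustration (a read instance p^n r reads the write instance p^n s at the same depth); in Recursive-Programs-Online-SA the emitted letters would instead be used to move a pointer down the expanded tree.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Sub-sequential transducer (T, rho): T is deterministic on its input and
// emits a word on each transition; rho gives the final output of the
// accepting states.
struct Subsequential {
  int q0 = 0;
  std::map<std::pair<int, char>, std::pair<int, std::string>> delta;  // (q, a) -> (q', out)
  std::map<int, std::string> rho;                                     // accepting q -> final output

  // Online evaluation: each input letter costs one lookup and never a
  // backtrack, because next state and output are known unambiguously.
  std::optional<std::string> apply(const std::string& w) const {
    int q = q0;
    std::string out;
    for (char a : w) {
      auto t = delta.find({q, a});
      if (t == delta.end()) return std::nullopt;  // w is outside the domain
      q = t->second.first;
      out += t->second.second;
    }
    auto f = rho.find(q);
    if (f == rho.end()) return std::nullopt;      // q is not accepting
    return out + f->second;
  }
};

// Hypothetical reaching-definition function: p^n r  ->  p^n s.
Subsequential toy_sigma() {
  Subsequential t;
  t.delta[{0, 'p'}] = {0, "p"};
  t.delta[{0, 'r'}] = {1, ""};
  t.rho[1] = "s";
  return t;
}
```

For instance, `toy_sigma().apply("ppr")` yields `"pps"`, while inputs outside the domain yield an empty optional, reflecting that σ is only a partial function.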
A similar technique can be used to optimize the tree traversal in the implementation of φ(set) by algorithm Recursive-Programs-Implement-Phi. Computing a left-synchronous approximation of the reaching definition transduction σ (even in the case of an algebraic transduction), one may use the closure under prefix-selection (see Section 3.4.3 and especially Proposition 3.11) to select the topmost node in φDexp[set] and Dexp[set]. These topmost nodes can be used instead of the roots of the trees to initiate the traversal. To be computed at run-time, however, the rational function implementing the prefix-selection of σ (approximate in general) must be sub-sequential. Another approach consists in computing an approximation of the union of all possible sets of reaching definitions involved in a given φ function. The result is rational (resp. algebraic) if the reaching definition transduction is rational (resp. algebraic), thanks to Nivat's Theorem 3.6 (resp. Evey's Theorem 3.24), and it can be used to restrict the tree traversal to a smaller domain. Both approaches can be combined to optimize the implementation of φ functions.

To conclude this discussion on run-time computation of reaching definitions, only the case of sub-sequential functions is very clear: it allows efficient online computation with algorithm Recursive-Programs-Online-SA. In all other cases, which include all cases of algebraic transductions, we think that no real alternative to φ functions is available. In practice, Recursive-Programs-Online-SA should be applied to the largest subset of data structures and read references on which σ is sub-sequential, and Recursive-Programs-SA is used for the rest of the program. It is perhaps one of the greatest failures of our framework, since we computed an interesting piece of information (reaching definitions) which we are unable to use in practice. This is also a discouraging argument for extending static expansion to recursive programs: what is the use of removing φ functions if the reaching definition information fails to give the value we are looking for at a lower cost? Finally, φ functions may be so expensive to compute that conversion to single-assignment form should be reconsidered, in favor of other expansion schemes. In this context, a very interesting alternative is proposed in the next section.
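The second approach above, restricting the traversal to an approximation of the union of all possible reaching-definition sets, might look like the following C++ sketch. The encoding is our assumption: the approximation is given as a hypothetical deterministic automaton over integer labels, a missing transition denotes a dead state (no candidate can lie below), and accepting states mark candidate nodes. As long as the approximation is conservative, the maximum is the same as with a full traversal, but pruned subtrees are never visited.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <utility>

struct Node {
  const void* addr = nullptr;  // address written by this instance, if any
  int value = 0;
  std::map<int, std::unique_ptr<Node>> child;
  Node* get(int l) {
    auto& slot = child[l];
    if (!slot) slot = std::make_unique<Node>();
    return slot.get();
  }
};

// Hypothetical automaton for the prefixes of a rational over-approximation
// of all reaching definitions of one phi function.
struct PrefixDFA {
  std::map<std::pair<int, int>, int> delta;  // (state, label) -> state; missing = dead
  std::set<int> accepting;                   // states recognizing candidate words
};

// Lexicographic maximum restricted to the approximated domain; `visited`
// counts the nodes actually explored.
const Node* phi_restricted(const Node* n, int q, const PrefixDFA& d,
                           const void* ref, int& visited) {
  assert(n != nullptr);
  ++visited;
  const Node* best = (d.accepting.count(q) && n->addr == ref) ? n : nullptr;
  for (const auto& e : n->child) {
    auto t = d.delta.find({q, e.first});
    if (t == d.delta.end()) continue;  // dead state: prune this subtree
    if (const Node* m = phi_restricted(e.second.get(), t->second, d, ref, visited))
      best = m;
  }
  return best;
}
```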
    Recursive-Programs-Online-SA (program, σ)
    program: an intermediate representation of the program
    σ: a sub-sequential reaching definition transduction
    returns an intermediate representation of the expanded program
    define a tree type called ControlType whose elements are indexed in Lctrl
    build (T, ρ) from σ, where T = (Q, {q0}, F, E) is sequential and ρ: Q → Σctrl*
    build a "next state" function α: Q × Σctrl → Q from T
    build a "next output" function β: Q × Σctrl → Σctrl* from T
    for each data structure D in program
      do declare a data structure Dexp of type ControlType
         define a global "state" variable DQlocal = q0
         define a global pointer variable Dlocal = &Dexp
         define a global pointer variable D'local = &Dexp
         for each procedure in program
           do insert a new argument Dlocal in the first place
              insert a new argument D'local in the second place
              insert a new argument DQlocal in the third place
         for each call to a procedure p in program
           do insert Dlocal->p = new ControlType() before the call
              insert a new argument Dlocal->p in the first place
              insert a new argument D'local->β(DQlocal, p) in the second place
              insert a new argument α(DQlocal, p) in the third place
         for each non-procedure block b in program
           do insert Dlocal->b = new ControlType() at the top of b
              define a local pointer variable Dlocal = Dlocal->b
              define a local pointer variable D'local = D'local->β(DQlocal, b)
              define a local variable DQlocal = α(DQlocal, b)
         for each statement s assigning D in program
           do left-hand side of s ← Dlocal->value
         for each reference ref to D in program
           do ref ← D'local->ρ(DQlocal)->value
    return program

Finally, looking at our motivating examples in Chapter 4, or thinking about most practical examples of recursive programs using trees and other pointer-based data structures, one common observation can be made: there is "not so much" memory reuse, if not zero memory reuse, in these programs! This late but simple discovery is a strong argument against memory expansion techniques for recursive tree programs: they may simply be useless. In fact, many tree programs already have a high level of parallelism and do not need to be expanded. It is very disappointing that the best results of our single-assignment technique are likely to be very rarely useful in practice. In the case of recursive array programs, expansion is still a critical issue for parallelization, as for the Queens program in Chapter 4.

5.5.4 Privatization of Recursive Programs

We have seen that SA-form conversion is not practical for all recursive programs. This was already the case for loop nests, but the problem is more obvious here. However, SA-form is probably not the most suitable method to extract parallelism from recursive programs. Because of the heavy use of procedures and functions, looking at expansion as a transformation of global data structures into local ones is much more profitable. This idea happens to be very similar to the principles of array privatization for loop nests, and we use the same word here. A general privatization technique can be defined for unrestricted recursive programs, but copy-out code is necessary to update the data structures of the calling procedure. In a parallel execution, this often requires additional synchronizations, and the overhead of such an expansion is likely to be very high. Further study is left for future work.

We will restrict ourselves to the case of reaching definition relations which satisfy the vpa property defined in Section 4.3.4 (for reaching definition analysis purposes): for all $u, v \in L_{\mathrm{ctrl}}$, if $v \,\sigma\, u$ then $v$ is an ancestor of $u$, i.e. $\exists w_1, w_2 \in L_{\mathrm{ctrl}}, s \in \Sigma_{\mathrm{ctrl}} : v = w_1 s \wedge u = w_1 w_2$ (and $v <_{\mathrm{lex}} u$, which is trivial since $v \,\sigma\, u$). This property is enforced in many important classes of recursive programs: all divide-and-conquer execution schemes, most dynamic-programming implementations, many sorting algorithms...

Now, the privatization technique for vpa programs is very simple: every global data structure (probably an array) to be expanded is made local to each procedure in the program, and the appropriate copy-in code of the whole structure is inserted at the beginning of each procedure. Notice that no copy-out is needed, since it would involve reaching definitions from non-ancestor instances. A program privatized in that sense is generally less expanded than SA-form²⁵, and the parallelism extracted by privatization can be found at function calls only: instead of waiting for the function's return, one may run each function call in parallel and insert synchronizations only when the result of a function is needed.

This technique may appear somewhat expensive because of the data structure copying, but the same optimization that worked for loop nests can be applied here [TP93, MAL93, Li92]: privatization can be done on a per-processor basis instead, and copy-in is only performed when a procedure call is made across processors. We implemented this optimization for program Queens, using Cilk's "fast" and "slow" implementations of parallel procedures, the "slow" one being called only when a processor "catches" new work [MF98]. Further discussion about parallelization of expanded programs is delayed to Section 5.5.6.

²⁵ As a technical remark, this is not always true because we copy whole data structures and not each element. In some tricky cases, privatization can require more memory than SA-form!

5.5.5 Expansion of Recursive Programs: Practical Examples

We applied single-assignment algorithm Recursive-Programs-SA to program Queens. The result is shown in Figure 5.36. The ControlType structure has been optimized by keeping only the fields which enforce the single-assignment form property. It is implemented with a C++ template-like syntax to handle both Dexp and the φ-structure φDexp:

    struct ControlType<T> {
      T value;
      ControlType<T>* a;
      ControlType<T>* b;
      ControlType<T>* Q;
    };

         int A[n];
         ControlType<int> Aexp = new ControlType<int>();
         ControlType<int>* Alocal = &Aexp;
         ControlType<int*> φAexp = new ControlType<int*>();
         ControlType<int*>* φAlocal = &φAexp;

    P    void Queens(ControlType<int>* Alocal, ControlType<int*>* φAlocal, int n, int k) {
    I      if (k < n) {
    A=a      for (int i = 0; i < n; i++) {
               Alocal->a = new ControlType<int>();
               φAlocal->a = new ControlType<int*>();
               ControlType<int>* Alocal = Alocal->a;
               ControlType<int*>* φAlocal = φAlocal->a;
    B=b        for (int j = 0; j < k; j++) {
                 Alocal->b = new ControlType<int>();
                 φAlocal->b = new ControlType<int*>();
                 ControlType<int>* Alocal = Alocal->b;
                 ControlType<int*>* φAlocal = φAlocal->b;
    r            ... = φ(σ(CurIns, A[j]));
               }
    J          if (...) {
    s            Alocal->value = ...;
                 φAlocal->value = &(A[k]);
                 Alocal->Q = new ControlType<int>();
                 φAlocal->Q = new ControlType<int*>();
    Q            Queens(Alocal->Q, φAlocal->Q, n, k+1);
               }
             }
           }
         }
         int main() {
    F      Queens(Alocal, φAlocal, n, 0);
         }

Figure 5.36. Single-assignment form conversion of program Queens

Notice that the input automaton for the reaching definition transducer of procedure Queens is not deterministic. This ruins any hope of efficiently computing reaching definitions at run-time and removing the φ function, despite the fact that our analysis technique computed an exact result! The tree traversal associated with the φ function has not been implemented in Figure 5.36, but it does not require a full traversal of φDexp: because only ancestors are possible reaching definitions (property vpa), the computation of the maximum can be made on the path from the root (i.e. &Dexp) to the current element (i.e. Dlocal). This is implemented most efficiently with pointers to the parent node in ControlType, stopping at the first ancestor in dependence (i.e. the deepest ancestor in dependence). An effective implementation of statement r is given in Figure 5.37. The maxatloc != NULL test is necessary in general, when ⊥ can be a possible reaching definition, but it could indeed be removed in our case since execution of ancestors is guaranteed. The appropriate construction of the parent field in ControlType is assumed in the rest of the code.

    r    { ControlType<int>* maxloc = Dlocal;
           ControlType<int*>* maxatloc = φDlocal;
           while (maxatloc != NULL && maxatloc->value != &(A[j])) {
             maxloc = maxloc->parent;
             maxatloc = maxatloc->parent;
           }
           ... = maxloc->value;
         }

Figure 5.37. Implementation of the read reference in statement r

We also experimented with the privatization technique, since property vpa is satisfied for program Queens, see Figure 5.38. An additional optimization has been performed: only the k first elements of array A are copied, because the others are not used. This result can be obtained thanks to static analyses of variables [CH78]. Parallelization of the privatized form is studied in Section 5.5.6.

         int A[n];
    P    void Queens(int A[n], int n, int k) {
           int B[n];
           memcpy(B, A, k * sizeof(int));
    I      if (k < n) {
    A=a      for (int i = 0; i < n; i++) {
    B=b        for (int j = 0; j < k; j++)
    r            ... = B[j];
    J          if (...) {
    s            B[k] = ...;
    Q            Queens(B, n, k+1);
               }
             }
           }
         }
         int main() {
    F      Queens(A, n, 0);
         }

Figure 5.38. Privatization of program Queens

5.5.6 Statementwise Parallelization

We start with two motivating examples to show what we want to achieve, then discuss the results of classical static analyses on such examples, before we present our statementwise parallelization algorithm.

Motivating Example

Our first example is the BST program introduced in Section 2.3. Instancewise dependence analysis has been performed in Section 4.4, and the result is the rational transducer in Figure 4.9. Because the two recursive calls involve dereferences of pointer p along two distinct edges, and because the underlying data structure is a tree, we know that all accesses performed after the first call are independent from accesses performed after the second one. Both conditional statements I1 and J1 can thus be executed asynchronously (recall that an implicit synchronization is supposed at the return point of procedure BST, see Section 1.2). The parallel version is given by Figure 5.39.

    P    void BST(tree* p) {
    I1     spawn if (p->l != NULL) {
    L        BST(p->l);
    I2       if (p->value < p->l->value) {
    a          t = p->value;
    b          p->value = p->l->value;
    c          p->l->value = t;
             }
           }
    J1     spawn if (p->r != NULL) {
    R        BST(p->r);
    J2       if (p->value > p->r->value) {
    d          t = p->value;
    e          p->value = p->r->value;
    f          p->r->value = t;
             }
           }
         }
         int main() {
    F      if (root != NULL) BST(root);
         }

Figure 5.39. Parallelization of program BST

Our second example maps two functions on a list, one on even elements and the other on the odd ones, see program Map in Figure 5.40. The result of our analysis for this program is that there are no dependences between instances of s and t. This allows parallel execution of s and t, and of their respective function calls to even and odd.

         void Map(list* p, list* q) {
    s      p->value = even(p->value);
    t      q->value = odd(q->value);
           if (...)
             Map(p->next->next, q->next->next);
         }
         int main() {
           Map(list, list->next);
         }

Figure 5.40. Second motivating example: program Map

Let us compare the effectiveness of related parallelization techniques with the expected results on these two motivating examples. Hendren et al. propose in [HHN94] a dependence test for recursive programs with pointer-based data structures. Their technique does not handle arrays (seen as pointer arithmetic in that case). But since it handles a wide range of recursive data structures, including directed acyclic graphs and doubly-linked lists, it is more general than our technique in that domain. Because their pointer aliasing abstraction is based on path expressions, which are pairs of regular expressions on the edge names, the BST program is actually parallelized with their technique. But the Map procedure is not, since their path expressions cannot capture the evenness of dereference numbers. The very precise alias analysis by Deutsch [Deu94] would allow parallelization of the two examples, because Kleene stars are replaced there by named counters constrained with systems of affine equations. More usual flow-sensitive and context-sensitive alias analyses [LRZ93, EGH94, Ste96] would generally succeed for BST and fail for Map.

Algorithm

We now present an algorithm for statementwise parallelization of recursive programs, based on the results of our dependence analysis. Let $(\Sigma_{\mathrm{ctrl}}, E)$ be the dual control flow graph [ASU86] of the program, i.e. the dual graph of the control flow graph, whose nodes are statements instead of program points, and whose edges are program points instead of statements. We define a synchronization graph $(\Sigma_{\mathrm{ctrl}}, E')$ as a sub-graph of $(\Sigma_{\mathrm{ctrl}}, E)$ such that every edge in $E'$ is associated with a synchronization barrier. Supposing that all sequential compositions of statements are replaced by asynchronous executions, a synchronization graph must ensure that there are enough synchronization points to preserve the original program semantics. Thanks to Bernstein's conditions, this is ensured by the following condition: let $s, t \in \Sigma_{\mathrm{ctrl}}$ be two program statements, $st \in E$, and $B$ be the innermost block surrounding both $s$ and $t$;

$$st \in E' \iff \exists v, w \in L_{\mathrm{ctrl}},\ u, x', y' \in \Sigma_{\mathrm{ctrl}}^*,\ x, y \in (\Sigma_{\mathrm{ctrl}} \setminus \{B\})^* :\ v = uBxsx' \,\wedge\, w = uByty' \,\wedge\, (v \,\delta\, w \vee w \,\delta\, v) \qquad (5.44)$$

Indeed, executing $uBxs$ and $uByt$ in parallel induces parallel execution of all their descendants (coarse-grain parallelization), and the prefix $u$ should be chosen as long as possible, hence the restriction of $x$ and $y$ to non-$B$ labels. Algorithm Statementwise-Parallelization is based on this equation to generate a parallel program with the required synchronizations. It is interesting to notice that

$$v \,\delta\, w \vee w \,\delta\, v \iff v \,\kappa\, w \wedge (v \in W \vee w \in W),$$

which means that intersection with the lexicographic order is not necessary: the conflict relation $\kappa$ can be used instead of the dependence relation $\delta$ to describe statements that may execute in parallel. Because $(\Sigma_{\mathrm{ctrl}}^* B (\Sigma_{\mathrm{ctrl}} \setminus \{B\})^* s \Sigma_{\mathrm{ctrl}}^*) \times (\Sigma_{\mathrm{ctrl}}^* B (\Sigma_{\mathrm{ctrl}} \setminus \{B\})^* t \Sigma_{\mathrm{ctrl}}^*)$ in Statementwise-Parallelization is a recognizable relation, its intersection with depend can be computed exactly. These two remarks show that computing the synchronization graph for a recursive program can be done without any approximation in most cases: the conflict relation is approximate only for multi-dimensional arrays. Notice that this algorithm does not perform any statement reordering inside a program block; this issue is left for future work.

    Statementwise-Parallelization (program, κ)
    program: an intermediate representation of the program
    κ: the conflict relation to be satisfied by all parallel execution orders
    returns a parallel implementation of program
    depend ← κ ∩ ((W × R) ∪ (R × W) ∪ (W × W))
    (Σctrl, edges) ← dual control flow graph of program
    for each st ∈ edges
      do B ← innermost block surrounding both s and t
         synchro ← depend ∩ (Σctrl* B (Σctrl \ {B})* s Σctrl* × Σctrl* B (Σctrl \ {B})* t Σctrl*)
         if synchro ≠ ∅
           then insert a sync statement at the program point associated with st
    insert a spawn keyword before every statement
    return program

Application of Statementwise-Parallelization to the two motivating examples yields the expected results. Of course, several spawn keywords may be useless or misplaced with respect to the parallel programming environment: Cilk only allows asynchronous procedure calls, not asynchronous execution at the statement level, and several environments do not support nested parallelism. When a spawned statement is immediately followed by a sync, both keywords can be removed, since such a construct is equivalent to sequential execution. In addition, powerful methods have been crafted to optimize the number of synchronization points and shrink the critical path, see for example [Rin97].

Finally, the parallelization technique proposed by Feautrier in [Fea98] would find a similar result on both motivating examples, since it is based on an instancewise dependence test (but automatic computation of storage mappings is not handled in [Fea98]).

Statementwise Parallelization via Memory Expansion

Our running example is now program Queens, already studied in the previous chapters. This program does not hold any parallel loop (the inner loop looks parallel, but memory dependences on the "..." parts actually hamper parallelization). We will consider the reaching definition information computed in Section 4.5, i.e. the one-counter transducer in Figure 4.15, and the privatized Queens program proposed in Section 5.5.5, see Figure 5.38.
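The ancestor-only search of Figure 5.37 can be illustrated with the following C++ sketch, under simplifying assumptions: `Node` merges the two ControlType trees (`addr` for the φ-structure, `value` for the expanded data) and carries the parent pointer whose construction the text assumes, and the array of nodes in the usage stands for nested recursive calls of Queens. Under property vpa, the cost of a read becomes proportional to the depth of the current instance instead of the size of the whole structure.

```cpp
#include <cassert>

// Hypothetical merged ControlType node with the parent pointer of Figure 5.37.
struct Node {
  Node* parent = nullptr;
  const void* addr = nullptr;  // mirrors phiD->value: address written here
  int value = 0;               // mirrors D->value: value written here
};

// Deepest ancestor (starting at the current node) whose recorded address
// is `ref`. Returns nullptr in the "bottom" case, which is why the
// null test is kept in general.
const Node* deepest_writer(const Node* local, const void* ref) {
  assert(local != nullptr);
  const Node* n = local;
  while (n != nullptr && n->addr != ref) n = n->parent;
  return n;
}
```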
Recall that the reaching definition relation of program Queens satisfies the vpa property: this guarantees that the reaching definition relation can be used as dependence information to decide whether a procedure call can be executed asynchronously or not. The result is that the recursive call can be made asynchronous, see Figure 5.41. Starting from the single-assignment form version of program Queens (see Figure 5.36), no more parallelism would have been extracted, but the overhead due to φ function computation would make the parallel program impractical.

         int A[n];
    P    void Queens(int A[n], int n, int k) {
           int B[n];
           memcpy(B, A, k * sizeof(int));
    I      if (k < n) {
    A=a      for (int i = 0; i < n; i++) {
    B=b        for (int j = 0; j < k; j++)
    r            ... = B[j];
    J          if (...) {
    s            B[k] = ...;
    Q            spawn Queens(B, n, k+1);
               }
             }
           }
         }
         int main() {
    F      Queens(A, n, 0);
         }

Figure 5.41. Parallelization of program Queens via privatization

The algorithm to achieve this result automatically is simple. First, choose between single-assignment form and privatization; second, apply algorithm Statementwise-Parallelization using the reaching definition relation as the dependence relation for the expanded program. However, if privatization is chosen, only asynchronous calls to privatized procedures are provably correct (they preserve the original program semantics); all other asynchronous and parallel constructs should be removed from the generated code, because some memory-based dependences between instances of non-procedure statements may remain.

Some experiments have been performed with the Cilk environment [MF98] on a 32-processor SGI Origin 2000. The results in Figure 5.42 correspond to the execution time and to the speed-up of the parallel version compared to the sequential non-privatized one (without Cilk overhead and without array copying). The program was run with 13 queens only, to demonstrate both the efficiency of the Cilk run-time and the low overhead induced by the expansion of program Queens. Performance is very good up to 16 processors, then it degrades for 32 processors.
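For experimentation, the privatized Queens of Figures 5.38 and 5.41 can be turned into a runnable C++ sketch. The "..." parts of the figures are replaced by a hypothetical filling (a standard safety test and a solution counter), the array size is bounded by 32 for simplicity, and the recursive call that Figure 5.41 spawns is executed sequentially here, so the example does not depend on Cilk. As in the figures, each call owns a private copy `B` of the first `k` entries of its caller's array, so no copy-out is needed.

```cpp
#include <cassert>
#include <cstring>

// Privatized Queens used as a solution counter (hypothetical filling).
long queens(const int* A, int n, int k) {
  if (k == n) return 1;                  // a full placement was found
  assert(n <= 32);
  int B[32];                             // private copy of the caller's array
  std::memcpy(B, A, k * sizeof(int));    // copy-in of the k used entries only
  long count = 0;
  for (int i = 0; i < n; i++) {
    bool ok = true;
    for (int j = 0; j < k; j++)          // reads B[j], j < k (statement r)
      if (B[j] == i || B[j] - i == k - j || i - B[j] == k - j) ok = false;
    if (ok) {
      B[k] = i;                          // write B[k] (statement s)
      count += queens(B, n, k + 1);      // the call spawned in Figure 5.41
    }
  }
  return count;
}
```

Because the callee copies `B` before the caller rewrites `B[k]` in its next iteration, each recursive call works on its own data; in this sequential sketch that ordering is guaranteed by the call itself.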
[Figure 5.42. Parallel resolution of the n-Queens problem. Panels: execution time (s) of 13-Queens vs. number of processors, with the sequential time as reference; speed-up (parallel / original) of 13-Queens vs. number of processors, with the optimal speed-up as reference.]

Notice that the privatized Queens program can itself be the subject of a comparison with other parallelization techniques. It happens that the analyses for pointer arithmetic (seen as a particular implementation of arrays) used by Rugina and Rinard in [RR99] are unable to parallelize the program. Indeed, the ordering analysis shows that j < k, which means that for a given iteration of the outer loop, the procedure call can be executed asynchronously with the next iterations. However, the inter-procedural region expression analysis computes a fixpoint over recursive calls to procedure Queens which cannot capture the fact that only the k first elements of array A are useful: subsequent recursive calls are thus supposed to read the whole array A, which is not the case in practice.

5.5.7 Instancewise Parallelization

This last section investigates parallelization of recursive programs at the statement instance level. This technique, common for loop nest parallelization, is completely new for recursive programs. Notice that we do not propose a run-time parallelization technique for recursive programs: we describe at compile-time the sets of run-time instances which can be executed asynchronously.

Motivating Example

We study the procedure P example in Figure 5.43.a. Pointer arguments p and q are identical in the first call: they are set to the root of a binary tree structure. Because p and q may be aliased during the whole execution, any dependence test (instancewise or not) would return the same result: no parallelism can be found in this program. However, a more precise observation shows that when the current control word w contains both a and b or both c and d, p and q may never be aliased again in all descendants of w (words of which w is a strict prefix). This proves the correctness of the abstract parallelization of procedure P in Figure 5.43.b (recall that CurIns stands for the run-time value of the control word). As soon as both branches of the same conditional have been taken, all recursive calls can be executed asynchronously. This yields in practice a huge amount of parallelism: an average logarithmic parallel complexity.

Finally, this motivating example shows the need for an instancewise parallelization technique for recursive programs. Of course, such a technique requires more information than a simple dependence test: a precise description of the independence of instances is the key to instancewise parallelism detection.

    P    void P(int* p, int* q) {
    s      q->v = ...;
    t      p->v = ...;
    a      if (...) P(p->l, q);
    b      else P(p, q->r);
    c      if (...) P(p->r, q);
    d      else P(p, q->l);
         }
         int main() {
    F      P(tree, tree);
         }

Figure 5.43.a. Procedure P

    P    void P(int* p, int* q) {
    s      q->v = ...;
    t      p->v = ...;
    a      if (...) spawn P(p->l, q);
    b      else spawn P(p, q->r);
           if (CurIns ∈ (a+d)+(b+c)) sync;
    c      if (...) spawn P(p->r, q);
    d      else spawn P(p, q->l);
         }
         int main() {
    F      P(tree, tree);
         }

Figure 5.43.b. Abstract parallelization of P

Figure 5.43. Instancewise parallelization example

Algorithm

We now present an algorithm to automatically detect instancewise parallelism in recursive programs, and to generate the parallel code. This technique naturally extends the previous statementwise algorithm, but synchronization statements are now guarded by membership of the current run-time instance in rational subsets of $L_{\mathrm{ctrl}}$ (the whole language of control words). The idea consists in guarding every sync statement with the domain of relation synchro in Statementwise-Parallelization. In the case of algebraic relations, this domain is an algebraic language and membership may not be decided efficiently; we then compute a rational approximation of the domain before generating the code.

The instancewise parallelization algorithm Instancewise-Parallelization is based on the statementwise version, and it generates a "next state" function α: Q × Σctrl → Q for online computation of the CurIns ∈ set condition. This function is usually implemented with a two-dimensional array, see the example below.²⁶

    Instancewise-Parallelization (program, κ)
    program: an intermediate representation of the program
    κ: the conflict relation to be satisfied by all parallel execution orders
    returns a parallel implementation of program
    depend ← κ ∩ ((W × R) ∪ (R × W) ∪ (W × W))
    (Σctrl, edges) ← dual control flow graph of program
    for each st ∈ edges
      do B ← innermost block surrounding both s and t
         synchro ← depend ∩ (Σctrl* B (Σctrl \ {B})* s Σctrl* × Σctrl* B (Σctrl \ {B})* t Σctrl*)
         set ← domain of relation synchro
         if set ≠ ∅
           then if set is algebraic
                  then set ← rational approximation of set
                (Q, {q0}, F, E) ← determinization of set
                compute a "next state" function α from (Q, {q0}, F, E)
                define a global variable state = q0
                for each procedure in program
                  do insert a new argument state in the first place
                for each call to a procedure p in program
                  do insert a new argument α(state, p) in the first place
                for each non-procedure block b in program
                  do define a local variable state = α(state, b)
                insert "if (state ∈ F) sync" at the program point associated with st
    insert a spawn keyword before every statement
    return program

The result of Instancewise-Parallelization applied to procedure P is shown in Figure 5.44. It is basically the same parallelization as the abstract code in Figure 5.43.b, but the synchronization condition is now fully implemented: the deterministic automaton used for online recognition of (a+d)+(b+c) is given in Figure 5.44.b. Transitions are stored in array next: the first dimension is indexed by state numbers and the second by statement labels.

         int state = 0;
         int next[4][4] = {{1, 2, 2, 1}, {1, 3, 3, 1}, {2, 3, 3, 2}, {3, 3, 3, 3}};
    P    void P(int state, int* p, int* q) {
    s      q->v = ...;
    t      p->v = ...;
    a      if (...) spawn P(next[state][0], p->l, q);
    b      else spawn P(next[state][1], p, q->r);
           if (state == 3) sync;
    c      if (...) spawn P(next[state][2], p->r, q);
    d      else spawn P(next[state][3], p, q->l);
         }
         int main() {
    F      P(state, tree, tree);
         }

Figure 5.44.a. Parallel code

    states 0 (initial), 1, 2, 3; transitions:
      0 --a,d--> 1     0 --b,c--> 2
      1 --a,d--> 1     1 --b,c--> 3
      2 --a,d--> 2     2 --b,c--> 3
      3 --a,b,c,d--> 3

Figure 5.44.b. Automaton to decide synchronization at run-time

Figure 5.44. Automatic instancewise parallelization of procedure P

Notice that the parallelization technique proposed by Feautrier in [Fea98] would also fail on this example, because it is a dependence test only: it cannot be used to compute at compile-time which instances of procedure P allow asynchronous execution of the recursive calls.

²⁶ An extension to deterministic algebraic languages would be rather easy to design, and would sometimes give better results for recursive programs with arrays. Nevertheless, it requires computation of a deterministic approximation of an algebraic language, which is much more difficult than a rational approximation.

5.6 Conclusion

In this chapter, we studied automatic parallelization techniques based on memory expansion. Expanding data structures is a classical optimization to cut memory-based dependences. The first problem is to ensure that all reads refer to the correct memory location in the generated code. When control and data flow cannot be known at compile-time, run-time computations have to be done to find the identity of the correct memory location. The second problem is that converting programs to single-assignment form is too costly in terms of memory usage.

When dealing with unrestricted nests of loops and arrays, we have tackled both problems. We proposed a general method for static expansion based on instancewise reaching definition information, a robust run-time data-flow restoration scheme, and a versatile storage mapping optimization technique. Our techniques are either novel or generalize previous work to unrestricted nests of loops. Finally, all these techniques were combined in a simultaneous expansion and parallelization framework, based on expansion constraints. Many algorithms were designed, from single-assignment conversion to constrained storage mapping optimization and efficient data-flow restoration. This work advocates the use of constrained expansion in parallelizing compilers. The goal is now to design pragmatic constraints and to propose a real bi-criteria optimization algorithm for expansion overhead and parallelism extraction.

The second part of this chapter discussed parallelization of recursive programs. We investigated memory expansion of recursive programs, which is a new issue in automatic parallelization. Single-assignment and privatization were extended to recursive programs, based on the rational and algebraic transduction results of our analysis for recursive programs. Difficult problems related to online computation of reaching definitions and run-time data-flow restoration were investigated. Extending constrained expansion and storage mapping optimization to recursive programs is left for future work, but several unresolved issues for simpler expansion schemes must be investigated first. Finally, we showed that the rational or algebraic transductions returned by dependence analysis could be used to extract control parallelism. A simple algorithm to decide whether two statements can be executed in parallel has been designed and applied to an example in combination with the privatization technique. This algorithm achieves better results than most existing techniques, because it is based on very precise, and instancewise, dependence information. These good results motivate further research in dependence analysis of recursive programs. Another contribution is the algorithm for instancewise parallelization: it decides at compile-time whether two instances of a statement can be executed in parallel or not. Common in the case of nested loops, this technique is completely new for recursive programs. However, the proposed algorithms are still rather primitive: they neither perform statement reordering nor integrate architecture parameters such as the minimal grain of parallel tasks. Fortunately, these issues have been widely studied in more classical parallelization frameworks, and we hope that the same solutions would apply to our own framework.

Future work is threefold. First, improve optimization of the generated code and study, both theoretically and experimentally, the effect of φ functions on parallel code performance. Second, study how comprehensive parallelization techniques can be plugged into the constrained storage mapping optimization framework: reducing memory usage is a good thing, but choosing the right parallel execution order is another. Third, carry out an extensive study of the applicability of memory expansion techniques for parallelization of recursive programs.
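The online recognition in Figure 5.44 can be replayed with the following C++ sketch. It uses the `next` table of Figure 5.44.a (renamed `next_state` here, and written `[4][4]`, since a declaration `next[4,4]` is not valid C), feeds it only the call labels a, b, c, d, and leaves the set F of synchronizing states as a parameter; the usage takes F = {3}, matching the test `state == 3` in Figure 5.44.a. The helper names `column`, `state_after` and `must_sync` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Table "next" of Figure 5.44.a: first index = state 0..3,
// second index = call label a, b, c, d (columns 0..3).
const int next_state[4][4] = {{1, 2, 2, 1}, {1, 3, 3, 1}, {2, 3, 3, 2}, {3, 3, 3, 3}};

// Hypothetical helper: column of a call label.
int column(char label) {
  assert(label >= 'a' && label <= 'd');
  return label - 'a';
}

// Online recognition: in the generated code each call passes
// next[state][label] to its callee, i.e. one table lookup per call;
// this loop replays the same lookups along a sequence of call labels.
int state_after(const std::string& calls) {
  int state = 0;  // q0
  for (char l : calls) state = next_state[state][column(l)];
  return state;
}

// The generated guard "if (state in F) sync".
bool must_sync(const std::string& calls, const std::set<int>& F) {
  return F.count(state_after(calls)) != 0;
}
```

For example, `state_after("bad")` stays in state 2, whereas `state_after("ab")` reaches the sink state 3.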
Chapter 6
Conclusion

We now conclude this thesis with a summary of the main results and contributions, followed by a discussion of perspectives and future work.

6.1 Contributions

Our main contributions can be divided into four closely related parts. The first three parts address automatic parallelization and are summarized in the next table, and the fourth one is about rational and algebraic transductions. Not all contributions in this table are mature, ready-to-use results: most of the work about recursive programs should be seen as a first attempt to extend instancewise analysis and transformation techniques to a larger class of programs.

    |                                           | Affine loop nests with arrays          | Unrestricted loop nests with arrays                         | Recursive programs with arrays and trees  |
    | Instancewise dependence analysis          | [Bra88, Ban88], [Fea88a, Fea91, Pug92] | [BCF97, Bar98], [WP95, Won95]                               | [Fea98]¹, Chapter 4, published in [CC98]² |
    | Instancewise reaching definition analysis | [Fea88a, Fea91, Pug92], [MAL93]        | [CBF95, BCF97, Bar98], [WP95, Won95]                        | Chapter 4, published in [CC98]²           |
    | Single-assignment form                    | [Fea88a, Fea91]                        | [Col98], Sections 5.1 and 5.4                               | Section 5.5                               |
    | Maximal static expansion                  |                                        | Sections 5.2 and 5.4, published in [BCC98, Coh99b, BCC00]   | open problem                              |
    | Storage mapping optimization              | [LF98, Lef98], [SCFS98, CDRV97]        | Sections 5.3 and 5.4, published in [CL99, Coh99b]           | open problem                              |
    | Instancewise parallelization              | [Fea92, CFH95], [DV97]                 | [GC95, CBF95], [Col95b]                                     | Section 5.5                               |

¹ Dependence test for trees only.
² For arrays only.

Let us now review every contribution in more detail.
Control and Data Structures: Beyond the Polyhedral Model. In Chapter 2, we defined a program model and mathematical abstractions for statement instances and elements of data structures. This framework was used throughout this work to give a formal presentation of our techniques, especially when dealing with recursive control and data structures. However, when designing algorithms for nested loops and arrays (a special case of the program model), we stuck to the classical iteration vector framework, and we took benefit of the wealth of algorithms for working with affine relations in Presburger arithmetic.

Novel instancewise dependence and reaching definition analyses for recursive programs were proposed in Chapter 4, based on formal language theory, and more precisely on rational and algebraic transductions. Using a new definition of induction variables in recursive programs, we could capture the effect of every run-time instance of a statement in a rational or algebraic transduction. Because conditionals and loop bounds are unrestricted, we could achieve only approximate results in general. A summary of program model restrictions and a comparison with other dependence and reaching definition analyses concludes this work.

Memory Expansion: New Techniques to Solve New Problems. Parallelization via memory expansion is an old technique, but the recent extension of instancewise reaching definition analyses to programs with conditionals, complex data structure references (e.g. non-affine array subscripts) or recursive calls raises new questions. The first one is to ensure that read accesses in the expanded program refer to the correct memory location; the second is that existing techniques for memory expansion have to be extended to fit the new program models. We addressed both questions in the first four sections of Chapter 5, when dealing with unrestricted nested loops and arrays. A new technique to reduce the run-time overhead of memory expansion has been proposed, and another technique to reduce memory usage has been extended to unrestricted loop nests. The combination of the two techniques has also been studied. Finally, we designed several algorithms to optimize run-time restoration of the flow of data (when it is mandatory). We also discussed experimental results on a shared-memory architecture.
Memory expansion for recursive programs is a completely new topic, and we discovered that the mathematical abstraction for reaching definitions (rational and algebraic transductions) may incur a severe run-time overhead. Nevertheless, in a few particular cases we could design algorithms to generate low-overhead expanded recursive programs.

Parallelism: Extending Classical Techniques. Our new dependence analysis technique has been shown useful for parallelizing recursive programs. It demonstrates the applicability of rational and algebraic transductions, thanks to their decidable properties. The first algorithm we presented is similar to existing parallelization methods for recursive programs, but it takes advantage of the additional information captured by our analysis to achieve better results in general. Another algorithm addresses instancewise parallelization of recursive programs: this new technique is made possible by the instancewise information captured in rational and algebraic transductions. A few experimental results were discussed, combining expansion and parallelization on a well-known recursive program.

Formal Language Theory: Several Contributions and Applications. The last results of this work do not belong to compilation. They are mostly found in the third section of Chapter 3 (presenting useful mathematical abstractions) and some in the following sections. We designed a sub-class of rational transductions with a Boolean algebra structure and many other interesting properties. We showed that this class is not decidable among rational transductions, but conservative approximation techniques allow us to take advantage of these properties in the whole class of rational transductions. We also presented some new results about the composition of rational transductions over non-free monoids and investigated the approximation of algebraic transductions.

6.2 Perspectives

Many questions arose during this thesis, and our results raise more interesting questions than they solve. We start with questions related to recursive programs, then discuss future work in the polyhedral model.

First of all, finding the right mathematical abstraction to capture instancewise properties appeared once more as a critical issue. Rational and algebraic transductions have been successful in many cases, but their lack of expressiveness has often limited their applications. Reaching definition analysis suffered most from these limitations, as did the integration of conditional expressions and loop bounds into dependence analysis. In this context, we would like to consider more than one counter in a transducer, and still be able to decide emptiness and other useful properties. We are thus very interested in the work by Comon and Jurski [CJ98] on deciding emptiness for a sub-class of multi-counter languages, and more generally in studies about system verification based on restricted classes of Minsky machines, such as timed automata. In addition, using several counters would allow us to extend one of the major ideas underlying fuzzy array dataflow analysis [CBF95]: inserting new parameters to capture properties of non-affine expressions and improve precision.

Moreover, we believe that decidability of the mathematical abstraction is not the most important thing for program analysis: a few good approximate results are often sufficient. In particular, when studying deterministic and left-synchronous relations, we discovered that a nice sub-class with good decidability properties cannot be used in our framework without an efficient approximation method. Improving our techniques to resynchronize rational transducers and approximate them by left-synchronous ones is thus an important issue. We also hope that this demonstrates the strong mutual interest of cooperation between theoretical computer scientists and compilation researchers.

Besides these formal aspects, another research issue is to lift as many restrictions as possible in the program model. As hinted before, the best way consists in looking for a graceful degradation of our results using approximation techniques. This idea has been investigated in a similar context [CBF95], and studying its applicability to recursive programs is an interesting future work. Another idea would be to perform induction variable computation on execution traces instead of control words (allowing induction variables to be updated in every program statement), then to deduce approximate information on control words; relying on abstract interpretation techniques [CC77] would perhaps be helpful in proving the correctness of our approximations.

The interest of memory expansion for recursive programs is still unclear, because of the high overhead of computing reaching definitions at run time, either exactly or with φ functions. Pragmatic techniques similar to privatization (i.e. making a global variable local to each procedure) seem more promising, but require further study. Working on an extension of maximal static expansion and storage mapping optimization to recursive programs is perhaps premature in this context, but transitive closure, class enumeration and graph coloring techniques for rational and algebraic transductions are interesting open problems.

We have not addressed the problem of scheduling recursive programs, because no way to assign sets of run-time instances to logical execution dates is known. Building a rational transducer from dates to instances is perhaps a good idea, but the problem of generating the code to enumerate the precise sets of instances then becomes rather difficult. Besides these technical reasons, most parallelism in recursive programs can already be exploited by control-parallel techniques, and the need for a data-parallel execution model is not obvious.

In addition to motivating a large part of our work on recursive programs, techniques from the polyhedral model cover an important part of this thesis. A major goal throughout this work was to keep some distance from the mathematical representation of affine relations. One drawback of this point of view is the increased difficulty of building optimized algorithms ready to be used in a compiler, but the big advantage is the generality of the approach. Among the technical problems that should be addressed in both maximal static expansion and storage mapping optimization, the most important are the following.

Many algorithms for run-time restoration of the data flow have been designed, but practical experience with the parallelization of loop nests with unpredictable control flow and non-affine array subscripts is still very limited. Because the SSA framework [CFR+91] is mainly used as an intermediate representation, φ functions are rarely implemented in practice. Generating efficient data-flow restoration code is thus a rather new problem.

No parallelizing compiler for unrestricted nested loops has been designed. As a result, a large-scale experiment has never been performed. To apply precise analysis and transformation techniques to real programs, significant work on optimizing the techniques must be done. The main ideas would be code partitioning [Ber93] and extending our techniques to hierarchical dependence graphs, array regions [Cre96] or hierarchical schedules [CW99].

A parallelizing compiler must be able to automatically tune a large number of parameters: run-time overhead, parallelism extraction, parallelization grain, copy-in and copy-out, schedule latency, memory hierarchy, memory usage, placement of computations and communications... and we have seen that the optimization problem is even more complex for non-affine loop nests. Our constrained expansion framework allows the simultaneous optimization of some parameters related to memory expansion, but this is only a first step.
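The rational transductions used for recursive programs in this chapter can be illustrated with a small sketch. It assumes a hypothetical procedure P(i) with three recursive call sites, a (passing i + 1), b (passing i) and c (passing i or i + 1, depending on a condition assumed unknown at compile time), and a statement s writing A[i]. The control word of an instance of s is the sequence of call-site labels leading to it, followed by s. The class `RationalTransducer` and the helper `make_index_transducer` are illustrative names; this is a generic nondeterministic finite-state transducer, not the construction of Chapter 4, and the index i is encoded in unary for simplicity.

```python
from collections import defaultdict


class RationalTransducer:
    """Nondeterministic finite-state transducer from input words to output
    words. Each transition reads one input letter and emits a (possibly empty)
    output string; the image of a word is the set of outputs along accepting paths."""

    def __init__(self, initial, finals):
        self.initial = initial
        self.finals = set(finals)
        self.delta = defaultdict(list)   # (state, letter) -> [(next_state, output)]

    def add(self, src, letter, dst, output):
        self.delta[(src, letter)].append((dst, output))

    def image(self, word):
        # Track every reachable (state, output so far) configuration.
        configs = {(self.initial, "")}
        for letter in word:
            configs = {
                (dst, out + emitted)
                for (state, out) in configs
                for (dst, emitted) in self.delta.get((state, letter), [])
            }
        return {out for (state, out) in configs if state in self.finals}


def make_index_transducer():
    """Hypothetical example: maps the control word of an instance of statement s
    to the index i of the cell A[i] it writes, encoded in unary ('1' * i)."""
    t = RationalTransducer(initial=0, finals={1})
    t.add(0, "a", 0, "1")   # call site a: P(i + 1)
    t.add(0, "b", 0, "")    # call site b: P(i)
    t.add(0, "c", 0, "")    # call site c: increment unknown at compile time,
    t.add(0, "c", 0, "1")   #   so both outcomes are kept (approximation)
    t.add(0, "s", 1, "")    # instance of statement s: accept
    return t


if __name__ == "__main__":
    t = make_index_transducer()
    print(t.image("abas"))          # {'11'}: this instance of s writes A[2]
    print(sorted(t.image("cs")))    # ['', '1']: it may write A[0] or A[1]
```

Words that are not control words of an instance of s (for example "ab") have an empty image, while the nondeterministic call site c yields two candidate indices: this is how an approximate (may) relation arises when a conditional cannot be resolved statically.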
Bibliography

[AB88] J.-M. Autebert and L. Boasson. Transductions rationnelles. Masson, Paris, France, 1988.
[AFL95] A. Aiken, M. Fähndrich, and R. Levien. Better static memory management: Improving region-based analysis of higher-order languages. In ACM Symp. on Programming Language Design and Implementation (PLDI'95), pages 174–…, La Jolla, California, USA, June 1995.
[AI91] C. Ancourt and F. Irigoin. Scanning polyhedra with DO loops. In 3rd ACM Symp. on Principles and Practice of Parallel Programming (PPoPP'91), pages 39–50, June 1991.
[AK87] J. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Trans. on Programming Languages and Systems, 9(4):491–542, October 1987.
[Ala94] M. Alabau. Une expression des algorithmes massivement parallèles à structures de données irrégulières. PhD thesis, Université Bordeaux I, September 1994.
[Amm92] Z. Ammarguellat. A control-flow normalization algorithm and its complexity. IEEE Trans. on Software Engineering, 18(3):237–251, March 1992.
[AR94] R. Andonov and S. Rajopadhye. A sparse knapsack algo-tech-cuit and its synthesis. In Int. Conf. on Application-Specific Array Processors (ASAP'94), pages 302–313, San Francisco, California, USA, August 1994. IEEE Computer Society Press.
[ASU86] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.
[Bak77] B. S. Baker. An algorithm for structuring programs. Journal of the ACM, 24:98–120, 1977.
[Ban88] U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, USA, 1988.
[Ban92] U. Banerjee. Loop Transformations for Restructuring Compilers: The Foundations. Kluwer Academic Publishers, Boston, USA, 1992.
[Bar98] D. Barthou. Array Dataflow Analysis in Presence of Non-affine Constraints. PhD thesis, Université de Versailles, France, February 1998.
[BBA98] H. Bourzoufi, B. Sidi Boulenouar, and R. Andonov. A tiling approach for solving dynamic programming knapsack problem recurrences. In Rencontres Francophones du Parallélisme (RenPar'10), Strasbourg, France, June 1998.
[BC99a] M.-P. Béal and O. Carton. Asynchronous sliding block maps. Technical Report IGM 99-06, Institut Gaspard Monge, Université de Marne-la-Vallée, France, 1999.
[BC99b] M.-P. Béal and O. Carton. Determinization of transducers over finite and infinite words. Technical Report (to appear), Institut Gaspard Monge, Université de Marne-la-Vallée, France, 1999.
[BCC98] D. Barthou, A. Cohen, and J.-F. Collard. Maximal static expansion. In 25th ACM Symp. on Principles of Programming Languages, pages 98–106, San Diego, California, USA, January 1998.
[BCC00] D. Barthou, A. Cohen, and J.-F. Collard. Maximal static expansion. Int. Journal of Parallel Programming, June 2000. To appear.
[BCF97] D. Barthou, J.-F. Collard, and P. Feautrier. Fuzzy array dataflow analysis. Journal of Parallel and Distributed Computing, 40:210–226, 1997.
[BDRR94] P. Boulet, A. Darte, T. Risset, and Y. Robert. (Pen)-ultimate tiling? In Scalable High-Performance Computing Conf., pages 568–576, Knoxville, Tennessee, USA, May 1994. IEEE Computer Society Press.
[BE95] W. Blume and R. Eigenmann. Symbolic range propagation. In Proc. of the 9th Int. Parallel Processing Symp. (IPPS'95), pages 357–363, Santa Barbara, California, USA, April 1995. IEEE Computer Society Press.
[BEF+96] W. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen, W. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Parallel programming with Polaris. IEEE Computer, 29(12):78–82, December 1996.
[Ber79] J. Berstel. Transductions and Context-Free Languages. Teubner, Stuttgart, Germany, 1979.
[Ber93] J.-Y. Berthou. Construction d'un paralléliseur de logiciels scientifiques de grande taille guidée par des mesures de performances. PhD thesis, Université Pierre et Marie Curie (Paris VI), France, October 1993.
[BH77] M. Blattner and T. Head. Single valued a-transducers. Journal of Comput. and System Sci., 15:310–327, 1977.
[Bra88] T. Brandes. The importance of direct dependences for automatic parallelization. In ACM Int. Conf. on Supercomputing, pages 407–417, St. Malo, France, July 1988.
[CBC93] J.-D. Choi, M. Burke, and P. Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In 20th ACM Symp. on Principles of Programming Languages (PoPL'93), pages 232–245, Charleston, South Carolina, USA, January 1993.
[CBF95] J.-F. Collard, D. Barthou, and P. Feautrier. Fuzzy array dataflow analysis. In ACM Symp. on Principles and Practice of Parallel Programming, pages 92–102, Santa Barbara, California, USA, July 1995.
[CC77] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th ACM Symp. on Principles of Programming Languages, pages 238–252, Los Angeles, California, USA, January 1977.
[CC98] A. Cohen and J.-F. Collard. Instance-wise reaching definition analysis for recursive programs using context-free transductions. In Parallel Architectures and Compilation Techniques, pages 332–340, Paris, France, October 1998. IEEE Computer Society Press. (IEEE award for the best student paper).
[CCG96] A. Cohen, J.-F. Collard, and M. Griebl. Data-flow analysis of recursive structures. In Proc. of the 6th Workshop on Compilers for Parallel Computers, pages 181–192, Aachen, Germany, December 1996.
[CDRV97] P.-Y. Calland, A. Darte, Y. Robert, and F. Vivien. Plugging anti and output dependence removal techniques into loop parallelization algorithms. Parallel Computing, 23(1–2):251–266, 1997.
[CFH95] L. Carter, J. Ferrante, and S. Flynn Hummel. Efficient multiprocessor parallelism via hierarchical tiling. In SIAM Conference on Parallel Processing for Scientific Computing, February 1995.
[CFR+91] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. on Programming Languages and Systems, 13(4):451–490, October 1991.
[CFR95] J.-F. Collard, P. Feautrier, and T. Risset. Construction of DO loops from systems of affine constraints. Parallel Processing Letters, 5(3), 1995.
[CH78] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In 5th ACM Symp. on Principles of Programming Languages, pages 84–96, January 1978.
[Cho77] C. Choffrut. Une caractérisation des fonctions séquentielles et des fonctions sous-séquentielles en tant que relations rationnelles. Theoretical Computer Science, 5:325–338, 1977.
[CI96] B. Creusillet and F. Irigoin. Interprocedural array region analyses. Int. Journal of Parallel Programming, 24(6):513–546, December 1996.
[CJ98] H. Comon and Y. Jurski. Multiple counters automata, safety analysis and Presburger arithmetic. In A. Hu and M. Vardi, editors, Proc. Computer Aided Verification, volume 1427 of LNCS, pages 268–279, Vancouver, British Columbia, Canada, 1998. Springer-Verlag.
[CK98] J.-F. Collard and J. Knoop. A comparative study of reaching definitions analyses. Technical Report 1998/22, Laboratoire PRiSM, Université de Versailles, France, 1998.
[CL99] A. Cohen and V. Lefebvre. Optimization of storage mappings for parallel programs. In EuroPar'99, number 1685 in LNCS, pages 375–382, Toulouse, France, September 1999. Springer-Verlag.
[Cla96] P. Clauss. Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs. In ACM Int. Conf. on Supercomputing, pages 278–295. ACM Press, 1996.
[Coh97] A. Cohen. Analyse de flot de données de programmes récursifs à l'aide de grammaires hors-contexte. In Rencontres Francophones du Parallélisme (RenPar'9), Lausanne, Switzerland, May 1997. (IEEE award for the best French-speaking student paper).
[Coh99a] A. Cohen. Analyse de flot de données pour programmes récursifs à l'aide de langages algébriques. Technique et Science Informatiques, 18(3):323–343, 1999.
[Coh99b] A. Cohen. Parallelization via constrained storage mapping optimization. In Int. Symp. on High Performance Computing (ISHPC'99), number 1615 in LNCS, pages 83–94, Kyoto, Japan, May 1999. Springer-Verlag.
[Col94a] J.-F. Collard. Code generation in automatic parallelizers. In C. Girault, editor, Proc. of the Int. Conf. on Applications in Parallel and Distributed Computing, IFIP W.G. 10.3, pages 185–194, Caracas, Venezuela, April 1994. North Holland.
[Col94b] J.-F. Collard. Space-time transformation of while-loops using speculative execution. In Scalable High Performance Computing Conf., pages 429–436, Knoxville, Tennessee, USA, May 1994. IEEE Computer Society Press.
[Col95a] J.-F. Collard. Automatic parallelization of while-loops using speculative execution. Int. Journal of Parallel Programming, 23(2):191–219, April 1995.
[Col95b] J.-F. Collard. Parallélisation automatique des programmes à contrôle dynamique. PhD thesis, Université Pierre et Marie Curie (Paris VI), France, January 1995.
[Col98] J.-F. Collard. The advantages of reaching definition analyses in Array (S)SA. In 11th Workshop on Languages and Compilers for Parallel Computing, number 1656 in LNCS, pages 338–352, Chapel Hill, North Carolina, USA, August 1998. Springer-Verlag.
[Cou81] P. Cousot. Semantic foundations of program analysis. Prentice-Hall, 1981.
[Cre96] B. Creusillet. Array Region Analyses and Applications. PhD thesis, École Nationale Supérieure des Mines de Paris (ENSMP), Paris, France, December 1996.
[CW99] J. B. Crop and D. K. Wilde. Scheduling structured systems. In EuroPar'99, LNCS, pages 409–412, Toulouse, France, September 1999. Springer-Verlag.
[Deu90] A. Deutsch. On determining lifetime and aliasing of dynamically allocated data in higher-order functional specifications. In 17th ACM Symp. on Principles of Programming Languages (PoPL'90), pages 157–168, San Francisco, California, USA, January 1990.
[Deu92] A. Deutsch. Operational Models of Programming Languages and Representations of Relations on Regular Languages with Application to the Static Determination of Dynamic Aliasing Properties of Data. PhD thesis, École Polytechnique, France, April 1992.
[Deu94] A. Deutsch. Interprocedural may-alias analysis for pointers: beyond k-limiting. In ACM Symp. on Programming Language Design and Implementation (PLDI'94), pages 230–241, Orlando, Florida, USA, June 1994.
[DGS93] E. Duesterwald, R. Gupta, and M.-L. Soffa. A practical dataflow framework for array reference analysis and its use in optimization. In ACM Symp. on Programming Language Design and Implementation (PLDI'93), pages 68–77, Albuquerque, New Mexico, USA, June 1993.
[DV97] A. Darte and F. Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. Int. Journal of Parallel Programming, 25(6):447–496, December 1997.
[EGH94] M. Emami, R. Ghiya, and L. J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In ACM Symp. on Programming Language Design and Implementation (PLDI'94), pages 242–256, June 1994.
[Eil74] S. Eilenberg. Automata, Languages and Machines, volume A. Academic Press, 1974.
[EM65] C. C. Elgot and J. E. Mezei. On relations defined by generalized finite automata. IBM Journal of Research and Development, pages 45–68, 1965.
[FB98] P. Feautrier and P. Boulet. Scanning polyhedra without do-loops. In Parallel Architectures and Compilation Techniques (PACT'98), Paris, France, October 1998. IEEE Computer Society Press.
[Fea88a] P. Feautrier. Array expansion. In ACM Int. Conf. on Supercomputing, pages …–441, St. Malo, France, July 1988.
[Fea88b] P. Feautrier. Parametric integer programming. RAIRO Recherche Opérationnelle, 22:243–268, September 1988.
[Fea91] P. Feautrier. Dataflow analysis of scalar and array references. Int. Journal of Parallel Programming, 20(1):23–53, February 1991.
[Fea92] P. Feautrier. Some efficient solutions to the affine scheduling problem, part II, multidimensional time. Int. Journal of Parallel Programming, 21(6):389–420, December 1992. See also Part I, One Dimensional Time, 21(5):315–348.
[Fea98] P. Feautrier. A parallelization framework for recursive tree programs. In EuroPar'98, LNCS, Southampton, UK, September 1998. Springer-Verlag.
[FM97] P. Fradet and D. Le Métayer. Shape types. In 24th ACM Symp. on Principles of Programming Languages (PoPL'97), pages 27–39, Paris, France, January 1997.
[FS93] C. Frougny and J. Sakarovitch. Synchronized relations of finite words. Theoretical Computer Science, 108:45–82, 1993.
[GC95] M. Griebl and J.-F. Collard. Generation of synchronous code for automatic parallelization of while loops. In S. Haridi, K. Ali, and P. Magnusson, editors, EuroPar'95, volume 966 of LNCS, pages 315–326. Springer-Verlag, 1995.
[GH95] R. Ghiya and L. J. Hendren. Connection analysis: A practical interprocedural heap analysis for C. In 8th Workshop on Languages and Compilers for Parallel Computing, number 1033 in LNCS, Columbus, Ohio, USA, August 1995. Springer-Verlag.
[GH96] R. Ghiya and L. J. Hendren. Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C. In 23rd ACM Symp. on Principles of Programming Languages (PoPL'96), pages 1–15, St. Petersburg Beach, Florida, USA, January 1996.
[GL97] M. Griebl and C. Lengauer. The loop parallelizer LooPo: announcement. LNCS, 1239:603–607, 1997.
[Gup98] R. Gupta. A code motion framework for global instruction scheduling. In Int. Conf. on Compiler Construction (CC'98), pages 219–233, 1998.
[H+96] M. Hall et al. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, 29(12):84–89, December 1996.
[Har89] W. L. Harrison. The interprocedural analysis and automatic parallelisation of Scheme programs. Lisp and Symbolic Computation, 2(3):176–396, October 1989.
[HBCM94] M. Hind, M. Burke, P. Carini, and S. Midkiff. An empirical study of precise interprocedural array analysis. Scientific Programming, 3(3):255–271, 1994.
[HHN92] L. J. Hendren, J. Hummel, and A. Nicolau. Abstractions for recursive pointer data structures: improving the analysis and transformation of imperative programs. In ACM Symp. on Programming Language Design and Implementation (PLDI'92), pages 249–260, San Francisco, California, USA, June 1992.
[HHN94] J. Hummel, L. J. Hendren, and A. Nicolau. A general data dependence test for dynamic, pointer-based data structures. In ACM Symp. on Programming Language Design and Implementation (PLDI'94), pages 218–229, Orlando, Florida, USA, June 1994.
[HP96] M. Haghighat and C. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Trans. on Programming Languages and Systems, 18(4):477–518, July 1996.
[HTZ+97] L. J. Hendren, X. Tang, Y. Zhu, S. Ghobrial, G. R. Gao, X. Xue, H. Cai, and P. Ouellet. Compiling C for the EARTH multithreaded architecture. Int. Journal of Parallel Programming, 25(4):305–338, August 1997.
[HU79] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
[IJT90] F. Irigoin, P. Jouvelot, and R. Triolet. Overview of the PIPS project. In P. Feautrier and F. Irigoin, editors, 2nd Int. Workshop on Compilers for Parallel Computers, pages 199–212, Paris, December 1990.
[IT88] F. Irigoin and R. Triolet. Supernode partitioning. In 15th ACM Symp. on Principles of Programming Languages (PoPL'88), pages 319–328, San Diego, California, USA, January 1988.
[JM82] N. D. Jones and S. S. Muchnick. A flexible approach to interprocedural data flow analysis and programs with recursive data structures. ACM Press, 1982.
[Kar92] G. Karner. Nivat's theorem for pushdown transducers. Theoretical Computer Science, 97:245–262, 1992.
[KPRS96] W. Kelly, W. Pugh, E. Rosser, and T. Shpeisman. Transitive closure of infinite graphs and its applications. Int. Journal of Parallel Programming, 24(6):579–598, 1996.
[KRS94] J. Knoop, O. Rüthing, and B. Steffen. Optimal code motion: Theory and practice. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(4):1117–1155, 1994.
[KS92] J. Knoop and B. Steffen. The interprocedural coincidence theorem. In Proc. of the 4th Int. Conference on Compiler Construction (CC'92), number 641 in LNCS, Paderborn, Germany, 1992.
[KS93] N. Klarlund and M. I. Schwartzbach. Graph types. In 20th ACM Symp. on Principles of Programming Languages (PoPL'93), pages 196–205, Charleston, South Carolina, USA, January 1993.
[KS98] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. In 25th ACM Symp. on Principles of Programming Languages, pages 107–120, San Diego, California, USA, January 1998.
[KSV96] J. Knoop, B. Steffen, and J. Vollmer. Parallelism for free: Efficient and optimal bitvector analyses for parallel programs. ACM Transactions on Programming Languages and Systems (TOPLAS), 18(3):268–299, May 1996.
[KU77] J. B. Kam and J. D. Ullman. Monotone data flow analysis frameworks. Acta Informatica, 7:309–317, 1977.
[Lef98] V. Lefebvre. Restructuration automatique des variables d'un programme en vue de sa parallélisation. PhD thesis, Université de Versailles, France, February 1998.
[LF98] V. Lefebvre and P. Feautrier. Automatic storage management for parallel programs. Parallel Computing, 24(3):649–671, 1998.
[LH88] J. R. Larus and P. N. Hilfinger. Detecting conflicts between structure accesses. In ACM Symp. on Programming Language Design and Implementation (PLDI'88), pages 21–34, 1988.
[Li92] Z. Li. Array privatization for parallel execution of loops. In ACM Int. Conf. on Supercomputing, pages 313–322, Washington, District of Columbia, USA, July 1992. ACM Press.
[LL97] A. W. Lim and M. S. Lam. Communication-free parallelization via affine transformations. In 24th ACM Symp. on Principles of Programming Languages, pages 201–214, Paris, France, January 1997.
[LRZ93] W. A. Landi, B. G. Ryder, and S. Zhang. Interprocedural modification side effect analysis with pointer aliasing. In ACM Symp. on Programming Language Design and Implementation (PLDI'93), pages 56–67, Albuquerque, New Mexico, USA, June 1993.
[MAL93] D. E. Maydan, S. P. Amarasinghe, and M. S. Lam. Array dataflow analysis and its use in array privatization. In 20th ACM Symp. on Principles of Programming Languages, pages 2–15, Charleston, South Carolina, USA, January 1993.
[Mas93] F. Masdupuy. Semantic analysis of interval congruences. In D. Bjørner, M. Broy, and I. V. Pottosin, editors, Int. Conf. on Formal Methods in Programming and their Applications, volume 735 of LNCS, pages 142–155, Academgorodok, Novosibirsk, Russia, June 1993. Springer-Verlag.
[MF98] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM Symp. on Programming Language Design and Implementation (PLDI'98), pages 212–223, Montreal, Canada, June 1998.
[Mic95] O. Michel. Design and implementation of 8½, a declarative data-parallel language. Technical Report 1012, Laboratoire de Recherche en Informatique, Université Paris Sud (Paris XI), France, 1995. Contains paper Group-based Fields with J.-L. Giavitto and J.-P. Sansonnet, Proc. of the Parallel Symbolic Languages and Systems, October 1995.
[Min67] M. Minsky. Computation: Finite and Infinite Machines. Prentice-Hall, 1967.
[MP94] V. Maslov and W. Pugh. Simplifying polynomial constraints over integers to make dependence analysis more precise. Technical Report CS-TR-…, U. of Maryland, February 1994.
[MT90] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementation. John Wiley and Sons, 1990.
[Muc97] S. S. Muchnick. Advanced Compiler Design & Implementation. Morgan Kaufmann, 1997.
[Par66] R. J. Parikh. On context-free languages. Journal of the ACM, 13(4):570–581, 1966.
[PD96] G. R. Perrin and A. Darte, editors. The Data Parallel Programming Model. Number 1132 in LNCS. Springer-Verlag, 1996. For scheduling issues, see "Automatic Parallelization in the Polytope Model", pages 79–103.
[PS98] M. Pelletier and J. Sakarovitch. On the representation of finite deterministic 2-tape automata. Technical Report 98C002, École Nationale Supérieure des Télécommunications (ENST), Paris, France, May 1998. To appear in Theoretical Computer Science.
[Pug92] W. Pugh. A practical algorithm for exact array dependence analysis. Communications of the ACM, 35(8):27–47, August 1992.
[QR99] F. Quilleré and S. Rajopadhye. Optimizing memory usage in the polyhedral model. Technical Report 1228, Institut de Recherche en Informatique et Systèmes Aléatoires, Université de Rennes, France, January 1999.
[RF94] X. Redon and P. Feautrier. Scheduling reductions. In ACM Int. Conf. on Supercomputing, pages 117–125, Manchester, UK, July 1994.
[Rin97] M. Rinard. Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives. In 6th ACM Symp. on Principles and Practice of Parallel Programming (PPoPP'97), pages 112–123, Las Vegas, Nevada, USA, June 1997.
[RR99] R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In 7th ACM Symp. on Principles and Practice of Parallel Programming (PPoPP'99), Atlanta, Georgia, USA, May 1999.
[RS97a] G. Rozenberg and A. Salomaa, editors. Handbook of Formal Languages, volume 1: Word, Language, Grammar. Springer-Verlag, 1997.
[RS97b] G. Rozenberg and A. Salomaa, editors. Handbook of Formal Languages, volume 3: Beyond Words. Springer-Verlag, 1997.
[SCFS98] M. M. Strout, L. Carter, J. Ferrante, and B. Simon. Schedule-independent storage mapping for loops. In ACM Symp. on Architectural Support for Programming Languages and Operating Systems (ASPLOS 8), 1998.
[Sch86] A. Schrijver. Theory of Linear and Integer Programming. John Wiley and Sons, Chichester, UK, 1986.
[SKR90] B. Steffen, J. Knoop, and O. Rüthing. The value flow graph: A program representation for optimal program transformations. In Proc. of the 3rd European Symp. on Programming (ESOP'90), volume 432 of LNCS, pages 389–405, Copenhagen, Denmark, May 1990.
[SRH96] M. Sagiv, T. Reps, and S. Horwitz. Precise interprocedural dataflow analysis with applications to constant propagation. Theoretical Computer Science, 167(1–2):131–170, October 1996.
[SRW96] M. Sagiv, T. W. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. In 23rd ACM Symp. on Principles of Programming Languages (PoPL'96), pages 16–31, St. Petersburg Beach, Florida, USA, January 1996.
[SSP99] H. Saito, N. Stavrakos, and C. Polychronopoulos. Multithreading runtime support for loop and functional parallelism. In Int. Symp. on High Performance Computing (ISHPC'99), number 1615 in LNCS, pages 133–144, Kyoto, Japan, May 1999. Springer-Verlag.
[Ste96] B. Steensgaard. Points-to analysis in almost linear time. In 23rd ACM Symp. on Principles of Programming Languages (PoPL'96), pages 32–41, St. Petersburg Beach, Florida, USA, January 1996.
[TD95] O. Temam and N. Drach. Software assistance for data caches. Future Generation Computer Systems, 1995. Special issue on high performance computer architectures.
[TFJ86] R. Triolet, P. Feautrier, and P. Jouvelot. Automatic parallelization of Fortran programs in the presence of procedure calls. In Proc. of the 1st European Symp. on Programming (ESOP'86), number 213 in LNCS, pages 210–222. Springer-Verlag, March 1986.
[TP93] P. Tu and D. Padua. Automatic array privatization. In 6th Workshop on Languages and Compilers for Parallel Computing, number 768 in LNCS, pages 500–521, Portland, Oregon, USA, August 1993.
[TP95] P. Tu and D. Padua. Gated SSA-based demand-driven symbolic analysis for parallelizing compilers. In ACM Int. Conf. on Supercomputing, pages 414–423, Barcelona, Spain, July 1995.
[Tzo97] S. Tzolovski. Data dependences as abstract interpretations. In International Static Analysis Symposium SAS'97, Paris, France, 1997.
[Wol92] M. Wolfe. Beyond induction variables. In ACM Symp. on Programming Language Design and Implementation (PLDI'92), pages 162–174, San Francisco, California, USA, June 1992.
[Won95] D. G. Wonnacott. Constraint-Based Array Dependence Analysis. PhD thesis, University of Maryland, 1995.
[WP95] D. Wonnacott and W. Pugh. Nonlinear array dependence analysis. In Proc. Third Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers, Troy, New York, USA, 1995.
[WR93] D. K. Wilde and S. Rajopadhye. Allocating memory arrays for polyhedra. Technical Report 749, Institut de Recherche en Informatique et Systèmes Aléatoires, Université de Rennes, France, July 1993.
260 Index Symbols <lex,70,seelexicographicorder,75, 140,197 <par,81,seeparallelexecutionorder <seq,70,seesequentialexecutionorder <txt,70,seetextualorder,144 ctrl,66,seestatementlabel Lctrl,68,seecontrolword,70,129,139 Ldata,71,seedatastructure abstraction,140 Mdata,71,seedatastructure abstraction,129,140 [i;],128,seeinductionvariable,130, 135 [i](w),128,seeinductionvariable Dexp,156,seememoryexpansion ES,196,seeexpansionvector ES[p+1],197,seeexpansiondegree A,80,seeaccess,82,134 Ae,63,seeaccess,80 E,62,seeprogramexecution,70,129, 156,191,222 I,80,seeinstance,82 Ie,62,seeinstance,68,80 R,80,seeread,140 Re,63,seereadandaccess,80 W,80,seewrite,140 We,63,seewriteandaccess,80,92,seestackalphabetand push-downautomaton 0,92,seeinitialstackwordand push-downautomaton hs;xi,75,seeiterationvectorand instance hs;x;refi,75,seeiterationvectorand access,209,seeconstraintrelation,214 R,173,seestaticexpansion,175 R,173,seestaticexpansion W,217,seeweakenedstaticexpansion W,217,seeweakenedstaticexpansion,77,seedependencerelation,140 e,77,seedependencerelation,140 exp,81,seedependencerelationand memoryexpansion,82,210,214 exp e,81,seedependencerelationand memoryexpansion,175,seeconictrelationandstatic expansion,76,seeconictrelation,175,191 e,76,seeconictrelation,191 6,191,seeno-conictrelation,193 6e,191,seeno-conictrelation./,193,seeinterferencerelation,194, 210,211,211,seeinterferencerelation,214,212,seecoloringrelation,213,seeconstraintcoloringrelation,78,seereachingdenition ml,164,seereachingdenitionofa memorylocationandmemory expansion ml e,164,seereachingdenitionofa memorylocationandmemory expansion e,77,seereachingdenition,156,seememoryexpansion,168,174, 217,219 JoinsA,220,seejoin Points,219,seeprogrampoint Ancestors(u),142,seeancestor Array,160,seememoryexpansion CurIns,156,seerun-timeinstanceand memoryexpansion,227,240 Iter,160,seememoryexpansionand iterationvector Stmt,160,seememoryexpansionand iterationvector Undefined,130,seeinductionvariable,82,seeschedule,85 
",91,seeemptyword fexp e,81,seestoragemappingand 259
memory expansion, 173, 191, 209
f_e, 75, see storage mapping, 173
f, 129, see storage mapping
AS[x], 160, see memory expansion
D_exp, 157, see memory expansion

A
-selection, 108
-selection, 141, 231
access, 63, 75
  A, 80, 82, 134
  A_e, 63, 80
  R, 140
  R_e, 63, 80
  W, 140
  W_e, 63, 80
  ⟨s, x, ref⟩, 75
algebraic function, 116
algebraic grammar, 92
algebraic language, 92
algebraic relation, 115
algebraic transducer, 114, see push-down transducer
aliased, 65
analysis of conflicting accesses, 76
ancestor, 142, 144, 148
Ancestors(u), 142

B
block, 63

C
call tree, 70
causality constraint, 82
coloring relation, 212
complete, 105
configuration, 93, 114
conflict, 76
conflict equation, 175
conflict relation, 76, 139, 140, 191, 211
constrained expansion, 209
constraint coloring relation, 213
constraint relation, 209
context-free grammar, 92
context-free language, 92
control automaton, 67
  compressed, 69
control parallelism, 58
control tree, 70, 123, 142
  compressed, 70
control word, 68
  L_ctrl, 68, 70, 129, 139

D
data parallelism, 59
data structure abstraction, 139
  L_data, 71, 140
  M_data, 71, 129, 140
data-flow execution order, 200
-synchronizable, 102
-synchronous, 102
dependence, 77
dependence analysis, 77
dependence relation, 77
deterministic algebraic languages, 93
dominance frontier, 219
dynamic arrays, 160

E
edge name, 64, 71, 236
empty word, 91
  ε, 91
execution front, 60
execution trace, 66
expansion correctness criterion, 192, 193, 194
expansion degree, 197
  ES[p+1], 197
expansion vector, 196
  ES, 196

F
finer, 81, see storage mapping, 174
finite-state automaton
  deterministic, 91
finitely generated, 91, 97, 98
formal language, 91
free monoid, 91
free partially commutative monoid, 72, 118

I
induction variable, 127
  Undefined, 130
  undefined value, 130
  value at an instance, 128
initial stack word, 92
input automaton, 100
instance, 62, 75
  I, 80, 82
  I_e, 62, 68, 80
  ⟨s, x⟩, 75
integer linear programming, 87
interference relation, 193, 210, 211
  ⋈, 193, 194, 210, 211
iteration vector
  ⟨s, x, ref⟩, 75
  ⟨s, x⟩, 75
  Iter, 160
  Stmt, 160
iteration vectors, 74

J
join, 219
JoinsA, 220

L
left-synchronizable, 102
left-synchronous, 102, 148, 231
lexicographic order, 70, 75, 88, 103, 140
  <lex, 70, 75, 140, 197
loop variable, 64

M
maximal, 211
maximal constrained expansion, 222
maximal static expansion, 174
memory expansion, 81
  D_exp, 156
  Array, 160
  CurIns, 156, 227, 240
  Iter, 160
  Stmt, 160
  f^exp_e, 81, 173, 191, 209
  -structures, 157
    D_exp, 157
monoid, 90

N
no-conflict relation, 191

O
one-counter automaton, 94, 95
one-counter language, 95
one-counter relation, 116
one-counter transducer, 116
online algebraic transducer, 116
online algebraic transduction, 116, 231
online rational transducer, 101
online rational transduction, 101, 231
output automaton, 100

P
parallel execution order, 81
  <par, 81
parallelization, 81
partial expansion, 196, 197
partial renaming, 196
path, 91, 99
  label, 91, 99
privatization, 233
program execution, 62
  E, 62, 70, 129, 156, 191, 222
program point, 219
  Points, 219
pseudo-left-synchronizable, 119
pseudo-left-synchronous, 119, 148
push-down automaton, 92
  deterministic, 93
push-down transducer, 114
  push-down automaton interpretation, 115
  underlying rational transducer, 118, 120, 122

Q
quasi-affine selection tree, 88, see quast
quast, 88, 160, 165
  quasi-affine selection tree, 88

R
rational function, 99
rational language, 92
rational relation, 97, 128
rational set, 97, 128
rational transducer, 98
  finite-state automaton interpretation, 99, 107
reaching definition, 77, 173
reaching definition analysis, 78
reaching definition of a memory location, 164, 217
read, 63
  R, 80, 140
  R_e, 63, 80
realize, 91, 99, 135, 140
  by empty stack, 93, 115
  by final state, 93, 94, 114, 116
recognizable relation, 97, 148
recognizable set, 97
regular language, 91, see rational language
right-synchronizable, 103
right-synchronous, 103
run-time instance, 61
  CurIns, 156, 227, 240

S
SA, 156
schedule, 59, 82, 85
schedule-independent, 188, 200
semi-group, 90
sequential execution order, 70
  <seq, 70
sequential function, 100, 231
sequential transducer, 100
shape analysis, 65
single-assignment, 156
SSA, 156
stack alphabet, 92
statement, 63
statement label, 66
static expansion, 173
static single-assignment, 156
storage mapping, 75, 126, 128, 135
  f^exp_e, 81, 173, 191, 209
  f_e, 75, 173
  finer, 81
sub-sequential function, 101, 231
sub-sequential transducer, 100
synchronizable, 102
synchronization graph, 236
synchronous, 102

T
textual order, 70, 144
  <txt, 70, 144
tiling, 84
  tile, 84
top stack symbol, 93
transduction, 98
  algebraic, 115
  rational, 98
  recognizable, 98
transmission rate, 110
trim, 91, 99

U
unambiguous, 105
underlying rational transducer, 148
use, 77, 173

W
weakened static expansion, 217
write, 63
  W, 80, 140
  W_e, 63, 80
Summary

Today's microprocessors and parallel architectures pose new challenges to compilation techniques. In the presence of parallelism, optimizations become too specific and complex to be left to the programmer. Automatic parallelization techniques go beyond the traditional scope of numerical applications and tackle new program models, such as non-affine loop nests, recursive calls and dynamic data structures. Precise analyses are at the heart of parallelism detection: they gather compile-time information about the run-time properties of programs. This information validates transformations that are useful for extracting parallelism and generating parallel code.

This thesis mainly addresses analyses and transformations from an instancewise point of view, that is, considering the individual properties of each instance of a statement at run time. A new formalization based on formal languages first allows us to study an instancewise dependence and reaching definition analysis for recursive programs. Applying this analysis to the expansion and parallelization of recursive programs yields encouraging results. Arbitrary loop nests are the subject of the second part of this work. A new study of expansion-based parallelization techniques allows us to propose solutions to crucial optimization problems.

Keywords: automatic parallelization, recursive programs, non-affine loop nests, dependence analysis, reaching definition analysis, memory expansion.
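To make the instancewise point of view concrete, here is a minimal Python sketch under stated assumptions. It names each run-time instance of a statement in a toy recursive procedure by the sequence of call labels on its path from the entry call, in the spirit of the control words listed in the index, and checks that sorting these words lexicographically, label by label in textual order, reproduces the sequential execution order. The procedure `P`, its labels `s`, `r1`, `r2`, the unlabeled entry call, and the helpers `control_words` and `lex_key` are all hypothetical; the thesis's formal definitions may differ in detail.

```python
"""Naming run-time instances of a toy recursive procedure by control words.

Hypothetical toy program (not from the thesis), in C-like notation:
    void P(int n) {
        s:  ...;              // some statement
        if (n > 0) {
            r1: P(n - 1);
            r2: P(n - 1);
        }
    }
"""
from typing import List, Tuple

ControlWord = Tuple[str, ...]

# Assumed textual order of the labels inside P's body: s < r1 < r2.
TEXTUAL_RANK = {"s": 0, "r1": 1, "r2": 2}


def control_words(n: int) -> List[ControlWord]:
    """Run P(n) sequentially and return the control word of every instance
    of s, in execution order. Here a control word is the sequence of call
    labels from the (unlabeled) entry call down to the instance, then 's'."""
    trace: List[ControlWord] = []

    def run(depth: int, path: ControlWord) -> None:
        trace.append(path + ("s",))          # s executes first in P's body
        if depth > 0:
            run(depth - 1, path + ("r1",))   # call site r1
            run(depth - 1, path + ("r2",))   # call site r2

    run(n, ())
    return trace


def lex_key(word: ControlWord) -> List[int]:
    """Lexicographic key: compare labels one by one using textual order."""
    return [TEXTUAL_RANK[label] for label in word]


if __name__ == "__main__":
    words = control_words(2)
    for w in words:
        print(" ".join(w))
    # In this toy program, sequential order coincides with lexicographic order.
    assert words == sorted(words, key=lex_key)
```

No instance word of `s` is a prefix of another (each ends with the leaf label `s`), so plain lexicographic comparison is enough here; programs with loops would also need iteration counters in the names.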
Abstract

Compilation for today's microprocessor and multi-processor architectures is facing new challenges. When dealing with parallel execution, optimizations become too specific and complex to be left to the programmer. Traditionally devoted to numerical applications, automatic parallelization now addresses new program models, including non-affine nests of loops, recursive calls and pointer-based data structures. Parallelism detection is based on precise analyses, gathering compile-time information about run-time program properties. This information enables transformations useful for parallelism extraction and parallel code generation.

This thesis focuses on aggressive analysis and transformation techniques from an instancewise point of view, that is, from the individual properties of each run-time instance of a program statement. Thanks to a novel formal language framework, we first investigate instancewise dependence and reaching definition analysis for recursive programs. This analysis is applied to memory expansion and parallelization of recursive programs, and promising results are presented. The second part of this work addresses nests of loops with unrestricted conditionals, bounds and array subscripts. Parallelization via memory expansion is revisited in this context, and solutions to challenging optimization problems are proposed.

Keywords: automatic parallelization, recursive programs, non-affine loop nests, dependence analysis, reaching definition analysis, memory expansion.
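The abstract mentions parallelization via memory expansion. As a hedged illustration (a toy loop chosen for this purpose, not an example from the thesis), the sketch below shows why expanding a scalar helps. In a loop where S1 writes a scalar `t` and S2 reads it, reusing `t` across iterations creates output dependences (S1 instances overwrite each other) and anti dependences (S2(i) must read `t` before S1(i+1) overwrites it). Distributing the loop into an all-S1 loop followed by an all-S2 loop violates the anti dependences. Giving each S1 instance its own cell `t_exp[i]` leaves only the flow dependence S1(i) to S2(i), after which the distribution is legal and the iterations of each loop can run in any order. All function names are hypothetical.

```python
"""Toy illustration of memory expansion (hypothetical example, not from the thesis).

Original loop, in C-like notation:
    for (i = 0; i < n; i++) {
        S1: t = a[i] * 2;
        S2: b[i] = t + 1;
    }
"""
from typing import List


def original_sequential(a: List[int]) -> List[int]:
    """Reference semantics: S1 and S2 interleaved, iteration by iteration."""
    b = [0] * len(a)
    t = 0
    for i in range(len(a)):
        t = a[i] * 2      # S1: every iteration overwrites the same scalar
        b[i] = t + 1      # S2: reads the value written by S1 in iteration i
    return b


def original_distributed(a: List[int]) -> List[int]:
    """Loop distribution WITHOUT expansion: all S1 instances, then all S2.

    Wrong, because the anti dependences S2(i) -> S1(i+1) on t are violated:
    every S2 instance reads the value left by the last S1 instance.
    """
    b = [0] * len(a)
    t = 0
    for i in range(len(a)):
        t = a[i] * 2
    for i in range(len(a)):
        b[i] = t + 1
    return b


def expanded_distributed(a: List[int]) -> List[int]:
    """Loop distribution AFTER expanding t into t_exp[i].

    Each S1 instance writes its own cell, so only the flow dependence
    S1(i) -> S2(i) remains; the reversed second loop shows that its
    iterations are independent of one another.
    """
    n = len(a)
    t_exp = [0] * n
    b = [0] * n
    for i in range(n):
        t_exp[i] = a[i] * 2
    for i in reversed(range(n)):  # any order works: no loop-carried dependence
        b[i] = t_exp[i] + 1
    return b


if __name__ == "__main__":
    a = [1, 2, 3]
    print(original_sequential(a))   # [3, 5, 7]
    print(original_distributed(a))  # [7, 7, 7]
    print(expanded_distributed(a))  # [3, 5, 7]
```

The price of this transformation is memory: `t_exp` needs one cell per iteration instead of a single scalar, which is why practical expansion schemes try to limit how much storage they introduce.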
