Energy Footprint of a Computer Code July 2010 Omar Bouslama EDF R&D
Summary
- Who is EDF
- Energy issues for datacenters
- Thinking exaflops
- Power consumption
- The Ecograppe project
- First results
- Conclusions
Who is EDF
Who is EDF - EDF Group
- Revenue: 64.3 billion euros
- Net income: 3.4 billion euros
- Clients: 38.1 million
- Employees: 161,000
- Production: 610 TWh

Who is EDF - EDF R&D
- 2,000 people
- 30% women
- 300 PhDs and 200 PhD candidates
- 150 teaching researchers
Energy issues for Datacenters
Energy issues for Datacenters
- Ever-increasing need for more computational power
- Energy consumption per server
  - Before 2000: 50 W *
  - 2008: 250 W *
  - Running 24 hours a day, 7 days a week
  - 14% power difference between idle and active states (Grid 5000)
- Computational resources are under-utilized
  - Datacenters: 10%-15% of total capacity in use *
* The Problem of Power Consumption in Servers, Lauri Minas and Brad Ellison, Intel, March 2009
Thinking exaflops
Thinking exaflops
- Flops: FLoating point Operations Per Second
- Today we talk about:
  - Gigaflops -> 10^9
  - Teraflops -> 10^12
  - Petaflops -> 10^15
  - Exaflops -> 10^18
Thinking exaflops: TOP 500
- Started in 1993
- The 500 most powerful known computer systems in the world, as reported by the administrators of these machines
- Ranking published twice a year: June and November
- June 1993: the top machine had a performance of 59.7 GFlops
- June 2010: 1.75 PFlops (x 29 500)
- 2019: exaflop machines are expected within this decade
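The growth figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the two performance values quoted above; the doubling time and the extrapolation to one exaflop are assumptions of this example (constant exponential growth), not a forecast taken from the talk.

```python
import math

JUNE_1993_FLOPS = 59.7e9    # 59.7 GFlops, top machine in June 1993
JUNE_2010_FLOPS = 1.75e15   # 1.75 PFlops, top machine in June 2010
YEARS_BETWEEN = 17          # June 1993 to June 2010

# Ratio between the two top machines (the slide rounds this to x 29 500).
factor = JUNE_2010_FLOPS / JUNE_1993_FLOPS

# Time needed to double performance, assuming constant exponential growth.
doubling_years = YEARS_BETWEEN * math.log(2) / math.log(factor)

# Years after June 2010 needed to reach 10^18 Flops at the same pace.
years_to_exaflop = doubling_years * math.log2(1e18 / JUNE_2010_FLOPS)

print(f"growth factor      : x {factor:,.0f}")
print(f"doubling time      : {doubling_years:.2f} years")
print(f"years to 1 exaflop : {years_to_exaflop:.1f}")
```

Under this simple assumption the exaflop mark falls roughly a decade after 2010, which is in line with the expectation stated on the slide.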
Thinking exaflops: energy costs
- Energy costs (powering + cooling): 63% of datacenter costs over their lifetime are power and cooling costs
- The Earth Simulator Center
  - Ranked 1st between 2002 and 2004
  - 35.86 TFlops
  - 18 megawatts
  - $10,000,000/year
- IBM RoadRunner
  - 1.4 petaflops
  - 2.35 megawatts
- 2019: exaflop
  - Consumption will be 500 megawatts without fundamental changes in the way datacenters are managed or technological breakthroughs

Thinking exaflops: energy costs
Source: Estimating total power consumption by servers in the U.S. and the world, Jonathan Koomey, 2007
Thinking exaflops: energy costs
ExaFLOPS machine without power management: 500+ MW?
- Compute: 200 MW (~400 W / socket)
- Memory: 150 MW (0.1 B/Flop @ 1.5 nJ per byte)
- Comm: 100 MW (100 pJ per Flop)
- Disk: 10 MW (10 EB disk @ 10 TB/disk @ 10 W)
- Other misc. power consumption: power supply losses, cooling, etc.
ExaFLOPS machine, future vision: ~40 MW
- Compute: 10 MW (50K sockets @ ~200 W each)
- Memory: 9 MW (0.1 B/Flop @ 150 pJ per byte)
- Comm: ~9 MW (9 pJ per Flop)
- SSD: ~2 MW (10 EB SSD @ 5 TB/SSD)
- Other misc. power consumption (power supply losses, cooling, etc.): 10 MW
Source: HPC: Energy Efficient Computing, Steve Pawlowski, Intel
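The budget for the machine without power management follows from the per-operation energies quoted on the slide. The sketch below reproduces that arithmetic; the function names are illustrative, decimal units are assumed (1 EB = 10^18 bytes, 1 TB = 10^12 bytes), and the socket count is derived here rather than stated in the source.

```python
import math

EXAFLOP = 1e18  # sustained floating-point operations per second

def disk_power_w(capacity_bytes, bytes_per_disk, watts_per_disk):
    """Number of disks needed times the power drawn by each disk."""
    return capacity_bytes / bytes_per_disk * watts_per_disk

def comm_power_w(joules_per_flop, flops=EXAFLOP):
    """Communication power when every Flop costs a fixed energy."""
    return joules_per_flop * flops

def memory_power_w(bytes_per_flop, joules_per_byte, flops=EXAFLOP):
    """Memory power from bytes moved per Flop and energy per byte."""
    return bytes_per_flop * flops * joules_per_byte

# Figures from the "without power management" scenario.
baseline_w = {
    "disk": disk_power_w(10e18, 10e12, 10.0),  # 10 EB on 10 TB disks at 10 W
    "comm": comm_power_w(100e-12),             # 100 pJ per Flop
    "memory": memory_power_w(0.1, 1.5e-9),     # 0.1 B/Flop at 1.5 nJ per byte
    "compute": 200e6,                          # quoted directly on the slide
}

# Socket count implied by 200 MW at ~400 W per socket (derived, not quoted).
implied_sockets = baseline_w["compute"] / 400.0

for name, watts in baseline_w.items():
    print(f"{name:8s} {watts / 1e6:6.0f} MW")
print(f"subtotal {sum(baseline_w.values()) / 1e6:6.0f} MW before losses and cooling")
```

The four components add up to 460 MW; power supply losses and cooling account for the rest of the 500+ MW estimate.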
Thinking exaflops: thinking green
- Green 500, November 2007
- Performance = f(speed) -> Performance = f(speed, power consumption)
- Metric: Flops / watt
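As an illustration of the Flops / watt metric, the sketch below applies it to the performance and power figures quoted earlier in this talk for the Earth Simulator, IBM RoadRunner and the projected 500 MW exaflop machine. The dictionary layout and names are choices made for this example.

```python
def flops_per_watt(flops, watts):
    """Energy efficiency as used by the Green 500: Flops divided by watts."""
    return flops / watts

# (performance in Flops, power in watts), as quoted in this talk.
machines = {
    "Earth Simulator": (35.86e12, 18e6),
    "IBM RoadRunner": (1.4e15, 2.35e6),
    "Exaflop machine at 500 MW": (1e18, 500e6),
}

efficiency = {name: flops_per_watt(f, w) for name, (f, w) in machines.items()}

for name, value in efficiency.items():
    print(f"{name:26s} {value / 1e6:8.1f} MFlops/W")
```

With these figures the Earth Simulator delivers about 2 MFlops/W and RoadRunner about 600 MFlops/W, while an exaflop machine held to 500 MW would need about 2,000 MFlops/W.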
Power consumption
Power consumption: measuring
- Σ component power consumption
  - Provide internal power measurement equipment
  - Separate DC lines
- Modelling
  - Estimation
  - Complexity
- Sensors
  - Overall consumption
  - Real consumption
Power consumption: components
- Processor: 85 W
- Memory: 15 W
- Disk: several disks
- (Intel labs, 2008)
- Power supply
  - Efficiency at a very high load factor: 80 to 90 percent *
  - Currently, servers run with power supplies that are 20 to 40 percent efficient *
* The Problem of Power Consumption in Servers, Lauri Minas and Brad Ellison, Intel, March 2009
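The effect of power supply efficiency on what is drawn at the outlet can be sketched as follows. The component values are the ones on the slide; the function name is hypothetical, disks are left out because the slide gives no wattage for them, and the two efficiencies are sample points picked from the ranges quoted above.

```python
import math

def wall_power_w(dc_loads_w, psu_efficiency):
    """Power drawn at the outlet for a given DC load and supply efficiency."""
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return sum(dc_loads_w.values()) / psu_efficiency

# DC loads quoted on the slide (disk power is not given, so it is omitted).
loads_w = {"processor": 85.0, "memory": 15.0}

at_high_load = wall_power_w(loads_w, 0.85)  # sample point in the 80-90% range
at_low_eff = wall_power_w(loads_w, 0.30)    # sample point in the 20-40% range

print(f"100 W of DC load at 85% efficiency: {at_high_load:.0f} W at the outlet")
print(f"100 W of DC load at 30% efficiency: {at_low_eff:.0f} W at the outlet")
```

The same 100 W of useful load costs roughly three times more at the outlet when the supply runs at the low end of the quoted range.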
Power consumption: energy saving
- Ultra-low-voltage CPU: low consumption, worse performance
- Switch off servers: consumption peak at power on/off
- Low-power modes
  - Dynamic Voltage and Frequency Scaling
  - Massive Arrays of Idle Disks
- Local decisions
- Transition delay
- Shut down/boot
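To give an order of magnitude for Dynamic Voltage and Frequency Scaling, the sketch below uses the usual first-order CMOS approximation, in which dynamic power is proportional to V² x f. This model is an assumption of the example and does not come from the slide; it ignores static (leakage) power and the transition delays mentioned above.

```python
import math

def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to nominal, assuming P ~ C * V^2 * f."""
    return voltage_scale ** 2 * frequency_scale

def relative_dynamic_energy(voltage_scale, frequency_scale):
    """Energy for a fixed amount of work: power times the longer run time."""
    run_time_scale = 1.0 / frequency_scale
    return relative_dynamic_power(voltage_scale, frequency_scale) * run_time_scale

# Hypothetical operating point: 90% of nominal voltage, 80% of nominal frequency.
power_ratio = relative_dynamic_power(0.9, 0.8)
energy_ratio = relative_dynamic_energy(0.9, 0.8)

print(f"dynamic power : {power_ratio:.3f} of nominal")
print(f"dynamic energy: {energy_ratio:.3f} of nominal for the same work")
```

In this model the power drops to about 65% of nominal, but because the job runs longer the energy for a fixed amount of work only drops to about 81%, which is one reason measured consumption matters more than nominal figures.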
Power consumption: software consumption
- Software power consumption
  - JouleMeter: Microsoft Research
  - Intel Energy Checker
- Our approach
The Ecograppe Project
The Ecograppe Project
- Started in 2009
- 3 partners

The Ecograppe Project: Kerrighed
- Started in 1999
- Single system image
- Process migration
- Checkpointing

The Ecograppe Project: Électricité de France
- Provide case studies of real usage
- Experimental platform to better understand the energy consumption process
- Test algorithms in its environment
The Ecograppe Project: EDF test platform - Hpslab
- High Performance Simulation Lab
- Graphics cluster for the CARRIOCAS project
  - 64 graphics nodes
  - 2 front-end nodes
- Networking equipment
  - POP 40 Gb/s Alcatel
  - Switch/router Ethernet Extreme Networks
  - Switch InfiniBand «2X» Voltaire
- Storage cluster: Lustre file system
  - 2 HP racks (20 TB)
  - 12 OSS servers + 1 MDS server
- Compute cluster
  - 24 compute nodes
  - 1 front-end node
- GPU Fermi for accelerated double-precision co-processing
  - 1 station with 2 cards
The Ecograppe Project: Measurement infrastructure
- APT France
- Temperature sensors: Sensor IP8
  - 28 temperature sensors
  - Range [-55 °C, 125 °C], accuracy ±0.5 °C
- Power sensors: Raritan Dominion PX
  - 1 sensor per outlet
The Ecograppe Project: Back-end
- Linux daemon
  - Python: Start - Stop - Status - Generate
  - XML: daemon configuration
- SNMP: global supervision
- Perf events: process supervision
  - Information about processes
  - No code instrumentation!
- MySQL: data storage
- SOAP: server communications
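Once per-outlet power samples have been collected, the energy footprint of a run is the integral of power over time. The sketch below shows one way to compute it with the trapezoidal rule; the sample layout, the function names and the idle-baseline subtraction are assumptions of this example, not a description of the actual Ecograppe daemon.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        total += (t1 - t0) * (p0 + p1) / 2.0
    return total

def footprint_joules(samples, idle_power_w):
    """Energy above what an idle machine would have used over the same span."""
    duration_s = samples[-1][0] - samples[0][0]
    return energy_joules(samples) - idle_power_w * duration_s

# Hypothetical readings for one outlet: (seconds since start, watts).
samples = [(0.0, 200.0), (10.0, 250.0), (20.0, 250.0), (30.0, 200.0)]

total_j = energy_joules(samples)
extra_j = footprint_joules(samples, idle_power_w=200.0)

print(f"total energy      : {total_j:.0f} J ({total_j / 3600.0:.2f} Wh)")
print(f"energy above idle : {extra_j:.0f} J")
```

Subtracting an idle baseline is only one possible convention; attributing the full outlet energy to the running code is equally defensible on a dedicated node.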
The Ecograppe Project: Perf events
Performance counters are special hardware registers available on most modern CPUs. These registers count the number of certain types of hw events, such as instructions executed, cache-misses suffered, or branches mispredicted, without slowing down the kernel or applications. These registers can also trigger interrupts when a threshold number of events have passed and can thus be used to profile the code that runs on that CPU.
Source: Performance Counters on Linux: The New Tools, Arnaldo Carvalho de Melo, Linux Plumbers Conference, September 2009
The Ecograppe Project: Perf events
- Kernel >= 2.6.31
- Debugfs partition: mount -t debugfs none /sys/kernel/debug
- tools/perf
- Git-like organisation (commands, sub-commands)
- Main commands: list, stat, record, report
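Since the back-end is written in Python, one way to drive the stat command without instrumenting the measured code is to build the perf command line and hand it to a subprocess. The sketch below only constructs the argument lists; the helper names and the default event selection are assumptions of this example, and the perf tool must be installed for the commands to run.

```python
import shlex

DEFAULT_EVENTS = ("instructions", "cache-misses", "branch-misses")

def perf_stat_argv(command, events=DEFAULT_EVENTS):
    """Argument list that counts the given events while running a command."""
    return ["perf", "stat", "-e", ",".join(events), "--", *command]

def perf_stat_pid_argv(pid, events=DEFAULT_EVENTS):
    """Argument list that attaches the counters to an already running process."""
    return ["perf", "stat", "-e", ",".join(events), "-p", str(pid)]

# Hypothetical examples: a launched command and an existing process id.
launch = perf_stat_argv(["sleep", "1"])
attach = perf_stat_pid_argv(1234)

print(shlex.join(launch))
print(shlex.join(attach))
```

Either list can be passed to subprocess.run on a machine where perf is available; perf stat writes its counter summary to standard error when the measured command exits or the measurement is interrupted.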
The Ecograppe Project: Front-end
- Web interface
  - CSS: Freecsstemplates.org
  - PHP: extract data from database
  - Flot: interactive graphs
  - Ajax: dynamic update
- Real-time supervision
  - Last hour
  - Current day
- History
  - Day
  - Month
- Comparisons

The Ecograppe Project: Front-end
- Web interface

The Ecograppe Project: Benchmarks used
- Code_Saturne (EDF R&D, 1997)
- Test case of thermal fatigue (Testcase Father)
- 1 million cells
First results
First Results
- 3 architectures
  - 6 machines AMD Opteron 2220
  - 6 machines Intel Xeon 5160
  - 6 machines AMD Opteron 8220
- Cluster 1, Cluster 2, Cluster 3
First Results
[Figures: Opteron 8220, Opteron 2220, Xeon 5160]
Summary
- Electricity used by servers doubled between 2000 and 2005
- Exaflops systems are expected in the next few years
- Without fundamental changes in datacenter management or technology, exaflops systems are expected to consume 500 MW each in 2019
- Better power management in datacenters is a means of achieving significant energy savings
- Better power management requires a better understanding of the power consumption of processes in datacenters
- I presented the experimental platform that EDF has built to better understand the energy consumption process
Bibliography
- Performance Counters on Linux: The New Tools, Arnaldo Carvalho de Melo, Linux Plumbers Conference, September 2009
- Ecograppe: State of art of power saving in cluster + results from EDF case study, 2010
- Global Climate Warming? Yes In The Machine Room, Wu Feng, Virginia Tech, 2006
- The Problem of Power Consumption in Servers, Lauri Minas and Brad Ellison, Intel, March 2009
- HPC: Energy Efficient Computing, Steve Pawlowski, Intel, 2009
- Estimating total power consumption by servers in the U.S. and the world, Jonathan Koomey, 2007
- Exascale roadmap 1.0, May 2010