Data Science Consulting
Héloïse Nonne, Senior Data Scientist - Manager
May 22, 2015

Big Data Analytics for connected home

Data analytics for disconnected homes

- Very low frequency resolution for local (household) measurements (less often than quarterly)
- Only aggregated data (sum of individual loads) for higher-frequency measurements (region, neighborhood)
- Data storage issues
- Limited computation power
- Limited knowledge at the local level
- Limited predictive power
- Complex, sophisticated models exist but are difficult to tune, for example ARIMA models (AutoRegressive Integrated Moving Average)

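To make the ARIMA acronym concrete, here is a minimal sketch of the simplest non-trivial case, ARIMA(1,1,0) without a constant: difference the series once (the "I"), then fit one autoregressive coefficient (the "AR") by least squares. The function names and the quarterly readings are hypothetical, chosen only for illustration; a real study would rely on a statistics library and on proper model selection, which is exactly the tuning difficulty the slide points to.

```python
def fit_arima_110(series):
    """Estimate phi in d[t] = phi * d[t-1] + noise, where d is the first difference.

    This is a conditional least-squares fit of ARIMA(1,1,0) with no constant term.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]  # the "integrated" step
    numerator = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    if denominator == 0:
        return 0.0  # lagged differences are all zero: no autoregressive signal
    return numerator / denominator


def forecast_arima_110(series, phi, steps=1):
    """Forecast future levels by propagating the last difference through phi."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff      # expected next difference
        level = level + diff   # undo the differencing to get back to levels
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    # Hypothetical household meter readings (kWh), one per quarter.
    quarterly_kwh = [900.0, 908.0, 912.0, 914.0, 915.0]
    phi = fit_arima_110(quarterly_kwh)
    print(phi, forecast_arima_110(quarterly_kwh, phi, steps=2))
```

With only a handful of low-frequency readings per household, as in the disconnected case, such a fit is poorly constrained, which illustrates the limited predictive power listed above.
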
Reducing electricity costs: a complete data ecosystem

- Regional / national scale: sun, wind, cloud cover, humidity, temperature
- Local / neighborhood scale: electricity demand?
- Anthropologic data: comfort temperature, children at school, activity of occupants, weekday / holiday, hour of day
- Energy production, energy price, weather
- Appliances and use: heating, electricity storage, renewable energy, shutter orientation, elevators, doors / lights
- Network activity -> current occupation
- Building structure (thermal mass)
- Historical data, actual measurement (real-time), forecast
- Energy consumption patterns

Multiple sources of data for multiple models

- Volume: vast amounts of data, too large to store and analyse using traditional technology
- Velocity: speed at which new data is generated, speed at which data change
- Variety: types of data (number, text, images, video), types of sources (real-time, static)
- Veracity: accuracy of data (frequency, errors), quality of data (sampling errors, typos)

Data analytics on energy load

- Moving average and thresholds
- Outlier detection
- Anomaly detection
- Load prediction
- ARIMA
- Neural networks
- Recurrent neural networks
- Clustering: K-means, DBSCAN
- Self-organizing maps
- Identification of consumption patterns
- Statistics for reporting on dashboards
- Recommendations to reschedule appliances
- Storage of energy (photovoltaic, geothermal, etc.)

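The first item, moving average and thresholds, is the simplest way to detect outliers in a load curve. The sketch below is one possible reading of it, not the method used in the talk: a reading is flagged when it deviates from the trailing moving average by more than a chosen number of trailing standard deviations. The function name, window size, threshold, and sample loads are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_outliers(loads, window=4, n_sigmas=3.0):
    """Return the indices of readings far from the trailing moving average.

    A reading at index i is compared with the mean and standard deviation
    of the `window` readings that immediately precede it.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a deviation")
    flagged = []
    for i in range(window, len(loads)):
        history = loads[i - window:i]
        moving_average = mean(history)
        threshold = n_sigmas * stdev(history)
        if abs(loads[i] - moving_average) > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical hourly household loads in kW, with one spike at index 5.
    hourly_kw = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 1.1, 0.9, 1.0, 1.2]
    print(flag_outliers(hourly_kw))
```

Note that a spike inflates the deviation of the windows that follow it, so readings just after an outlier are judged against a looser threshold. A more robust variant would exclude flagged points from the history or use the median.
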
Many use cases

Business
- Scoring and customer segmentation
- Predict energy demand
- Predictive maintenance (elevators, HVAC, photovoltaic, ...)
- Cost reduction

Society
- Detect energy precarity (underheating)
- Detect people in distress (illnesses, elderly, heat wave, ...)
- Improved safety (fire detection, security, ...)

Research / knowledge
- Building optimization (thermal mass, insulation, configuration, window orientation)
- Consumption patterns
- Social behaviors

Sustainability
- Optimize use and storage of energy (light management, appliance use, demand reduction, ...)
- Improve comfort in the neighborhood
- Reduce waste (energy, water, appliances)

But remain pragmatic and think about the whole picture -> predictive maintenance on light bulbs??!

Predictive maintenance

Cost reduction and improvement of reliability through predictive maintenance
Elevator maintenance: predict failure before breakage

Data
- Shaft speed
- Vibrations (X, Y, Z)
- Sound measurements
- Rail vibrations
- Motor temperature
- Oil buffer

Wear, failure
- Bearing fault
- Door: shoe deformation
- Unbalance
- Misalignment
- Resonance

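Raw vibration samples are hard to diagnose directly; one common way to summarize them is the frequency at which the signal has its strongest peak. The sketch below computes that peak with a plain discrete Fourier transform. It is an illustrative assumption rather than the processing used in the talk: the function name, the sample rate, and the synthetic signal are hypothetical, and a production system would use an FFT library.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC peak of a naive DFT."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least 2 samples")
    average = sum(samples) / n
    centered = [s - average for s in samples]  # remove the constant offset
    best_bin, best_magnitude = 0, 0.0
    # Only bins up to n // 2 are meaningful for a real-valued signal.
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate_hz / n


if __name__ == "__main__":
    # Synthetic vibration signal: a strong 5 Hz component plus a weaker 12 Hz one,
    # sampled for one second at a hypothetical rate of 64 Hz.
    rate = 64
    signal = [
        math.sin(2 * math.pi * 5 * t / rate) + 0.3 * math.sin(2 * math.pi * 12 * t / rate)
        for t in range(rate)
    ]
    print(dominant_frequency(signal, rate))
```

Comparing such a peak with the shaft rotation frequency is what allows symptoms like unbalance or misalignment to be told apart, under the assumption that each fault leaves its mark at a characteristic multiple of the rotation speed.
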
A predictive maintenance management system

Requirements
- Continuous adaptation of the diagnosis
- Build, increase, and maintain knowledge
- Handle large quantities of data
- Handle uncertainty in the diagnosis
- Assess fault severity

Challenges
- Symptoms are a mix of different causes
- Information is unclear
- Limited frequency resolution
- Missing data
- Noise

[Diagram: data center and remote management system; richer knowledge from multiple sources]

Bayesian networks

- Compact representation of entities' states or events as random variables
- Contains knowledge about how states / events are related

Variables
- BF: Bearing fault
- DF: Door deformation
- WU: Weight unbalance
- RN: Resonance
- MA: Misalignment
- AYX: Vibration freq peak on axis A at Y X
- TP: Temperature > x °C
- SP: Shaft speed freq peaks
- SdB: Sound > x dB

Bayesian network
- Qualitative = dependence relations
- Quantitative = the strengths of the relations

[Diagram: network with nodes DF, SP, TP, BF, Y1X, SdB, WU, Y2X, MA, Z1X, Z2X, X1X, RN, X2X]

Advantages
- Mix a priori knowledge with experimental (real-time) data
- Explanatory (human understanding of phenomena vs black-box models)
- Uncertainty management (assessment of probability of failure)
- Possibility to learn
  - Parameters
  - Structures (events, entities, causes and effects)
- Decision rules for action

Absolute need of prior knowledge from professionals

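As a worked illustration of the quantitative side, the sketch below encodes a three-node fragment in which a bearing fault (BF) influences both the temperature alarm (TP) and the sound alarm (SdB), and computes the probability of a fault given the observed alarms by enumerating the joint distribution. The edges and every probability value are assumptions made up for this example; in practice they would come from the prior knowledge of maintenance professionals and from data.

```python
# Hypothetical parameters (not from the talk).
P_BF = 0.05                                  # prior P(bearing fault)
P_TP_GIVEN_BF = {True: 0.8, False: 0.1}      # P(temperature > threshold | BF)
P_SDB_GIVEN_BF = {True: 0.7, False: 0.2}     # P(sound > threshold | BF)


def joint(bearing_fault, tp_high, sdb_high):
    """P(BF, TP, SdB) under the factorization P(BF) * P(TP | BF) * P(SdB | BF)."""
    p_bf = P_BF if bearing_fault else 1 - P_BF
    p_tp = P_TP_GIVEN_BF[bearing_fault]
    p_sdb = P_SDB_GIVEN_BF[bearing_fault]
    if not tp_high:
        p_tp = 1 - p_tp
    if not sdb_high:
        p_sdb = 1 - p_sdb
    return p_bf * p_tp * p_sdb


def posterior_bearing_fault(tp_high, sdb_high):
    """P(BF = true | observed TP and SdB), by enumeration over BF."""
    with_fault = joint(True, tp_high, sdb_high)
    without_fault = joint(False, tp_high, sdb_high)
    return with_fault / (with_fault + without_fault)


if __name__ == "__main__":
    print(posterior_bearing_fault(tp_high=True, sdb_high=True))
    print(posterior_bearing_fault(tp_high=False, sdb_high=False))
```

With these made-up numbers, two simultaneous alarms raise the fault probability well above its 5% prior, while two quiet sensors lower it. This is the kind of graded, explainable assessment of failure probability that the advantages above refer to.
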
The big (data) picture

- Many sources of data: weather, energy production, economic, social, behavioral data, appliance characteristics, current building occupation, activity, etc.
- Different scales: worldwide, regional, local, individual
- Different times: historical data, year, month, day, hour, real-time
- The system is not going to be perfect at once -> design it for constant improvement
- A single model is useless: each model has its use, and models feed each other with their knowledge and predictions
- Choose the right model and the right technology according to use case, time cost, energy cost, pragmatism, realism
- Build models with the professionals who know the problem -> build on existing knowledge
- An efficient system implies close collaboration between business, researchers, manufacturers, maintainers, owners, users, developers, data scientists, data managers, optimization specialists, and end users

Quantmetry: data science specialist

- A "pure player" consulting firm in Big Data and data science, whose commercial development started in 2013
- Advanced statistical methods, machine learning, and Big Data technologies
- 2014: 1.5 million euros in revenue, with strong growth ambitions in France and abroad
- About twenty data scientists / consultants

Quantmetry supports its clients across every layer of the data pyramid and thereby takes part in their digital transformation through quantitative methods, for concrete results on their business performance.

The data pyramid, from top to bottom:
- Act: automate decisions and actions
- Predict: anticipate what may happen, based on past trends
- Analyze: analyze to better understand strong and weak signals
- Store: store everything!
- Collect: more and more data available

Quantmetry activities

Consulting, Support, Delivery
Business optimization through data, project management, pilot projects

- Detection and prioritization of data-driven opportunities
- Design of IT architecture diagrams
- Scoping, industrialization project
- Methodology (statistical models and algorithms)
- Big Data technologies
- Data science proofs of concept
- Technology pilots
- Structuring a Data Lab
- Change management
- Industrialization
- Lessons learned and best practices
- Organization and governance framework
- Choice of a technology architecture
- Skills development
- Recruitment
- Governance
- Industrialization of pilots (API, ...)
- Creation of a Big Data architecture and implementation of data flows

Technology watch and experimentation

- Creation and development of dedicated products built on Big Data technologies
- Research and development in data science
- Research themes:
  - Online learning
  - Deep learning and neural networks
  - Industrialization
  - Semantic analysis
  - Energy (time series analysis)
  - Smart cities
  - Improving the user experience
- Active member of the Big Data ecosystem: participation in seminars, international conferences, hackathons, Kaggle competitions, partnerships with software vendors
- Collaborations with research laboratories and schools

Selected references in data science

- Improving lift for winning insured customers over to banking
  [Figure: lift chart with labels Lift = 6, Lift = 2, feature engineering, unstructured data, gradient boosting, baseline (logistic regression)]
- Churn detection for a telecom operator
- Optimization of a pricing tool for a B2B distribution company
- Predictive models of energy consumption
- Setting up a Data Lab for an insurer
- Behavior analysis for a mutual insurance company
  [Figure: chart over the variables session duration, number of pages viewed, group, age, cancellation page URL; axis from 0 to 40]

Excellence, altruism, results, and Big Data
www.quantmetry.com
Visit our blog: quantmetry-blog.com