Pharmacogenetics and multivariate analyses

CytoPathfinder
Pharmacogenetics and multivariate analyses
Biostatistical needs of an Anglo-Japanese biotech
Olivier Delrieu
Polytech Lille, October 2010

Pharmacogenetics
Study of the influence of the genome on the variability of the response to a drug treatment.
Some individuals deviate from the expected response:
- Reduced or absent efficacy (complex diseases)
- Adverse effects or toxicity (heterogeneous patients)
Understanding and exploiting this heterogeneity is of major interest to patients, regulators and industry:
- Prevent toxicity
- Increase R&D success
- Enable the development of personalised treatments

Drug-Induced Adverse Reactions
Drug-Induced Liver Injury (DILI)
- 900+ drugs implicated
- Leading cause of drug termination and withdrawal
- 5% of all hospital admissions
- 50% of all acute liver failures
Stevens-Johnson Syndrome (SJS)
- 100+ drugs implicated
- Rare and severe
- Widespread dermis-epidermis detachment

Shortcomings of current analysis strategies (1)
Unsuited to the analysis of complex, heterogeneous diseases arising from the interaction of many genes and environmental factors.
No investigation of clinical/genetic heterogeneity
- "One gene for all" approach
- Variables with a large main effect in the whole population are favoured
- Makes the development of personalised medicines impossible
Univariate analyses
- Massive correction for multiple testing
- Require a large number of patients
Analysis of main effects only
- No large-scale analysis of correlations or interactions
- No co-analysis with environmental or clinical variables, which are analysed in a second step
1: Hunter & Kraft. Drinking from the Fire Hose: Statistical Issues in Genomewide Association Studies. NEJM 2007

Taxonomy3: Method
Based on a Bayesian classification concept (1)
Relevant analysis of complex and heterogeneous diseases:
- Perform multivariate analysis of megavariate data
- Probe heterogeneity in cohorts, detect hidden sub-phenotypes
- Co-analyze several data types
- Build powerful predictive models
Analyze variables through correlations, interactions and residuals, distinguishing additive variables, coupled variables and synergistic variables (epistasis).
1: Delrieu, O. and Bowman, C. E. (LASR 2005). Visualisation of gene and pathway determinants of disease. In: S. Barber, P. D. Baxter, K. V. Mardia and R. E. Walls (Eds.), Quantitative Biology, Shape Analysis, and Wavelets. University of Leeds, 180pp. 21-24

Taxonomy3: Method
Applies to case/control studies with variables of any type: discrete (e.g. SNPs, categorical) and continuous (e.g. RNA, clinical).
1. Computation of personalised Kullback-Leibler divergences
   - Transformation into a matrix of the same dimension containing the information gain provided by each observation and variable with respect to the overall case/control distinction
2. Linear algebra
   - Principal component analysis => correlation, heterogeneity
   - Modelling: interactions, residuals
3. Resampling of the case/control status => significance level of the variables
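Step 3 can be illustrated with a generic label-permutation test. This is only a minimal sketch, not the Taxonomy3 procedure itself: the statistic (absolute difference of group means), the function names, the add-one correction and the toy data are all assumptions made for illustration.

```python
import random

def mean_difference(values, labels):
    """Absolute difference between the case mean and the control mean."""
    cases = [v for v, is_case in zip(values, labels) if is_case]
    controls = [v for v, is_case in zip(values, labels) if not is_case]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Empirical p-value obtained by reshuffling the case/control labels."""
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between variable and status
        if mean_difference(values, shuffled) >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical variable measured on 4 cases followed by 6 controls.
values = [2.1, 1.9, 2.3, 2.0, 0.1, -0.2, 0.0, 0.2, -0.1, 0.1]
labels = [True] * 4 + [False] * 6
p_value = permutation_p_value(values, labels)
```

A variable whose statistic is rarely matched under reshuffled labels receives a small empirical p-value, which is the sense in which resampling yields a significance level per variable.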

Taxonomy3: Implementation
Java demo at taxonomy.delrieu.org
Scaling up and moving to production raise statistical and computing challenges
Collaboration set up between USTL and PGXIS
- Internship, then permanent contract (CDI), for a Polytech student: Rémi Lebret (remi.lebret@pgxis.com)
- Value of a dual training in statistics and computer science
Software development, low cost of use
- Easy to use by our group and by academic groups
- Speed, portability, small memory footprint
DILI dataset used for the move to production

DILI dataset: Taxonomy3 analysis
PCA output reveals signals and noise
[Figure: PCA plot showing 51 flucloxacillin-induced DILI cases (red), 281 POPRES controls (blue), 904,158 SNPs (green) and clinical variables (black); rs2395029 is labelled.]

Case/control resampling: significant signal on the 1st component
[Figure labels: rs2395029, HLA-B*5701]
- Strong signal on chromosome 6, in the Major Histocompatibility Complex (MHC) region
- Mainly driven by linkage disequilibrium (correlation)
- Confirms the findings of the univariate analysis
- Additional (weak) signals

Tax3 analysis reveals genetic heterogeneity
Close-up on the chromosome 6 signals of the 1st and 2nd components
- Two independent sets of SNPs/genes of interest
- Two distinct biological processes, relevant to population subgroups
[Figure labels: HCP5, MICB, TNF, HLA-B, ...; TRIM39, TRIM10, BAT1, HLA-A, ...]

Interaction analysis reveals additional genes
Interactions of the 9,648 SNPs mapped to 108 genes of interest (46,537,128 terms).
On the first PCA component:
- Little interaction on chromosome 6 (MHC region)
- Strong interaction between the MHC region and other chromosomes
- For example, chromosome 2: IL1RL1 and IL1RL2 (interleukin 1 receptor-like)
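The number of terms quoted above matches the count of unordered SNP pairs, C(9648, 2). The short check below assumes that only pairwise interactions are counted; the function name is illustrative.

```python
import math

def pairwise_interaction_terms(n_variables):
    """Number of unordered pairs among n_variables, i.e. C(n, 2)."""
    return math.comb(n_variables, 2)

n_terms = pairwise_interaction_terms(9648)
print(n_terms)  # 46537128
```

Because the count grows quadratically with the number of SNPs, restricting the analysis to SNPs mapped to genes of interest keeps the interaction step tractable.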

Visualisation of interaction networks
[Figure: between-gene interaction strengths. Strongest interactions in red. 99th percentile cut-off. Black boxes mark genes not on chromosome 6.]
- Interaction analysis allows the discovery of additional genes of importance
- Understanding between-gene relationships (epistasis) allows the exploration of new biological pathways

Predictive model's characteristics (main effects only)
Receiver Operating Characteristic (ROC) curve
- True Positive Rate = 84%: 16% of patients prone to develop DILI would be missed by this model
- False Positive Rate = 12%: patients who would be wrongly removed from flucloxacillin exposure, as they would not develop DILI. This gives 88% specificity
- Adaptable model: adjusting the decision criterion provides a better TPR or a better FPR
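The trade-off between TPR and FPR can be sketched as follows. The scores and labels are hypothetical and are not taken from the DILI model; the function name and the "flag if score >= threshold" rule are assumptions made for illustration.

```python
def rates_at_threshold(scores, is_case, threshold):
    """Return (TPR, FPR) when subjects scoring >= threshold are flagged."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= threshold)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= threshold)
    n_cases = sum(1 for c in is_case if c)
    n_controls = len(is_case) - n_cases
    return tp / n_cases, fp / n_controls

# Hypothetical risk scores for 4 cases followed by 6 controls.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1, 0.05]
is_case = [True] * 4 + [False] * 6

strict = rates_at_threshold(scores, is_case, 0.65)   # fewer false alarms
lenient = rates_at_threshold(scores, is_case, 0.35)  # fewer missed cases

# Figures quoted on the slide: miss rate and specificity follow by complement.
miss_rate = round(1 - 0.84, 2)    # 0.16
specificity = round(1 - 0.12, 2)  # 0.88
```

Lowering the threshold raises the TPR at the price of a higher FPR, which is the adjustment of the decision criterion mentioned above.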

Taxonomy3: Challenges
Software
- Proprietary vs. commercial (SAS, ...) or free (R)
Statistics
- PCA in high dimension, resampling, ...
Speed and data volume
- Coding in C, C++; multithreading; distributed computing
- Low-level coding: e.g. SNPs encoded in binary
Data mining and visualisation
Example for 10^6 SNPs and 300 individuals:
- Simple analyses (one PCA): 8-core PC, <2 GB RAM, 10 min
- Complex analyses: cluster of virtual machines (Amazon Web Services cloud), 24 h with 160 cores at $0.10/core/h
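One common way to encode SNPs in binary is to store each genotype on 2 bits, four genotypes per byte. The sketch below assumes that scheme (codes 0, 1, 2 for the genotype and 3 for missing); the slides do not describe the actual Taxonomy3 encoding, and the production code is in C/C++ rather than Python.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2, or 3 for missing) four to a byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for index, code in enumerate(genotypes):
        if not 0 <= code <= 3:
            raise ValueError("genotype code must be in 0..3")
        # Each genotype occupies 2 bits; the first one uses the lowest bits.
        packed[index // 4] |= code << (2 * (index % 4))
    return bytes(packed)

def unpack_genotypes(packed, count):
    """Recover the first `count` genotype codes from a packed buffer."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]

def packed_size_bytes(n_snps, n_individuals):
    """Bytes needed when each SNP row is packed separately."""
    return n_snps * ((n_individuals + 3) // 4)

example = [0, 1, 2, 3, 2, 2]
round_trip = unpack_genotypes(pack_genotypes(example), len(example))
size = packed_size_bytes(10**6, 300)  # 75,000,000 bytes
```

Under this assumed encoding, the 10^6 SNPs by 300 individuals example occupies about 75 MB, comfortably within the <2 GB RAM figure quoted for a simple analysis.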

Projects
Short term
- Whole-genome interaction analyses (10^12 variables)
- Complex analyses: residuals; predictive model with interactions and residuals
- Analyses of more heterogeneous and complex pathologies (e.g. diabetes)
Long term
- Tax3 analysis in clinical studies (several groups, continuous outcome)
Rémi's PhD thesis: "Important for a more statistics-focused career"

Conclusion
Biology / genetics
- New methods in constant evolution
- Dual training in statistics and computer science
- Ethical aspect of the profession: the statistician contributes to medical research
Biotech
- Innovative subject
- Small team: independence, teleworking
olivier.delrieu@pgxis.com

CytoPathfinder
Backups

Bayes Factors, as classificatory evidence
Is the genome of a drug-induced SAE case the same as that of drug-exposed controls?
Odds of $H_p$ (the genome $G_{S_i}$ of case $S_i$ is different from the controls' genome $G_C$) relative to the baseline hypothesis $H_d$ (same genome as controls):

$$\underbrace{\frac{\Pr(H_p \mid G_{S_i}, G_C)}{\Pr(H_d \mid G_{S_i}, G_C)}}_{\text{posterior}} = \underbrace{\frac{\Pr(H_p)}{\Pr(H_d)}}_{\text{prior}} \times \underbrace{\frac{\Pr(G_{S_i}, G_C \mid H_p)}{\Pr(G_{S_i}, G_C \mid H_d)}}_{\text{Bayes Factor (likelihood ratio)}}$$

This Bayes Factor is the amount of evidence that the $i$-th case is classified as not having the same genome as controls. Then, assuming independence between loci:

$$\mathrm{BF}_i = \prod_{j=1}^{N_{\text{alleles}}} \frac{\hat{p}_j^{\,S_{ij}} \, (1 - \hat{p}_j)^{1 - S_{ij}}}{\hat{q}_j^{\,S_{ij}} \, (1 - \hat{q}_j)^{1 - S_{ij}}}$$

where $S_{ij}$ is 1 if allele $j$ is present in subject $i$ and 0 otherwise, and $\hat{p}_j$, $\hat{q}_j$ are the frequency estimates of allele $j$ in cases and controls.
Delrieu, O. and Bowman, C. E. (LASR 2005). Visualisation of gene and pathway determinants of disease. In: S. Barber, P. D. Baxter, K. V. Mardia and R. E. Walls (Eds.), Quantitative Biology, Shape Analysis, and Wavelets. University of Leeds, 180pp. 21-24
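The product over alleles can be sketched in log form, which avoids numerical underflow when many alleles are multiplied. This is an illustrative sketch only: the function and argument names are assumptions, the frequencies are hypothetical, and each frequency must lie strictly between 0 and 1.

```python
import math

def log2_bayes_factor(alleles_present, case_freqs, control_freqs):
    """Log2 Bayes factor of one subject, assuming independent loci.

    alleles_present[j] is 1 if allele j is present in the subject, else 0.
    case_freqs[j] and control_freqs[j] are the estimated frequencies of
    allele j in cases and in controls.
    """
    total = 0.0
    for present, freq_case, freq_control in zip(
        alleles_present, case_freqs, control_freqs
    ):
        if present:
            total += math.log2(freq_case / freq_control)
        else:
            total += math.log2((1 - freq_case) / (1 - freq_control))
    return total

# Hypothetical subject carrying allele 0 but not allele 1.
log_bf = log2_bayes_factor([1, 0], [0.8, 0.5], [0.2, 0.5])
bayes_factor = 2 ** log_bf
```

In this toy example the first allele is four times more frequent in cases than in controls and the second is uninformative, so the subject's Bayes factor is 4, i.e. 2 bits of evidence.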

Log Bayes Factors, as information gain
A case/control data matrix can be transformed into an LBF matrix of the same dimension, representing the information gain provided by each subject and SNP pertaining to the overall case/control distinction:

$$\mathrm{LBF}(i, j) = \log \frac{\hat{L}_{\text{cases}}(j, k)}{\hat{L}_{\text{controls}}(j, k)}$$

where $k$ is the genotype of subject $i$ for SNP $j$, and $\hat{L}$ is the frequency estimate of $k$.
- LBFs are a measure of information (the unit is the bit if the log is base 2), related to Shannon self-information (1948)
- LBFs are personalized (actual) Kullback-Leibler divergences
- LBFs can be generalized to other data types
Delrieu O and Bowman C (LASR 2007). On using the correlations of divergences. In: Barber S, Baxter PD and Mardia KV (eds). Systems Biology and Statistical Bioinformatics. Univ. of Leeds pp 27-35
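The transformation of a genotype matrix into an LBF matrix can be sketched as below. Several choices are assumptions not stated in the slides: genotype frequencies are estimated by simple counting with a pseudocount (to avoid division by zero), each subject's own genotype is included in the estimate, and the log is taken in base 2.

```python
import math
from collections import Counter

def lbf_matrix(genotypes, is_case, pseudocount=0.5, n_genotypes=3):
    """Return a matrix of log2 Bayes factors with the shape of `genotypes`.

    genotypes[i][j] is the genotype code of subject i at SNP j.
    """
    n_subjects, n_snps = len(genotypes), len(genotypes[0])
    n_cases = sum(1 for c in is_case if c)
    n_controls = n_subjects - n_cases
    result = [[0.0] * n_snps for _ in range(n_subjects)]
    for j in range(n_snps):
        case_counts = Counter(
            row[j] for row, c in zip(genotypes, is_case) if c
        )
        control_counts = Counter(
            row[j] for row, c in zip(genotypes, is_case) if not c
        )
        for i in range(n_subjects):
            k = genotypes[i][j]
            # Smoothed frequency estimates of genotype k (assumed scheme).
            freq_cases = (case_counts[k] + pseudocount) / (
                n_cases + pseudocount * n_genotypes
            )
            freq_controls = (control_counts[k] + pseudocount) / (
                n_controls + pseudocount * n_genotypes
            )
            result[i][j] = math.log2(freq_cases / freq_controls)
    return result

# Hypothetical data: 2 cases then 2 controls, 2 SNPs.
genotypes = [[2, 0], [2, 1], [0, 0], [0, 1]]
is_case = [True, True, False, False]
lbf = lbf_matrix(genotypes, is_case)
```

In this toy example the first SNP separates cases from controls, so its column holds positive values for cases and negative values for controls, while the second SNP is uninformative and its column is zero.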

Simultaneous analysis of genetic, pathway and clinical LBFs

PGXIS developed an on-demand computer cluster tailored for Tax3 analysis
[Diagram: on-demand Linux cluster comprising the Internet link, the PGXIS storage server with backup, a private network, and Workstations 1 to 19.]
Storage server
- File server, storage and backup (500 GB per WGS analysis)
Workstations
- Full access to a maximum of 19 mid-range virtual workstations (2 to 8 processor cores, 8 to 70 GB RAM)
- Capacity: 0 to 160 cores
Distributed architecture
- Message Passing Interface
- Torque
Administrative web interface for cluster management
- Set up/start/stop server and workstations
- Add storage, back up data
- Administer firewall