SÉQUENÇAGE DE TYPE RAD-SEQ, PRÉSENTATION ET TRAITEMENT ANALYTIQUE

Documents pareils

OBJECTIFS. Une démarche E-science

E-BIOGENOUEST, VERS UN ENVIRONNEMENT VIRTUEL DE RECHERCHE (VRE) ORIENTÉ SCIENCES DE LA VIE? Intervenant(s) : Yvan Le Bras, Olivier Collin

e-biogenouest CNRS UMR 6074 IRISA-INRIA / Plateforme de Bioinformatique GenOuest yvan.le_bras@irisa.fr Programme fédérateur Biogenouest co-financé

UTILISATION DE LA PLATEFORME WEB D ANALYSE DE DONNÉES GALAXY

XtremWeb-HEP Interconnecting jobs over DG. Virtualization over DG. Oleg Lodygensky Laboratoire de l Accélérateur Linéaire

Data issues in species monitoring: where are the traps?

Galaxy Training days. Liste des sessions disponibles : Les formateurs :

GUGGO 4 ème rencontre

Quatre axes au service de la performance et des mutations Four lines serve the performance and changes

Instructions Mozilla Thunderbird Page 1

Forthcoming Database

Présentation par François Keller Fondateur et président de l Institut suisse de brainworking et M. Enga Luye, CEO Belair Biotech

THÈSE. présentée à TÉLÉCOM PARISTECH. pour obtenir le grade de. DOCTEUR de TÉLÉCOM PARISTECH. Mention Informatique et Réseaux. par.

iqtool - Outil e-learning innovateur pour enseigner la Gestion de Qualité au niveau BAC+2

Editing and managing Systems engineering processes at Snecma

Application Form/ Formulaire de demande

Exemple PLS avec SAS

Bourses d excellence pour les masters orientés vers la recherche

Frequently Asked Questions

Mise en place d un système de cabotage maritime au sud ouest de l Ocean Indien. 10 Septembre 2012

Tutoriel Cloud IFB - Initiation -

Once the installation is complete, you can delete the temporary Zip files..

We Generate. You Lead.

Utiliser une WebCam. Micro-ordinateurs, informations, idées, trucs et astuces

SysFera. Benjamin Depardon

Agile&:&de&quoi&s agit0il&?&

Acce s aux applications informatiques Supply Chain Fournisseurs

Tier 1 / Tier 2 relations: Are the roles changing?

Stratégie DataCenters Société Générale Enjeux, objectifs et rôle d un partenaire comme Data4

INSTITUT MARITIME DE PREVENTION. For improvement in health and security at work. Created in 1992 Under the aegis of State and the ENIM

Gestion des prestations Volontaire

The new consumables catalogue from Medisoft is now updated. Please discover this full overview of all our consumables available to you.

e-science : perspectives et opportunités pour de nouvelles pratiques de la recherche en informatique et mathématiques appliquées

RÉSUMÉ DE THÈSE. L implantation des systèmes d'information (SI) organisationnels demeure une tâche difficile

La Poste choisit l'erp Open Source Compiere

WEB page builder and server for SCADA applications usable from a WEB navigator

setting the scene: 11dec 14 perspectives on global data and computing e-infrastructure challenges mark asch MENESR/DGRI/SSRI - France

Formation continue BNF // Programme des cours 2015

Services à la recherche: Data Management et HPC *

Contrôle d'accès Access control. Notice technique / Technical Manual

SCC / QUANTUM Kickoff 2015 Data Protection Best Practices

8. Cours virtuel Enjeux nordiques / Online Class Northern Issues Formulaire de demande de bourse / Fellowship Application Form

L OBSERVATOIRE DE LA BIOLOGIE DE SYNTHESE SYNTHETIC BIOLOGY OBSERVATORY

EN UNE PAGE PLAN STRATÉGIQUE

Mon Service Public - Case study and Mapping to SAML/Liberty specifications. Gaël Gourmelen - France Telecom 23/04/2007

lundi 3 août 2009 Choose your language What is Document Connection for Mac? Communautés Numériques L informatique à la portée du Grand Public

SERVEUR DÉDIÉ DOCUMENTATION

Plateforme Technologique Innovante. Innovation Center for equipment& materials

How to Login to Career Page

Face Recognition Performance: Man vs. Machine

HSCS 6.4 : mieux appréhender la gestion du stockage en environnement VMware et service de fichiers HNAS Laurent Bartoletti Product Marketing Manager

Logiciel Libre & qualité. Présentation

Le risque humain en entreprise Le cadre du renseignement

Kick Off SCC EMC l offre EXTREMIO. fmarti@fr.scc.com Philippe.rolland@emc.com. Vers de nouveaux horizons

Big data et sciences du Vivant L'exemple du séquençage haut débit

Comprendre l impact de l utilisation des réseaux sociaux en entreprise SYNTHESE DES RESULTATS : EUROPE ET FRANCE

Qualité et ERP CLOUD & SECURITY (HACKING) Alireza MOKHTARI. 9/12/2014 Cloud & Security

ADHEFILM : tronçonnage. ADHEFILM : cutting off. ADHECAL : fabrication. ADHECAL : manufacturing.

Towards a Life Sciences Virtual Research Environment

Lean approach on production lines Oct 9, 2014

DOCUMENTATION - FRANCAIS... 2

ANGULAR JS AVEC GDE GOOGLE

Le nouveau visage de la Dataviz dans MicroStrategy 10

La gestion des risques IT et l audit

Les marchés Security La méthode The markets The approach

SparkInData. Place de Marché des applications Spatiales

Tammy: Something exceptional happened today. I met somebody legendary. Tex: Qui as-tu rencontré? Tex: Who did you meet?

JSIam Introduction talk. Philippe Gradt. Grenoble, March 6th 2015

ERA-Net Call Smart Cities. CREM, Martigny, 4 décembre 2014 Andreas Eckmanns, Responsable de la recherche, Office Fédéral de l énergie OFEN

NOM ENTREPRISE. Document : Plan Qualité Spécifique du Projet / Project Specific Quality Plan

PLM 2.0 : Mise à niveau et introduction à l'offre version 6 de Dassault systèmes

EMME : un environnement de gestion des métadonnées expérimentales

RAPID Prenez le contrôle sur vos données

Extension fonctionnelle d un CRM. CRM étendu >> Conférence-débat 15 April Club Management des Systèmes d Information de l'iae de Paris Alumni

Projet de réorganisation des activités de T-Systems France

IPv6: from experimentation to services

The impacts of m-payment on financial services Novembre 2011

LES APPROCHES CONCRÈTES POUR LE DÉPLOIEMENT D INFRASTRUCTURES CLOUD AVEC HDS & VMWARE

Plan. Department of Informatics

Notice Technique / Technical Manual

POSITION DESCRIPTION DESCRIPTION DE TRAVAIL

Guide d'installation rapide TFM-560X YO.13

CONTEC CO., LTD. Novembre 2010

Sub-Saharan African G-WADI

Le projet WIKIWATER The WIKIWATER project

PAR RINOX INC BY RINOX INC PROGRAMME D INSTALLATEUR INSTALLER PROGRAM

La solution idéale de personnalisation interactive sur internet

Règles et paramètres d'exploitation de Caparmor 2 au 11/12/2009. Pôle de Calcul Intensif pour la mer, 11 Decembre 2009

VMware : De la Virtualisation. au Cloud Computing

PRESENTATION. CRM Paris - 19/21 rue Hélène Boucher - ZA Chartres Est - Jardins d'entreprises GELLAINVILLE

Préconisations pour une gouvernance efficace de la Manche. Pathways for effective governance of the English Channel

Language requirement: Bilingual non-mandatory - Level 222/222. Chosen candidate will be required to undertake second language training.

Instructions pour mettre à jour un HFFv2 v1.x.yy v2.0.00

F-7a-v3 1 / Bourses de mobilité / Mobility Fellowships Formulaire de demande de bourse / Fellowship Application Form

La stratégie Cloud de Microsoft

Service management. Transforming the IT organization and driving it across the enterprise. Carlo Purassanta. Integrated Technology Services Executive

Discours du Ministre Tassarajen Pillay Chedumbrum. Ministre des Technologies de l'information et de la Communication (TIC) Worshop on Dot.

Package Contents. System Requirements. Before You Begin

Principe de TrueCrypt. Créer un volume pour TrueCrypt

Galaxy est une plateforme de traitements (bio)informatiques accessible depuis l'url : (en précisant votre login et mot de passe LDAP «genotoul»).

Transcription:

SÉQUENÇAGE DE TYPE RAD-SEQ, PRÉSENTATION ET TRAITEMENT ANALYTIQUE Yvan Le Bras 1, Anthony Bretaudeau 1,2, Cyril Monjeaud 1 1 Plateforme GenOuest & CNRS UMR 6074 IRISA-INRIA, Rennes 2 INRA, UMR IGEPP & Plateforme BIPAA, Rennes

Agenda Context Biogenouest network and western France Life sciences and environment From e-biogenouest project to the CeSGO e-science center Bridging data, metadata and computation RAD-seq Generalities STACKS

Biogenouest Biogenouest is a network bringing together technological core facilities dedicated to Life and Environmental Sciences in the West of France

Biogenouest Created in 2002, Biogenouest coordinates 31 technological core facilities based in the regions of Brittany and Pays de la Loire, with the aim to organize and pool interregional resources. Biogenouest also federates 70 research units involved in thematic research covering 4 areas of activity : Marine resources, Agri-food, Health and Bioinformatics.

GenOuest : Bioinformatics core facility Member of the Biogenouest network Member of the IFB : French Bioinformatics Institute National recognition : IBiSA platform Regional strategic facility for INRA (National Institute of Agronomical Research) ISO9001:2008 certified Established since 2002 10 to 12 people Computing infrastructure, storage, software development, expertise, R&D projects

Context Now : Genomics : Next Generation Sequencing Now : Proteomics Next : Bio-imaging Kahn. On the future of genomic data. Science (2011) vol. 331 (6018) pp. 728-9 Digital data Huge amount Heterogenous Critical situation for some laboratories

LE PROJET E-BIOGENOUEST Un environnement virtuel de recherche en sciences de la vie

E-BIOGENOUEST Programme fédérateur Biogenouest co-financé par les Régions Bretagne et Pays de la Loire 24 mois Lancé depuis Mai 2012 Porteur : Olivier Collin (IRISA) Animateur : Yvan Le Bras (IRISA) Tester une approche e-science en Sciences de la vie Proposer une structuration e-science dans le Grand Ouest

Briser les silos Défis : Chaque domaine est particulier mais digital! Solution : Se centrer sur la donnée! Big Data relatif, collaboration, interdisciplinarité, mutualisation, solutions open source

Optimiser / Standardiser Kahn. On the future of genomic data. Science (2011) vol. 331 (6018) pp. 728-9 Code, algorithmes, calcul, stockage, réseau, bonnes pratiques Prévoir interopérabilité, langage commun, vocabulaire contrôlé, ontologies

People paradox Bioinformatique Séquençage Using Clouds for Metagenomics: A Case Study Wilkening et al. IEEE cluster 2009 Défis : Manque RH, Evolution des usages Solutions : mutualisation, science citoyenne, Environnement virtuel de recherche

Solution : e-science TIC Domaine scientifique Une démarche e-science Brest Roscoff Rennes Un environnement virtuel de recherche Nantes Angers

LE VRE Un environnement virtuel de recherche en sciences de la vie

HUBzero : Scientifique collaborative platform ebgo HUB HUBzero to share knowledge and manage groups and projects Informations 264 users 129 projects 56 groups 803 resources Purdue University M. McLennan, R. Kennell. Comput Sci Eng, 12:48-53, 2010.

ISAtools : Experimental data management EMME ISAtools suite to store data & metadata Fonctionalities -based on biomed ontologies -bridge between existing biomed standards -format publication submission -Pydio to upload data -biological investigation repository (data + metadata) Oxford eresearch Centre P. Rocca-Serra et al. Bioinformatics, 26;254(6), 2010

Galaxy : Data analysis web platform GALAXY by GenOuest To analyse & share data as processes and tools Informations 47140 jobs 125 users More than 800 outils Share - data - histories - workflows - tools Penn state university J. Goecks, A. Nekrutenko, J. Taylor, et al. Genome Biol, 25;11(8):R86, 2010

I have a dream Pour les scientifiques en sciences de la vie et environnement Optimiser son temps (programmation vs compréhension) Préserver (données et processus analytiques) Accéder, partager et visualiser de n importe où Une aide à la gestion de projet Pour les développeurs Améliorer l utilisation des algo et outils : Bioinfo Recherche Accélérer la mise en production Service Pour la gestion des infrastructures Optimiser l utilisation (stockage, calcul et réseaux ) Infrastructure pour la donnée infastructure de donnée

e-science focus NGS technology Bioinformatics research: Optimize NGS raw data treatment The Colib read project Life sciences research: RAD-sequencing Virtualization Academic cloud IFB / GenOuest Openstack / OpenNebula Application & dependencies packaging: Docker Innovative methods in science Citizen science approaches

LE RADSEQ Voir la présentation de Karim Gharbi du 30/01/2014 edinburgh genomics / University of Edinburgh

Buts

L arrivée du NGS Allozymes, RAPD, AFLP, Microsatellites, Puces SNPs NGS Baisse du coût de séquençage pas de l analyse ;) Le séquençage de beaucoup d individus + bcp de marqueurs reste onéreux Besoins de sous-échantillonner le génome

Réduction de génome -Fragmentation de l ADN (étape très couteuse!) -Sélection des fragments par taille -Ligation d adaptateurs pour le séquençage Utilisé au Roslin institute pour faire puce SNP

Réduction de génome -Publi initiale : Eric Johnsson, 2008 -Article d Hohenlohe dans les 5 premiers

Application

Single-end RAD

Single-end RAD

Single-end RAD Hohenlohe et al., PLoS Genetics 2010 Coupure 5 -> 3 Brin complémentaire

Paired-end RAD

Paired-end RAD

Paired-end RAD

Single vs Paired-end RAD

ddrad

ddrad

Paired-end ddrad

RAD vs ddrad

RAD vs ddrad RAD simple : séquençage entre site de restriction et cassure par sonication ddrad : séquençage entre 2 sites de restriction donc plus de flexibilité sur les fragments générés, répartitions des lectures En ddrad, le tout adaptateur + barcode + site de restriction + ADN + adaptateur ~500pb

Ne rester pas en RAD, ni à sec.

Biais Pb analyse image quand même nucléotide dans toute les séquences à la même position. Surtout sur HiSeq, moins sur MiSeq Mieux vaut utiliser une combinaison de barcodes différents. Reste pb du site de restriction! Il vaut mieux alors également mélanger les expériences!

Informations diverses 250 ng ADN nécessaire, 1 µg demandé à Edinburgh genomics Il faut une qualité au top! Pas d ADN dégradé. Cela peut induire de ne pas utiliser 30 à 40% des données PCR 12 à 14 cycles Profondeur différente en fonction des tailles de fragments! Peut poser pb surtout si mutation au niveau d un site de restriction chez certains individus Licence / Brevet QiaGen? Voir Davey et al, Molecular Ecology (2012)

Demystifying the RAD fad Johnatan B. Puritz Marine Genomics Laboratory, Texas A&M University-Corpus Christi Mikhail V. Matz Jesse N. Weber Daniel I. Bolnick Department of Integrative Biology, University of Texas at Austin Robert J. Toonen Hawai i Institute of Marine Biology, University of Hawai i Christopher E. Bird Department of Life Sciences, Texas A&M University-Corpus Christi

Recent novel approaches for pop. genomics data analysis (Andrews & Luikart, 2014) RAD sequencing = powerful & useful approach in mol. Ecology Several different published methods None = best option in all situation

Four different RAD protocols Original RAD (mbrad Miller et al. 2007 and Baird et al. 2008) Genomic DNA digestion by 1 restriction enzyme (low frequency cutter) Ligation of barcode containing adapters onto digested 5 ends Ligated genomic DNA sonication Ligation of a 3 adapter to the sonicated end Pool of the samples Size-selection of the library RAD fragments PCR enrichment

Four different RAD protocols Double digest RAD protocol (Peterson et al. 2012) Genomic DNA digestion by 2 restriction enzymes (low + high frequency cutter) Ligation of barcode P1 adapters (matching the first restriction site) and P2 adapters (matching second restriction site) Pool of the samples Size-selection of the library RAD fragments PCR enrichment + second barcode introduction to increase multiplexing potential Extremely similar to GBS (Poland et al. 2013) Pros & cons associated with ddrad also relevant to RESTseq (Stolle & Moritz 2013)

Four different RAD protocols ezrad protocol (Toonen et al. 2013) Genomic DNA digestion by 2 restriction enzymes (high frequency cutter on the same cut site) Commercially available Illumina TruSeq library preparation kit DNA end reparation Ligation of single or dual indexing adapters onto genomic fragments Pool of samples Size selection of the library RAD fragments PCR enrichment, or not, depending on the Illumina kit

Four different RAD protocols 2bRAD protocol (Wang et al. 2012) Genomic DNA digestion by 1 restriction enzyme (36-bp fragments excision recognition site + adjacent 5 & 3 base pairs) Ligation of dual barcode adapters Agarose gel target band excision after PCR enrichment No intermediate purification stages No size-selection

Potential biases when conducting RADseq PCR artefacts Restriction fragment size bias Heterozygous restriction sites root cause of allele dropout (ADO) 1 allele not detected from 1 heterozygous locus Fix: Filter any loci that are not represented in all genotyped individuals Strand bias Different genotypes from forward & reverse reads Fix: Filter any loci in this case only possible in 2bRAD

mbrad advantages Random shearing of the 3 end helps to identify putative PCR duplicates If identical starting position of the paired-end read: duplicate Random shearing improves the distribution of coverage Random shearing + larger insert size ranges: de novo assembled RAD loci are of greater length Critical for identifying function & Gene ontology Coverage and quality are fundamental!!! Distinguishing true SNP from sequencing error: if coverage is low, your statistical test will no yield significant results!

mbrad disadvantages The most technically challenging and complex protocol! Requires non standard lab equipment: sonicator Restriction fragment length bias (due to the shearing) Sequencing at different depth

ddrad advantages Greatest degree of customization Depending on the enzymes chosen & range of fragment sizes selected Allow to have hudreds of SNPs per individual at very low cost or thousands for QTL mapping experiments at moderate cost Examine histograms of digested samples early Identify / exclude excessively frequent fragments (i.e. transposons)

ddrad disadvantages Using fragment size selection to tune the quantity of loci can lead to variable representation of some loci This can be minimized using precise selection tool (i.e. Pippin Prep) ddrad particularly susceptible to ADO (Arnold et al. 2013) To be considered when performing sensitive population genetic analyses ddrad requires the highest quality genomic DNA of all RAD methods Proper fragment ligation relies on completely intact 5 & 3 overhangs!

ezrad advantages Illumina TruSeq kit Extensive manual, customer support & guarantee Probably the simplest path to obtain RAD data for small lab without experience / equipment / resources to develop in-house RAD capability Combined with an Illumina PCR-Free TruSeq kit, ezrad is the only RAD protocol that can bypass all potential PCR bias

ezrad disadvantages Illumina TruSeq kit Simplicity & uniformity but expensive However can be used with ½ & 1/3 reaction volumes All ezrad reads start with the same four GATC bases The first 4-5 nucleotides of Read 1are used to discriminate among adjacent clusters If always the same 4 first bases, difficulty to discriminate to different samples

2bRAD advantages Extreme protocol simplicity & cost-efficiency No intermediate purification stages No need for special instrumentation (only PCR + standard agarose gel) Lack of biases due to fragment size selection All endocnuclease recognition sites can be sampled

2bRAD disadvantages Difficulties to map 36 bp tags in a unambiguously way But works well in no or moderately duplicated genomes (i.e. Wang et al 2012 on Arabidopsis) 2bRAD fragments cannot be used to build genome contigs 2bRAD fragments are less likely to be cross-mappable across large genetic distances, such as across different species

Conclusions Most important considerations when selecting a particular RAD protocol are The facilities & the molecular experience of the researcher applying the approach The biology of the organisms The hypotheses being tested All RAD protocols are powerful tools for SNP discovery & genotyping of nonmodel species It is important to learn about pitfalls inherent to each method & how to adress them

Main Bioinformatics pipelines STACKS Website: http://catchenlab.life.illinois.edu/stacks/ mbrad, ddrad, ezrad & 2bRAD? STACKS does not handle INDELS, so any loci near an INDEL is lost STACKS does not call SNPs from paired end reads natively, and does especially poorly with paired end fragments that are not of a random length (e.g., ddrad and ezrad) ddocent Website: https://ddocent.wordpress.com/ddocent-pipeline-user-guide/ ddrad & ezrad PyRAD Website: http://dereneaton.com/software/pyrad/ mbrad, ddrad, PE-ddRAD, GBS, PE-GBS, EzRAD, PE-EzRAD, 2B-RAD use of an alignment-clustering method (vsearch) 2bRAD (Wang et al 2012) de novo: https://github.com/z0on/2brad_denovo With reference genome: https://github.com/z0on/2brad_gatk 2bRAD

STACKS

STACKS

LE RADSEQ SOUS GALAXY Utilisation du pipeline Stacks

Programme Détection de SNP Etude des parents d une famille Cartographie Génétique Etude d une famille avec 93 descendants Construction de mini-contig Données pairées Génomique des population Sans génome de référence Avec génome de référence

Buts Se familiariser avec l utilisation de données NGS à partir de librairies réduites (RRL) S initier à l utilisation de Galaxy du pipeline STACKS Apprendre Préparation de données brutes Illumina RAD Alignement de lectures sur un génome de référence Assembler des loci RAD Détecter des SNP, déterminer génotypes et haplotypes Calculer des statistiques en génétique des populations

Jeux de données et outils Ceux proposés par Julian dans ces formations Jeux de données épinoche de l'article d'hohenlohe et al. 2010 Nettoyage et l'analyse des données via Galaxy, le pipeline STACKS et BWA. Les jeux de données seront tous produits via des séquenceurs de type Illumina GAII ou HiSeq2000. Les logiciels sont tous open source

Merci de votre attention La plate-forme Bio-informatique GenOuest Le groupe Symbiose IRISA/INRIA GenOuest-Dyliss-Genscale Cyril Monjeaud ebgo HUB (collaboration) http://www.e-biogenouest.org/ Olivier Collin EMME portal (data management ) http://emme.genouest.org/ Galaxy instance (data analysis) GO4Bioinformatics (education) http://galaxy.genouest.org/ http://go4bioinformatics.genouest.org/