École de Bioinformatique: introduction to the processing of genomic data obtained by high-throughput sequencing
5-10 October 2014, Station Biologique, Roscoff
INTRODUCTION TO NEXT GENERATION SEQUENCING
Claude Thermes
Genome analysis, Centre de Génétique Moléculaire, Gif-sur-Yvette
06/10/2014
Step 1: sample preparation
Step 2: sequencing (Illumina)
Step 3: data analysis
(with permission of ABIMS)
Situation in 2009
[Figure labels, starting material: 1-5 µg genomic DNA; genome sequencing; 10 ng DNA; 10 µg total RNA; 10 µg total RNA]
Adapted from Science 306:636-640, 2004
Situation today
[Figure labels, starting material: 1-5 µg genomic DNA; 50 ng; genome sequencing; 10 ng DNA; 10 µg total RNA; 1-2 ng; 1 µg; 10 µg total RNA; 1 ng]
Adapted from Science 306:636-640, 2004
Libraries from DNA samples
DNA-seq Libraries: Illumina TruSeq technology
Genomic DNA → sonication → size selection (?) → adaptor ligation → PCR
Ligate Y-adaptors
PCR
Primer 1: complementary to R
Primer 2: equivalent to R
DNA-seq Libraries: Nextera tagmentation
The Tagment Enzyme (transposome) fragments DNA and attaches junction adapters (blue and green) to both ends of the tagmented molecule
Dual barcode approach: up to 96 indexed samples
Rapid (2 hours) and requires small quantities (50 ng)
Paired-end sequencing
1st read, 2nd read
Comparison of single-read versus paired-end sequencing
[Figure: single-read density (???) versus paired-end density]
Paired-end sequencing improves genome assembly, but it requires good control of DNA fragmentation (purifying gels/columns), is time consuming, and requires large quantities (1-5 µg)
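As a rough illustration of what the 1st and 2nd reads of a pair are, the sketch below derives both reads from a single fragment. The fragment sequence, the read length and the function names are made up for the example; real reads also carry base qualities and sequencing errors, which are ignored here.

```python
# Minimal sketch: derive the two reads of a paired-end run from one fragment.
# The fragment sequence and the read length are illustration values only.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Read 1 starts at the 5' end of the fragment. Read 2 starts at the
    5' end of the opposite strand, so it is the reverse complement of the
    other extremity of the fragment."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"  # hypothetical 30-nt insert
read1, read2 = paired_end_reads(fragment, read_length=8)
print(read1)  # first 8 nt of the fragment
print(read2)  # reverse complement of the last 8 nt
```

Because the two reads come from the two ends of the same fragment, the distance between their mapped positions reflects the fragment size, which is why fragmentation has to be well controlled.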
BUT: paired-end fragments are too short for assembling large genomes with many repeated elements
→ mate pair libraries
Classical Illumina mate pair library
Several kilobases
Problems: low coverage, few fragments, over-amplified
A new method: Nextera Mate Pair
The Tagment Enzyme fragments DNA and attaches a biotinylated junction adapter (green) to both ends of the tagmented molecule
Circularization → fragmentation → enrichment via the biotin tag → adapter ligation at both ends
Rapid (a few hours) and requires small quantities (50 ng)
Some remarks

Protocol: Illumina TruSeq (adaptor ligation)
Starting material: double-stranded DNA fragments, genomic or ChIP, 1-1000 ng
Advantages and drawbacks: Fairly insensitive to the quality of the starting material. Very versatile, with precise control of fragment size (gel purification). Preferred protocol when homogeneous sizes are wanted, or large sizes for 2x250 paired-end. Also works without PCR if enough material is available (>100 ng). Long protocol: 1-2 days. Dimers possible; fragmentation required.
Remarks: Very adaptable: the number of PCR cycles can be adjusted to the amount of starting material. For small quantities, use beads. The size of the starting fragments determines the final fragment size.

Protocol: Nextera (tagmentation)
Starting material: genomic DNA, 50 ng (large genomes)
Advantages and drawbacks: Very fast (4 h). Very sensitive to the quality of the starting DNA (integrity, purity). Insert size is difficult to control, and inserts are too small for 2x250 paired-end. PCR is mandatory.
Remarks: Dual indexing possible (96 indexes). Cannot be multiplexed with TruSeq (primers differ from TruSeq).
Some examples of libraries prepared from DNA samples
Exome sequencing
Hi-C: long-range interactions
RAD-seq
Re-sequencing: indels, SNP, CNV
DNA replication origins
De novo sequencing
Adapted from Science 306:636-640, 2004
Re-sequencing: identification of SNPs and indels
Mutations specific to the forward strand
Mutations due to mono-directional sequence effect (Nakamura et al., NAR, 2011)
Partial blockage of DNA synthesis
Dephasing due to partial blockage of DNA synthesis
Mutations due to bi-directional sequence effect
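One common way to spot the strand-specific artefacts mentioned above is to compare, for each candidate variant, how many supporting reads map to each strand. The sketch below is a minimal version of that idea. The read counts, the site names and the 10% threshold are hypothetical values chosen for illustration, not taken from the slides or from any particular variant caller.

```python
# Minimal sketch: flag candidate variants whose supporting reads come
# almost exclusively from one strand. All numbers here are hypothetical.


def strand_fraction(alt_forward: int, alt_reverse: int) -> float:
    """Fraction of variant-supporting reads that map to the forward strand."""
    total = alt_forward + alt_reverse
    if total == 0:
        raise ValueError("no read supports the variant")
    return alt_forward / total


def is_strand_specific(alt_forward: int, alt_reverse: int,
                       min_minor_fraction: float = 0.1) -> bool:
    """True when the less represented strand carries under
    min_minor_fraction of the supporting reads."""
    frac = strand_fraction(alt_forward, alt_reverse)
    return min(frac, 1.0 - frac) < min_minor_fraction


# (forward-strand reads, reverse-strand reads) supporting each candidate
candidates = {
    "site_A": (25, 0),   # seen on the forward strand only
    "site_B": (12, 14),  # seen on both strands
}
flags = {name: is_strand_specific(f, r) for name, (f, r) in candidates.items()}
print(flags)
```

A filter of this kind only addresses mono-directional effects: an artefact that appears on both strands would pass it.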
Libraries from RNA samples
RNA-seq Libraries
Some remarks
All protocols are directional

Protocol: TruSeq small RNA (Illumina)
Starting material: depleted or poly(A) RNA, 25-100 ng
Principle: fragmentation, ligation on RNA, RT and PCR
Advantages, drawbacks and remarks: Fragment size is well controlled. Suitable for 2x250 paired-end. Aberrations if quantities are too small. 2-3 days of bench work; cannot be automated.

Protocol: ScriptSeq (Epicentre)
Starting material: depleted or poly(A) RNA, 0.5-50 ng
Principle: RT by random priming, PCR
Advantages, drawbacks and remarks: Small quantities. Possible even with degraded RNA (FFPE). Fast; can be automated. Sensitive to gDNA contamination. Fragmentation not controlled (200-800 nt). Seems to give quite a few duplicates when quantities are in the low range.

Protocol: TotalScript (Epicentre, Nextera)
Starting material: total (or poly(A)) RNA, 1-5 ng; the RNA must NOT be degraded (tagmentation)
Principle: RT by oligo(dT), PCR++
Advantages, drawbacks and remarks: RNA-seq possible even with very small quantities of total RNA. The RNA must show little degradation. Not suitable for 2x250 paired-end. Cannot be multiplexed with TruSeq (Nextera indexes).
Comparison of two RNA fragmentation protocols: SOLiD (Transcriptome Analysis kit), RNase III fragmentation, and Illumina (Directional mRNA-Seq kit), Zinc fragmentation
SOLiD™ Whole Transcriptome Analysis Kit: RNase III fragmentation
RiboMinus RNA → RNase III → fragmented RNA → hybridization with adapters (5', 3'; N, NNNNNN), ligation → reverse transcription → size selection → PCR amplification
Illumina directional mRNA-Seq library: Zinc fragmentation
RiboMinus RNA → Zinc → fragmented RNA → hybridization with adapters (5', 3'; N, NNNNNN), ligation → reverse transcription → size selection → PCR amplification
Sequencing Illumina (Zinc) and SOLiD (RNase III) libraries
[Figure: YBR078W, intron indicated; Zinc versus RNase III libraries, same number of reads]
Examples of libraries from RNA samples
miRNA-seq
Identification of mRNA 5' ends
Ribo-seq
Long non-coding RNAs
FRT-seq
CLIP-seq
NET-seq (Pol II)
NET-seq: Native Elongating Transcript sequencing (Churchman and Weissman, 2011)
Sequencing of the 3' ends of nascent RNAs still associated with RNA polymerase
Gives the distribution of transcribing polymerases along the genome in a strand-specific manner
Allows studies of transcription termination
Workflow: cells in desired condition → RNA polymerase II immunoprecipitation → recovery of nascent transcripts associated with the polymerase → RNA-seq and mapping on the genome
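Since the 3' end of a nascent RNA marks the position of the polymerase, the mapped reads can be reduced to one position per read. The sketch below shows one way to do this, under the assumption that each alignment is given as 0-based, half-open coordinates and is already oriented like the RNA (depending on the library protocol, the reported strand may have to be flipped first). The coordinates and function names are illustrative only.

```python
# Minimal sketch: turn aligned nascent-RNA reads into a strand-specific
# count of 3' ends. Assumes 0-based, half-open (start, end) coordinates
# and reads reported in the orientation of the RNA.
from collections import Counter


def three_prime_end(start: int, end: int, strand: str) -> int:
    """Genomic position of the last transcribed nucleotide of a read."""
    if strand == "+":
        return end - 1
    if strand == "-":
        return start
    raise ValueError(f"unknown strand: {strand!r}")


def polymerase_profile(alignments):
    """Count 3' ends per (chromosome, strand, position)."""
    profile = Counter()
    for chrom, start, end, strand in alignments:
        profile[(chrom, strand, three_prime_end(start, end, strand))] += 1
    return profile


alignments = [  # hypothetical alignments
    ("chrI", 100, 130, "+"),
    ("chrI", 105, 130, "+"),
    ("chrI", 200, 240, "-"),
]
profile = polymerase_profile(alignments)
for key, count in sorted(profile.items()):
    print(key, count)
```

Keeping the strand in the key is what makes the resulting profile strand specific.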
FRT-seq: amplification-free, strand-specific transcriptome sequencing (Mamanova et al., Nature Methods, 2010)
The reverse transcription reaction takes place on the flowcell
No PCR amplification, so PCR biases and duplicates are avoided
Because the template is poly(A)+ RNA rather than cDNA, the resulting sequences are necessarily strand-specific
The method is compatible with paired- or single-end sequencing
Workflow: RT on the flowcell → cluster generation
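For comparison with PCR-amplified libraries, the sketch below shows a common approximation for estimating how many sequenced fragments are duplicates: fragments that share chromosome, start, end and strand are treated as copies of the same molecule. This is an assumption made for illustration (independent molecules can share coordinates by chance), and the fragment list is made up.

```python
# Minimal sketch: estimate the duplicate rate of a library by treating
# fragments with identical coordinates and strand as PCR copies.
from collections import Counter


def duplicate_rate(fragments) -> float:
    """Fraction of fragments that are extra copies of an already seen one."""
    counts = Counter(fragments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total


fragments = [  # hypothetical (chrom, start, end, strand) tuples
    ("chrI", 100, 350, "+"),
    ("chrI", 100, 350, "+"),
    ("chrI", 100, 350, "+"),
    ("chrI", 420, 700, "-"),
    ("chrII", 15, 260, "+"),
]
rate = duplicate_rate(fragments)
print(f"duplicate rate: {rate:.2f}")
```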
Some problems
Libraries prepared from very small amounts of DNA or RNA (<< 1 ng)
ChIP-seq with very small amounts of immunoprecipitated material
RNA from small amounts of tissue (laser dissection)
Typical problem: accumulation of dimers of the two adaptors
Adaptor dimers are amplified more rapidly than other fragments and invade the libraries
They constitute the majority of sequenced reads
Rare fragments then tend to be non-homogeneously amplified
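A quick way to see how badly a library is invaded by adaptor dimers is to count the reads in which the adaptor sequence starts at, or very near, the first base. The sketch below assumes the adaptor begins with AGATCGGAAGAGC, the prefix shared by Illumina TruSeq adaptors; this sequence, the `max_insert` cutoff and the example reads are assumptions to be replaced by the values of the kit actually used.

```python
# Minimal sketch: estimate the fraction of reads that look like adaptor
# dimers (adaptor found with almost no insert in front of it).
# ADAPTER_PREFIX is an assumption; use the sequence of the kit in hand.

ADAPTER_PREFIX = "AGATCGGAAGAGC"


def looks_like_adapter_dimer(read: str, adapter: str = ADAPTER_PREFIX,
                             max_insert: int = 5) -> bool:
    """True when the adaptor starts within the first max_insert bases."""
    pos = read.find(adapter)
    return 0 <= pos <= max_insert


def dimer_fraction(reads) -> float:
    """Fraction of reads classified as adaptor dimers."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(looks_like_adapter_dimer(r) for r in reads) / len(reads)


reads = [  # hypothetical reads
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",  # adaptor from the first base
    "TTAGATCGGAAGAGCACACGTCTGAACTCCAGTC",  # 2-nt insert, then adaptor
    "ACGTTGCAAGGCTTAACCGGTTAGCATCGAAGGT",  # no adaptor in the read
    "ACGTTGCAAGGCTTAACCGGAGATCGGAAGAGCA",  # 20-nt insert, then adaptor
]
fraction = dimer_fraction(reads)
print(f"adaptor-dimer fraction: {fraction:.2f}")
```

Exact matching is used for simplicity; a real check would tolerate sequencing errors in the adaptor.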
Sequencing of very small amounts of genome fragments (<< 1 ng)
[Figure labels: 13 kb; small input DNA; 43 kb; increasing input DNA]
New directions with single-cell sequencing
Fluidigm C1 System: allows measurement of gene expression in 96 single cells
MALBAC (Multiple Annealing and Looping-Based Amplification Cycles): allows sequencing the genome of a single cell (Zong C. et al., Science, 2012)
Many other systems are in development: larger cell numbers, single-cell ChIP-seq, etc.