Yann Ponty, CNRS researcher@LIX, Ecole Polytechnique

2011-2012-CASM-M2 BIBS

I teach in the Master's program in Bioinformatics and Biostatistics (BIBS) of Université Paris-Sud/11, co-accredited by École Polytechnique, as part of the course Combinatorics, Algorithms, Sequences and Modeling (CASM). The lectures of 9 and 16 January 2012 will take place in room E203 of building 640. They will present several dynamic programming approaches developed for RNA folding. Each session will be followed by a short lab session, in which we will put the principles presented in the lecture into practice.
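As a minimal illustration of the kind of dynamic programming covered in these sessions, here is a sketch of base-pair maximization in the Nussinov/Jacobson model, written in Python. It fills the table N(i,j) = max(N(i+1,j), max over k of [1 + N(i+1,k-1) + N(k+1,j)]), where the second term requires i and k to form an allowed pair, then traces back one optimal structure. The following are illustrative assumptions and do not come from the course material: Watson-Crick and G-U wobble pairs all score 1, every hairpin loop encloses at least `THETA = 1` unpaired base, and the names `nussinov`, `PAIRS` and `THETA` are hypothetical.

```python
# Sketch of the Nussinov/Jacobson algorithm: maximize the number of base pairs.
# Assumptions (illustrative only): Watson-Crick and G-U wobble pairs each score 1,
# and every hairpin loop encloses at least THETA unpaired bases.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
THETA = 1


def nussinov(seq):
    """Return (maximum number of base pairs, one optimal dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j] = maximum number of pairs on seq[i..j] (inclusive); 0 for short intervals
    N = [[0] * n for _ in range(n)]
    for length in range(THETA + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = N[i + 1][j]  # case 1: i is unpaired
            for k in range(i + THETA + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = N[k + 1][j] if k < j else 0
                    best = max(best, 1 + N[i + 1][k - 1] + right)
            N[i][j] = best

    # Traceback: recover one structure achieving N[0][n-1]
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i < THETA + 1:  # interval too short to contain a pair
            continue
        if N[i][j] == N[i + 1][j]:  # an optimum leaves i unpaired
            stack.append((i + 1, j))
            continue
        for k in range(i + THETA + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = N[k + 1][j] if k < j else 0
                if N[i][j] == 1 + N[i + 1][k - 1] + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return N[0][n - 1], "".join(structure)


score, structure = nussinov("GGGAAAUCC")
print(score, structure)
```

For `GGGAAAUCC` the maximum is 3 pairs: the sequence contains a single U, so at most one of the three A's can be paired. The decomposition "i unpaired, or i paired with some k" is the same one that is reused for counting and Boltzmann weighting.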

On this page you will find:

The continuous assessment procedures
The papers proposed for the continuous assessment exam
The lecture slides and lab assignments

Continuous assessment procedures

Continuous assessment for CASM consists of:

Written exam: Monday 6 February 2012 at the PUIO, covering the entire syllabus and counting for 2/3 of the final grade. Previous years' exams are available for practice:
- 2011-2012 exam: partial solutions.
- 2010-2011 exam: partial solutions.
- 2009-2010 exam: exam paper [pdf] + partial solutions [pdf]
Paper study: oral presentations held together on Tuesday 14 February 2012 at the PUIO, counting for 1/3 of the final grade.
Procedure: each student individually chooses a paper and writes a synthetic study of it. A short report (about 5 pages) must be handed in to us at least one week before the individual presentation (15 minutes each).

List of proposed papers

Each paper study is carried out and presented individually. A given paper may be chosen by at most one student. Please contact me as soon as you have chosen a paper, so that I can remove it from the list of available papers.

[Available]
Fast and accurate short read alignment with Burrows–Wheeler transform [PDF]
Heng Li and Richard Durbin
Uses a classical transform from text algorithmics to map the sequences produced by high-throughput sequencing technologies more robustly.

Abstract
Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.
Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.
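To give an idea of the backward search on which BWA is built, here is a naive sketch of the Burrows-Wheeler transform and of exact-match backward search over an FM-index-style table. It is only an illustration under stated assumptions: a `$` sentinel that sorts before the bases, a suffix array built by plain sorting, and uncompressed occurrence tables. BWA itself uses compact data structures and also allows mismatches and gaps, which this sketch does not. The function names are hypothetical.

```python
def bwt(text):
    """Return (BWT string, suffix array) of text + '$' ('$' sorts before A, C, G, T)."""
    text += "$"
    # Naive suffix array by sorting all suffixes: fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character of each suffix is the character preceding it (cyclically).
    return "".join(text[i - 1] for i in sa), sa


def fm_index(b):
    """C[c] = number of characters smaller than c; occ[c][i] = occurrences of c in b[:i]."""
    alphabet = sorted(set(b))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += b.count(c)
    occ = {c: [0] * (len(b) + 1) for c in alphabet}
    for i, ch in enumerate(b):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ


def backward_search(pattern, C, occ, n):
    """Half-open suffix-array interval [lo, hi) of the suffixes starting with pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0, 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


genome = "ACGTACGTTACG"
b, sa = bwt(genome)
C, occ = fm_index(b)
lo, hi = backward_search("ACG", C, occ, len(b))
print(sorted(sa[lo:hi]))  # start positions of exact occurrences: [0, 4, 9]
```

Each step of the search processes one character of the read, from right to left, and only updates two integers. This is why the cost of locating a read depends on the read length rather than on the genome length once the index is built.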
[Taken: Sorokina]
Simultaneous alignment of short reads against multiple genomes [PDF]
Korbinian Schneeberger, Jörg Hagmann, Stephan Ossowski, Norman Warthmann, Sandra Gesing, Oliver Kohlbacher and Detlef Weigel
An approach drawn from graph algorithmics for placing short sequences, obtained by high-throughput sequencing, within a set of reference genomes.

Abstract
Genome resequencing with short reads generally relies on alignments against a single reference. GenomeMapper supports simultaneous mapping of short reads against multiple genomes by integrating related genomes (e.g., individuals of the same species) into a single graph structure. It constitutes the first approach for handling multiple references and introduces representations for alignments against complex structures. Demonstrated benefits include access to polymorphisms that cannot be identified by alignments against the reference alone.
[Taken: Rausch]
The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data [PDF]
Marc Parisien and François Major
A landmark paper that takes non-canonical base pairs into account in in silico RNA folding, coupling a contact prediction algorithm with a constraint satisfaction algorithm to predict the 3D structure from the sequence!

Abstract
The classical RNA secondary structure model considers A-U and G-C Watson–Crick as well as G-U wobble base pairs. Here we substitute it for a new one, in which sets of nucleotide cyclic motifs define RNA structures. This model allows us to unify all base pairing energetic contributions in an effective scoring function to tackle the problem of RNA folding. We show how pipelining two computer algorithms based on nucleotide cyclic motifs, MC-Fold and MC-Sym, reproduces a series of experimentally determined RNA three-dimensional structures from the sequence. This demonstrates how crucial the consideration of all base-pairing interactions is in filling the gap between sequence and structure. We use the pipeline to define rules of precursor microRNA folding in double helices, despite the presence of a number of presumed mismatches and bulges, and to propose a new model of the human immunodeficiency virus-1 -1 frame-shifting element.
[Taken: Nguyen]
Disease-Associated Mutations That Alter the RNA Structural Ensemble [PDF]
Matthew Halvorsen, Joshua S. Martin, Sam Broadaway and Alain Laederach
An innovative method, based on the statistical sampling seen in class, for detecting deleterious mutations that affect the structure of functional RNAs. It offers new explanations for pathologies that previously lacked a functional explanation, and makes it possible to detect such mutations in genomes.

Abstract
Genome-wide association studies (GWAS) often identify disease-associated mutations in intergenic and non-coding regions of the genome. Given the high percentage of the human genome that is transcribed, we postulate that for some observed associations the disease phenotype is caused by a structural rearrangement in a regulatory region of the RNA transcript. To identify such mutations, we have performed a genome-wide analysis of all known disease-associated Single Nucleotide Polymorphisms (SNPs) from the Human Gene Mutation Database (HGMD) that map to the untranslated regions (UTRs) of a gene. Rather than using minimum free energy approaches (e.g. mFold), we use a partition function calculation that takes into consideration the ensemble of possible RNA conformations for a given sequence. We identified in the human genome disease-associated SNPs that significantly alter the global conformation of the UTR to which they map. For six disease-states (Hyperferritinemia Cataract Syndrome, β-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease (COPD), and Hypertension), we identified multiple SNPs in UTRs that alter the mRNA structural ensemble of the associated genes. Using a Boltzmann sampling procedure for sub-optimal RNA structures, we are able to characterize and visualize the nature of the conformational changes induced by the disease-associated mutations in the structural ensemble. We observe in several cases (specifically the 5′ UTRs of FTL and RB1) SNP-induced conformational changes analogous to those observed in bacterial regulatory Riboswitches when specific ligands bind. We propose that the UTR and SNP combinations we identify constitute a "RiboSNitch", that is a regulatory RNA in which a specific SNP has a structural consequence that results in a disease phenotype. Our SNPfold algorithm can help identify RiboSNitches by leveraging GWAS data and an analysis of the mRNA structural ensemble.
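SNPfold works on the whole ensemble of structures through a partition function, using a realistic energy model. As a toy illustration of the Boltzmann ensemble and of the statistical sampling seen in class, here is a sketch in the simple base-pair model instead. Assumptions: every allowed pair contributes the same hypothetical energy `PAIR_ENERGY = -1.0` kcal/mol, `KT = 0.6` kcal/mol (roughly RT at 37 °C), and hairpins enclose at least `THETA = 1` unpaired base. Structures are then drawn by stochastic backtracking through the partition function table. This is not SNPfold's implementation, and all names are hypothetical.

```python
import math
import random

# Toy Boltzmann-ensemble sketch in the simple base-pair model (NOT SNPfold's energy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
THETA = 1           # minimum number of unpaired bases in a hairpin loop (assumed)
PAIR_ENERGY = -1.0  # hypothetical free energy of any base pair, in kcal/mol
KT = 0.6            # assumed RT in kcal/mol (roughly 37 degrees C)


def partition_function(seq):
    """Z[i][j] = sum of exp(-E(S)/KT) over structures S of seq[i:j] (half-open)."""
    n = len(seq)
    w = math.exp(-PAIR_ENERGY / KT)  # Boltzmann weight contributed by one pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]  # intervals too short to pair: Z = 1
    for length in range(THETA + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i + 1][j]  # i unpaired
            for k in range(i + THETA + 1, j):  # i paired with k
                if (seq[i], seq[k]) in PAIRS:
                    total += w * Z[i + 1][k] * Z[k + 1][j]
            Z[i][j] = total
    return Z, w


def sample_structure(seq, Z, w, rng=random):
    """Draw one structure S with probability exp(-E(S)/KT) / Z (stochastic backtrack)."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        if j - i < THETA + 2:
            continue  # no pair possible: all positions stay unpaired
        r = rng.random() * Z[i][j] - Z[i + 1][j]
        chosen = None
        if r >= 0:  # i is paired: pick its partner k proportionally to its weight
            for k in range(i + THETA + 1, j):
                if (seq[i], seq[k]) in PAIRS:
                    chosen = k  # keeps the last candidate if rounding leaves r >= 0
                    r -= w * Z[i + 1][k] * Z[k + 1][j]
                    if r < 0:
                        break
        if chosen is None:
            stack.append((i + 1, j))  # i unpaired
        else:
            structure[i], structure[chosen] = "(", ")"
            stack.append((i + 1, chosen))
            stack.append((chosen + 1, j))
    return "".join(structure)


seq = "GGGAAAUCC"
Z, w = partition_function(seq)
rng = random.Random(1)
for _ in range(5):
    print(sample_structure(seq, Z, w, rng))
```

The table uses the same "i unpaired, or i paired with k" decomposition as the Nussinov recurrence. Because that decomposition is unambiguous, summing weights instead of taking a maximum gives the partition function. The sampling step retraces it, choosing each case with probability proportional to its contribution.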
[Taken: Peres]
Mutational analysis in RNAs: comparing programs for RNA deleterious mutation prediction [PDF]
Danny Barash and Alexander Churkin
Presentation and analysis of methods for the in silico prediction of deleterious mutations in RNA.

Abstract
Programs for RNA mutational analysis that are structure-based and rely on secondary structure prediction have been developed and expanded in the past several years. They can be used for a variety of purposes, such as in suggesting point mutations that will alter RNA virus replication or translation initiation, investigating the effect of deleterious and compensatory mutations in allosteric ribozymes and riboswitches, computing an optimal path of mutations to get from one ribozyme fold to another, or analyzing regulatory RNA sequences by their mutational profile. This review describes three different freeware programs (RNAMute, RDMAS and RNAmutants) that have been developed for such purposes. RNAMute and RDMAS in principle perform energy minimization prediction by available software such as RNAfold from the Vienna RNA package or Zuker’s Mfold, while RNAmutants provides an efficient method using essential ingredients from energy minimization prediction. Both RNAMute in its extended version that uses RNAsubopt from the Vienna RNA package and the RNAmutants software are able to predict multiple-point mutations using developed methodologies, while RDMAS is currently restricted to single-point mutations. The strength of RNAMute in its extended version is the ability to predict a small number of point mutations in an accurate manner. RNAmutants is well fit for large scale simulations involving the calculation of all k-mutants, where k can be a large integer number, of a given RNA sequence.
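To see why handling "all k-mutants" calls for an efficient method rather than folding each mutant separately, one can count them. A k-mutant of a length-n sequence is obtained by choosing k positions and, at each one, one of the 3 other nucleotides, which gives C(n,k)·3^k sequences. The small sketch below only computes this count (the function name is hypothetical). It is not the algorithm used by RNAmutants or by the other programs reviewed.

```python
from math import comb


def count_k_mutants(n, k, alphabet_size=4):
    """Number of sequences differing from a length-n sequence at exactly k positions."""
    # choose the k mutated positions, then one of (alphabet_size - 1) alternatives at each
    return comb(n, k) * (alphabet_size - 1) ** k


for k in (1, 2, 3, 5):
    print(k, count_k_mutants(100, k))
```

For a 100-nucleotide RNA, single-point mutants number only 300, but k = 3 already gives 4,365,900 sequences and k = 5 gives 18,294,867,360. Running one folding per mutant, as in the single-point case, quickly becomes out of reach.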

Slides and lab assignments

Session 1: RNA structure(s), dynamic programming, folding in the Nussinov/Jacobson energy model.
[Lecture slides], [Compact version, 4 pages per sheet]
[Lab 1 assignment + Python refresher]
[Python wrappers for the Vienna Package]
Session 2: Suboptimal structures, Boltzmann ensemble, statistical sampling and simple pseudoknots.
[Lecture slides], [Compact version, 4 pages per sheet]
[Lab 2 assignment]
Session 3: Comparison/alignment (2D/3D), introduction to generic methods for dynamic programming.
[Lecture slides], [Compact version, 4 pages per sheet]
[Lab 3 assignment]
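As one simple example of comparing two secondary structures of the same RNA, here is a sketch of the base-pair distance. It counts the pairs present in exactly one of the two structures and is not necessarily the measure used in the lecture. Assumptions: structures are given as pseudoknot-free dot-bracket strings of equal length, and the function names are hypothetical.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))  # close the most recent open position
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def base_pair_distance(s1, s2):
    """Number of base pairs that belong to exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(pairs_from_dot_bracket(s1) ^ pairs_from_dot_bracket(s2))  # symmetric difference


print(base_pair_distance("((((...))))", "((.(...).))"))  # 1: only the pair (2, 8) differs
```

This distance ignores the sequence and any 3D information. Alignment-based comparisons, which account for insertions and deletions between different RNAs, need a dynamic programming formulation of their own.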