Yann Ponty - CR CNRS@LIX
Publications
Triées par date Triées par catégorie
• 2018
• Conférences avec comité de lecture et actes
• S. Hammer, Y. Ponty, W. Wang, and S. Will
Proceedings of RECOMB 2018 – 22nd Annual International Conference on Research in Computational Molecular Biology, France, 2018 [hal]
Résumé (cliquer pour lire). The design of multi-stable RNA molecules has important applications in biology, medicine, and biotechnology. Synthetic design approaches largely benefit from effective in-silico methods, which can tremendously impact their cost and feasibility. Here, we revisit a central ingredient of most in-silico design methods: the sampling of sequences for multi-target design. We establish the #P-hardness of uniform sampling, and introduce RNARedPrint, a tree decomposition-based algorithm for efficient, fixed parameter tractable sampling. By modeling the problem as a constraint network, RNARedPrint supports generic Boltzmann-weighted sampling for arbitrary additive RNA energy models; this enables generating designs meeting specific goals like expected free energies or GC-content. Finally, we empirically study general properties of the approach and generate biologically relevant multi-target Boltzmann-weighted designs for a common design benchmark. In particular, we show significant improvements over seed sequences generated by uniform sampling, the previously best available sampling strategy for multi-target design. Our software is freely available at: https://github.com/yannponty/RNARedPrint
• 2017
• Revues avec comité de lecture
• B. Löwes, C. Chauve, Y. Ponty, and R. Giegerich
The BRaliBase dent - a tale of benchmark design and interpretation.
Briefings in Bioinformatics, Oxford University Press (OUP), 18(2):306--311 pp, 2017 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40-60% sequence identity range, the so-called 'BRaliBase Dent'. In this article, we show this dent is owing to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of transfer RNAs, which exhibit a conserved secondary structure. Our analysis, aside of its interest regarding the specific case of the BRaliBase benchmark, also raises important questions regarding the design and use of benchmarks in computational biology.
• J. Haleš, A. Héliou, J. Maňuch, Y. Ponty, and L. Stacho
Algorithmica, Springer Verlag, 79(3):835--856 pp, 2017 [doi] [hal]
Résumé (cliquer pour lire). We consider the Combinatorial RNA Design problem, a minimal instance of RNA design where one must produce an RNA sequence that adopts a given secondary structure as its minimal free-energy structure. We consider two free-energy models where the contributions of base pairs are additive and independent: the purely combinatorial Watson-Crick model, which only allows equally-contributing A − U and C − G base pairs, and the real-valued Nussinov-Jacobson model, which associates arbitrary energies to A − U, C − G and G − U base pairs. We first provide a complete characterization of designable structures using restricted alphabets and, in the four-letter alphabet, provide a complete characterization for designable structures without unpaired bases. When unpaired bases are allowed, we characterize extensive classes of (non-)designable structures, and prove the closure of the set of designable structures under the stutter operation. Membership of a given structure to any of the classes can be tested in Θ(n) time, including the generation of a solution sequence for positive instances. Finally, we consider a structure-approximating relaxation of the design, and provide a Θ(n) algorithm which, given a structure S that avoids two trivially non-designable motifs, transforms S into a designable structure constructively by adding at most one base-pair to each of its stems.
• A. Churkin, M. Retwitzer, V. Reinharz, Y. Ponty, J. Waldispühl, and D. Barash
Briefings in Bioinformatics, Oxford University Press (OUP), To appear 2017 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse that was devised when formulating the inverse RNA folding problem. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares between the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
• J. Deforges, S. De Breyne, M. Ameur, N. Ulryck, N. Chamond, A. Saaidi, Y. Ponty, T. Ohlmann, and B. Sargueil
Nucleic Acids Research, Oxford University Press, 45(12):7382--7400 pp, 2017 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). In the late phase of the HIV virus cycle, the full length unspliced genomic RNA is exported to the cytoplasm and serves as mRNA to translate the Gag and Gag-pol polyproteins. Three different translation initiation mechanisms responsible for Gag production have been described. However a rationale for the involvement of as many translation pathways in gRNA translation is yet to be defined. The Gag-IRES has the singularity to be located within the Gag open reading frame and to directly recruit the 40S ribosomal subunit. To further characterize this interaction, we first probed the Gag-IRES RNA structure. We then developed an innovative integrative modelling approach and propose a novel secondary structure model for the Gag-IRES. The minimal 40S ribosomal subunit binding site was further mapped using different assays. To our surprise, we found that at least two regions within Gag-IRES can independently recruit the ribosome. Next, we validated that these two regions influence Gag translation both in vitro and in cellulo. These binding sites are mostly unstructured and highly A-rich, such sequences have previously been shown to be sufficient to recruit the ribosome and to support an IRES function. A combination of biochemical and functional data give insight into the Gag-IRES molecular mechanism and provide compelling evidences for its importance. Hypothesis about its physiological role reflecting its conservation amongst primate lentiviruses are proposed.
• W. Duchemin, Y. Anselmetti, M. Patterson, Y. Ponty, S. Bérard, C. Chauve, C. Scornavacca, V. Daubin, and E. Tannier
Genome Biology and Evolution, Society for Molecular Biology and Evolution, 9(5):1312-1319 pp, 2017 [doi] [hal]
Résumé (cliquer pour lire). DeCoSTAR is a software that aims at reconstructing the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann-Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously-contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability: http://pbil.univ-lyon1.fr/software/DeCoSTAR
• Conférences avec comité de lecture et actes
• J. Michálik, H. Touzet, and Y. Ponty
Proceedings of ISMB/ECCB - 25th Annual international conference on Intelligent Systems for Molecular Biology/16th European Conference on Computational Biology - 2017, Czech Republic, 33(14):i283 - i292 pp, 2017 [doi] [hal]
Résumé (cliquer pour lire). Motivation: Kinetics is key to understand many phenomena involving RNAs, such as co-transcriptional folding and riboswitches. Exact out-of-equilibrium studies induce extreme computational demands, leading state-of-the-art methods to rely on approximated kinetics landscapes, obtained using sampling strategies that strive to generate the key landmarks of the landscape topology. However, such methods are impeded by a large level of redundancy within sampled sets. Such a redundancy is uninformative, and obfuscates important intermediate states, leading to an incomplete vision of RNA dynamics. Results: We introduce RNANR, a new set of algorithms for the exploration of RNA kinetics landscapes at the secondary structure level. RNANR considers locally optimal structures, a reduced set of RNA con-formations, in order to focus its sampling on basins in the kinetic landscape. Along with an exhaustive enumeration, RNANR implements a novel non-redundant stochastic sampling, and offers a rich array of structural parameters. Our tests on both real and random RNAs reveal that RNANR allows to generate more unique structures in a given time than its competitors, and allows a deeper exploration of kinetics landscapes. Availability: RNANR is freely available at https://project.inria.fr/rnalands/rnanr
• A. Saaidi, Y. Ponty, M. Blanchette, M. Regnier, and B. Sargueil
An EM algorithm for mapping short reads in multiple RNA structure probing experiments
Proceedings of Matbio2017, United Kingdom, 2017 [hal]
• Conférences sans comité de lecture ou sans actes
• A. Saaidi, Y. Ponty, and B. Sargueil
An integrative approach for predicting the RNA secondary structure for the HIV–1 Gag UTR using probing data
Proceedings of JOBIM 2017 - Journées Ouvertes en Biologie, Informatique et Mathématiques, France, 2017 [hal]
Résumé (cliquer pour lire). Structure modeling is key to understand the mechanisms of RNA retroviruses such as HIV. Many in silico prediction approaches suggesting structural models of moderate to good accuracies are available. However, the prediction methods could be further improved by taking advantage of both next generation sequencing technologies and different experimental techniques such as enzymatic and SHAPE probing data [1]. In a published article [2], we introduce and use a structural modeling method based on the integration of many experimental probing data to direct predictions with the aim to find the most accurate structure lying in the intersection of disjoint sources of experiments.
• Divers & Soumis
• C. Chauve, J. Courtiel, and Y. Ponty
[hal]
Résumé (cliquer pour lire). Pairwise ordered tree alignment are combinatorial objects that appear in important applications , such as RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e. two distinct supertrees may induce identical sets of matches between identical pairs of trees. This ambiguity is uninformative, and detrimental to any probabilistic analysis. In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by mean of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees S and T. By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments, and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms, and opens the door for a probabilistic analysis of the space of suboptimal alignments.
• D. Surujon, Y. Ponty, and P. Clote
[hal]
Résumé (cliquer pour lire). Let $S_n$ denote the network of all RNA secondary structures of length $n$, in which undirected edges exist between structures $s$, $t$ such that $t$ is obtained from $s$ by the addition, removal or shift of a single base pair. Using context-free grammars, generating functions and complex analysis, we show that the asymptotic average degree is $O(n)$ and that the asymptotic clustering coefficient is $O(1/n)$, from which it follows that the family $S_n$, $n = 1, 2, 3,\ldots$ of secondary structure networks is not small-world.
• 2016
• Revues avec comité de lecture
• V. Reinharz, Y. Ponty, and J. Waldispühl
Nucleic Acids Research, Oxford University Press, 44(11):e104 - e104 pp, 2016 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). Systematic structure probing experiments (e.g. SHAPE) of RNA mutants such as the mutate-and-map protocol give us a direct access into the genetic robustness of ncRNA structures. Comparative studies of homologous sequences provide a distinct, yet complementary, approach to analyze structural and functional properties of non-coding RNAs. In this paper, we introduce a formal framework to combine the biochemical signal collected from mutate-and-map experiments, with the evolutionary information available in multiple sequence alignments. We apply neutral theory principles to detect complex long-range dependencies between nucleotides of a single stranded RNA, and implement these ideas into a software called aRNhAck. We illustrate the biological significance of this signal and show that the nucleotides networks calculated with aRNhAck are correlated with nucleotides located in RNA-RNA, RNA-protein, RNA-DNA and RNA-ligand interfaces. aRNhAck is freely available at http://csb.cs.mcgill.ca/arnhack.
• E. Jacox, C. Chauve, G. Szöllősi, Y. Ponty, and C. Scornavacca
ecceTERA: comprehensive gene tree-species tree reconciliation using parsimony.
Bioinformatics (Oxford, England), 2056-8 pp, 2016 [hal] [pubmed]
Résumé (cliquer pour lire). A gene tree-species tree reconciliation explains the evolution of a gene tree within the species tree given a model of gene-family evolution. We describe ecceTERA, a program that implements a generic parsimony reconciliation algorithm, which accounts for gene duplication, loss and transfer (DTL) as well as speciation, involving sampled and unsampled lineages, within undated, fully dated or partially dated species trees. The ecceTERA reconciliation model and algorithm generalize or improve upon most published DTL parsimony algorithms for binary species trees and binary gene trees. Moreover, ecceTERA can estimate accurate species-tree aware gene trees using amalgamation.ecceTERA is freely available under http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera
• M. Drory Retwitzer, V. Reinharz, Y. Ponty, J. Waldispühl, and D. Barash
Nucleic Acids Research, Oxford University Press, 44(W1):W308 - W314 pp, 2016 [doi] [hal]
Résumé (cliquer pour lire). In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality, and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu. ac.il/incaRNAfbinv.
• Conférences avec comité de lecture et actes
• C. Chauve, J. Courtiel, and Y. Ponty
Proceedings of ALCOB - 3rd International Conference on Algorithms for Computational Biology - 2016, Spain, Springer, 9702:53--64 pp, 2016 [doi] [hal]
Résumé (cliquer pour lire). Pairwise ordered tree alignment are combinatorial objects that appear in RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e. two distinct supertrees may induce identical sets of matches between identical pairs of trees. This ambiguity is uninformative, and detrimental to any probabilistic analysis.In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by mean of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees $S$ and $T$. By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments, and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms, and opens the door for a probabilistic analysis of the space of suboptimal RNA secondary structures alignments.
• J. Lumbroso, M. Mishna, and Y. Ponty
Proceedings of GASCOM - 10th conference on random generation of combinatorial structures - 2016, France, Electronic Notes in Discrete Mathematics, 59(Supp. C):99 -- 114 pp, 2016 [doi] [hal]
Résumé (cliquer pour lire). A lattice walk model is said to be reluctant if the defining step set has a strong drift towards the boundaries. We describe efficient random generation strategies for these walks.
• Conférences sans comité de lecture ou sans actes
• A. Saaidi, D. Allouche, B. Sargueil, and Y. Ponty
Proceedings of JOBIM - Journées Ouvertes en Biologie, Informatique et Mathématiques - 2016, France, 2016 [hal]
Résumé (cliquer pour lire). Next-Generation Sequencing (NGS) technologies have opened new perspectives to refine the process of predicting the secondary structure(s) of structured non-coding RNAs. Herein, we describe an integrated modeling strategy, based on the SHAPE chemistry, to infer structural insight from deep se-quencing data. Our approach is based on a pseudo-energy minimization, incorporating additional information from evolutionary data (compensatory mutations) and SHAPE experiments (reactivity scores) within an iterative procedure. Preliminary results reveal conserved and stable structures within UTRs of the Ebola Genome, that are both thermodynamically-stable and highly supported by SHAPE accessibility analysis.
• W. Wang​, M. Barba​, P. Rinaudo​, A. Denise, and Y. Ponty
Proceedings of JOBIM - Journées Ouvertes en Biologie, Informatique et Mathématiques - 2016, France, 2016 [hal]
Résumé (cliquer pour lire). Aligning macromolecules such as proteins, DNAs and RNAs in order to reveal, or conversely exploit, their functional homology is a classic challenge in bioinformatics, with far­reaching applications in structure modelling and genome annotations. In the specific context of complex RNAs, featuring pseudoknots, multiple interactions and non­canonical base pairs, multiple algorithmic solutions and tools have been proposed for the structure/sequence alignment problem. However, such tools are seldom used in practice, due in part to their extreme computational demands, and because of their inability to support general types of structures. Recently, a general parameterized algorithm based on tree decomposition of the query structure has been designed by Rinaudo et al. We present an implementation of the algorithm within a tool named LiCoRNA. We compare it against state­of­the­art algorithms. We show that it both gracefully specializes into a practical algorithm for simple classes pseudoknot, and offers a general solution for complex pseudoknots, which are explicitly out­of­reach of competing softwares.
• Divers & Soumis
• A. Saaidi, D. Allouche, Y. Ponty, B. Sargueil, and M. Regnier
[hal]
Résumé (cliquer pour lire). 1- Introduction:RNA structure is a key to understand retroviruses’s mechanisms e.g. HIV. Many prediction approaches suggesting accurate structures are available but they could be further improved by both taking advantage of Next generation sequencing technology and new experimental techniques (Enzymatic and SHAPE).2 - Experimental probing dataIn this poster, we present an integrative approach based on using many experimental data, resulting from sequencing, to direct predictions with the aim to find an accurate structure lying in the intersection of different sources of experiments. From one side, to reveal single nucleotide, reactivity profiles resulting from a SHAPE technology were used as “soft constraints”, meaning that the reactivity values were translated into pseudo-energies as described (Lorenz et al, 2016). From the other side, RNAses cleavage was used with two enzymes V and T targeting respectively paired and unpaired nucleotides. Reactivity scores resulting from those two experiments are used as hard constraints, forcing positions that exceed a specific threshold to be paired(case of Enzymatic-V) or unpaired(case of Enzymatic-T).3-stochastic sampling:At the thermodynamic equilibrium, a given RNA can have many alternative structures, where each structure could be characterized by a probability within the space of all the possible conformations (Boltzmann ensemble). This probability is related to the energy of the structure, the highest the energy needed to break pairs present in the structure the highest is its probability in the ensemble.We admit that the optimal structure(s) should be energetically stable and supported by several experimental data. For this reason, we coupled a stochastic sampling from the Boltzmann ensembles associated with the experimentally derived constraints, with a clustering across experimental conditions, to generate a structural models that are well-supported by available data.4-The work-flow description: 1. Experimental data from different conditions(SHAPE,Enzymatic-T, Enzymatic-V) were analysed to extract reactivity profiles that will serve as constraints. 2. We sampled 2000 structures per condition: We perform a Boltzmann sampling (Ding et al, 2005) to generate a predefined number of stable structures, compatible with the constraints derived for each condition. We used the stochastic sampling mode of RNAsubopt (-p option) to generate energetically stable structures that are either fully compliant with constraints derived from enzymatic data (hard constraints(Mathews et al,2004)), or constitute reasonable trade-offs between thermodynamic stability and compatibility with SHAPE data (soft constraints, using thepseudo-potentials of Deigan et al. (Deigan et al, 2009), see (Lorenz et al,2016) for details). 3. We merge the structures while keeping labels to retain the origin of each structure. 4. In order to detect structures with affinity to each other, The merged sets of models were clustered, using the base-pair distance as a measure of dissimilarity, the distance between two structures corresponds to the number of base pairs needed to break and to build in order to go from a structure to an other. A clustering algorithms (affinity propagation (Wang et al, 2007) implemented in the scikit-learn Python package (Pedregosa et al, 2011) is used to agglomerate and identify recurrent structures. One of the advantages of affinity propagation resides in its low computational requirements. 5. The next step consists on identifying clusters that are homogeneous, stable and well supported by experimental evidences, leading to the identification of the following objective criteria:-Present conditions that informs as about the diversity of the cluster: Our primary target are clusters compatible with multiple experimental conditions. However, the larger sampled sets required for reproducibility tend to populate each cluster with structures fromall conditions. We thus associated with each cluster the number of represented conditions, defined as the number of conditions for whichthe accumulated Boltzmann probability in the cluster exceeds a predefined threshold.-Boltzmann weight that is a measure of stability: Structures that are found in a given cluster may be unstable, and should be treated asoutliers. For this reason, we computed the cumulated normalized Boltzmann probabilities within the cluster, to favor stable clustersconsisting of stable structures;- Average Cluster Distance to count for coherence: We observed a general tendency of clustering algorithms to create heterogeneousclusters when faced with noisy data. We thus associated with each cluster the mean distance between pairs of structures, estimated asthe average distance to the MEA (Lu et al, 2009) for the sake of efficiency, in order to neglect clusters that were too diverse.6. The next steps consist on choosing cluster(s) with high coherence, diversity and stability. for this purpose we restricted our analysis to clusters that were found on the 3D Pareto Frontier (Mattson Messac, 2005) with respect to the three mentioned above criteria .7. After detecting the optimal Pareto cluster(s), we need to identify representative structure for each cluster. We chose the maximum expectedaccuracy (MEA) structure (Lu et al, 2009) as the representative structure for each cluster, which is defined as the secondary structure whose structural elements have highest accumulated Boltzmann probability within the cluster.5 Results:This resulted in 2 structures which we narrowed down to a single candidate using compatibility with the 1M7 SHAPE data as a final discriminatory criterion.
• 2015
• Revues avec comité de lecture
• C. Chauve, Y. Ponty, and J. Zanetti
Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach
BMC Bioinformatics, BioMed Central, 16(Suppl 19):S6 pp, 2015 [doi] [hal]
Résumé (cliquer pour lire). CONTEXT:The reconstruction of evolutionary scenarios for whole genomesin terms of genome rearrangements is a fundamental problem in evolutionaryand comparative genomics. The DeCo algorithm, recently introducedby Berard et al., computes parsimonious evolutionary scenarios forgene adjacencies, from pairs of reconciled gene trees. However, asfor many combinatorial optimization algorithms, there can exist manyco-optimal, or slightly sub-optimal, evolutionary scenarios thatdeserve to be considered.CONTRIBUTION:We extend the DeCo algorithmto sample evolutionary scenarios from the whole solution space underthe Boltzmann distribution, and also to compute Boltzmann probabilitiesfor specific ancestral adjacencies.RESULTS:We apply our algorithmsto a dataset of mammalian gene trees and adjacencies, and observea significant reduction of the number of syntenic conflicts observedin the resulting ancestral gene adjacencies.
• Conférences avec comité de lecture et actes
• A. Rajaraman, C. Chauve, and Y. Ponty
Proceedings of ISBRA - 11th International Symposium on Bioinformatics Research and Applications - 2015, United States, 9096:260--271 pp, 2015 [doi] [hal]
Résumé (cliquer pour lire). The availability of a large number of assembled genomes opens the way to study the evolution of syntenic character within a phylogenetic context. The DeCo algorithm, recently introduced by Bérard et al. allows the computation of parsimonious evolutionary scenarios for gene adjacencies, from pairs of reconciled gene trees. Following the approach pioneered by Sturmfels and Pachter, we describe how to modify the DeCo dynamic programming algorithm to identify classes of cost schemes that generates similar parsimonious evolutionary scenarios for gene adjacencies, as well as the robustness to changes to the cost scheme of evolutionary events of the presence or absence of specific ancestral gene adjacencies. We apply our method to six thousands mammalian gene families, and show that computing the robustness to changes to cost schemes provides new and interesting insights on the evolution of gene adjacencies and the DeCo model.
• J. Haleš, J. Maňuch, Y. Ponty, and L. Stacho
Proceedings of CPM - 26th Annual Symposium on Combinatorial Pattern Matching, Italy, 2015 [hal]
Résumé (cliquer pour lire). In this work, we consider the Combinatorial RNA Design problem, a minimal instance of the RNA design problem which aims at finding a sequence that admits a given target as its unique base pair maximizing structure. We provide complete characterizations for the structures that can be designed using restricted alphabets. Under a classic four-letter alphabet, we provide a complete characterization of designable structures without unpaired bases. When unpaired bases are allowed, we provide partial characterizations for classes of designable/undesignable structures, and show that the class of designable structures is closed under the stutter operation. Membership of a given structure to any of the classes can be tested in linear time and, for positive instances, a solution can be found in linear time. Finally, we consider a structure-approximating version of the problem that allows to extend bands (helices) and, assuming that the input structure avoids two motifs, we provide a linear-time algorithm that produces a designable structure with at most twice more base pairs than the input structure.
• Chapitres d'ouvrages et éditoriaux
• Y. Ponty, and F. Leclerc
RNA Bioinformatics, Editor(s) Ernesto Picardi, Methods in Molecular Biology, Springer New York, 63-100 pp, 2015 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). Secondary structure diagrams are essential, in RNA biology, to communicate functional hypotheses and summarize structural data, and communicate them visually as drafts or finalized publication-ready figures. While many tools are currently available to automate the production of such diagrams, their capacities are usually partial, making it hard for a user to decide which to use in a given context. In this chapter, we guide the reader through the steps involved in the production of expressive publication-quality illustrations featuring the RNA secondary structure. We present major existing representations and layouts, and give precise instructions to produce them using available free software, including jViz.RNA, the PseudoViewer, RILogo, R-chie, RNAplot, R2R, and VARNA. We describe the file formats and structural descriptions accepted by popular RNA visualization tools. We also provide command lines and Python scripts to ease the user's access to advanced features. Finally, we discuss and illustrate alternative approaches to visualize the secondary structure in the presence of probing data, pseudoknots, RNA-RNA interactions, and comparative data.
• F. Jossinet, Y. Ponty, and J. Waldispühl
Preface. Extended versions of selected papers presented at Computational methods for Structural RNAs (CMSR'14)
Computational methods for Structural RNAs (CMSR'14), Journal of Computational Biology, Mary Ann Liebert, 189 pp, 2015 [hal] [pubmed]
Résumé (cliquer pour lire). This preface introduces to the extended versions of selected papers presented at Computational methods for Structural RNAs (CMSR'14)
• Conférences sans comité de lecture ou sans actes
• V. Le Gallic, A. Denise, and Y. Ponty
Résultats algorithmiques pour le design d’ARN avec contraintes de séquence
Proceedings of SeqBio 2015, France, 2015 [hal]
Résumé (cliquer pour lire). Le problème du design consiste à concevoir une ou des séquences d’ARN qui vont, dans un modèleénergétique, se replier en une structure cible. Les algorithmes déjà existants dans le domaine negèrent pour la plupart pas l’ajout au problème des contraintes de séquence, c’est-à-dire l’interdictionou l’obligation d’utiliser certains motifs.Zhou et al. (2013) ont proposé un algorithme qui utilise de la génération aléatoire dans un langageformel conditionné par un automate fini qui gère les contraintes de motifs interdits/imposés et unegrammaire non contextuelle qui gère les contraintes imposées par la structure recherchée.On propose ici de se baser sur cet algorithme en y ajoutant des optimisations permettant de réduiresa complexité. Ce travail soulève également des questions ouvertes sur la construction efficace d’unautomate (quasi) minimal, ainsi que l’incorporation de contraintes supplémentaires garantissant lerepliement des séquences engendrées vers une structure cible à l’exclusion de tout autre repliement.
• Exposés invités
• Y. Ponty
RNA Bioinformatics and ensemble dynamic programming
Résumé (cliquer pour lire). RiboNucleic Acids (RNAs) are fascinating biomolecules which, similarly to DNA, can be encoded as sequences over a four-letter alphabet, and perform a wide array of biological functions. However, unlike DNA, the precise function of a given RNA depends critically on its structure, adopted as the outcome of a folding process. Luckily, this intricate three- dimensional conformation can be adequately abstracted as a (non-crossing) list of contacts, i.e. a discrete combinatorial object.In this talk, I will emphasize how, over the past three decades, RNA biology has benefited from a continuous and fruitful cross-talk between discrete mathematicians, computer scientists and biochemists. At the center of this conversation lies the concept of dynamic-programming, an algorithmic design technique which solves a combinatorial optimization problem efficiently by taking advantage of a well-chosen decomposition of its search space. Extensions and optimized instances of this technique now allow to address, at a genomic scale, multiple questions related to the analysis of the Boltzmann ensemble and the sequence-structure(-function) relationship. These developments also raise well-defined open questions, motivating further studies of the underlying discrete structures.
• Y. Ponty
Towards firm foundations for the rational design of RNA molecules
Proceedings of Colloque de Bioinformatique Moléculaire (GdR BIM), France, 2015 [hal]
Résumé (cliquer pour lire). The overall architecture of an RNA can be predicted with acceptable accuracy using established computational methods, such as those implemented in the MFold or the RNAFold software. This opens the door for a rational design of RNA molecules, to serve as building blocks in synthetic biology, or as novel therapeutic agents. To that purpose, one must solve the inverse folding problem, i.e. construct an RNA sequence which adopts a specific secondary structure according to established structure prediction methods. However, contrasting with the prediction task, the underlying computational structure of inverse folding is poorly understood, and existing tools either fail to return admissible sequences for some structures, or require extraordinary computational resources. Consequently, the field of RNA bioinformatics is currently craving for both practical and theoretically-sound approaches for the problem. In this talk, I will introduce the many flavors of the RNA design problem and illustrate its relevance to the objectives of modern (synthetic) biology. I will outline the strengths and shortcomings of selected approaches, and will briefly present recently contributed methods, developed with collaborators at Univ McGill (Canada), Wuhan Univ. (China) and Univ. Paris-Sud. If time allows, I will also mention results recently obtained with collaborators at Simon Fraser University, which characterize large subclasses of (un)designable secondary structures in simple energy models. These results are currently being extended to more realistic energy models, and could serve as a foundation for exact, computationally-efficient, solutions for RNA design.
• 2014
• Conférences avec comité de lecture et actes
• C. Chauve, Y. Ponty, and J. Zanetti
Proceedings of BSB - Brazilian Symposium on Bioinformatics - 2014, Brazil, Advances in Bioinformatics and Computational Biology, Springer, 8826:49--56 pp, 2014 [doi] [hal]
Résumé (cliquer pour lire). We consider a recently introduced dynamic programming scheme to compute parsimonious evolutionary scenarios for gene adjacencies. We extend this scheme to sample evolutionary scenarios from the whole solution space under the Boltzmann distribution. We apply our algorithms to a dataset of mammalian gene trees and adjacencies, and observe a significant reduction of the number of syntenic inconsistencies observed in the resulting ancestral gene adjacencies.
• Chapitres d'ouvrages et éditoriaux
• Y. Ponty
1024 - Bulletin de la société informatique de France, Editor(s) Eric Sopena, SIF, 23--53 pp, 2014 [hal]
Résumé (cliquer pour lire). Nous présentons l'histoire et les développements récents de la recherche en bioinformatique des ARN, en insistant particulièrement sur sa dimension algorithmique, ainsi que sur les nombreuses et fructueuses interactions avec d'autres sciences qui jalonnent son histoire.
• Direction d'ouvrages
• F. Jossinet, Y. Ponty, and J. Waldispühl
Proceedings of the 1st workshop on Computational Methods for Structural RNAs
Editor(s) Jossinet, Fabrice & Ponty, Yann & Waldispühl, Jérôme, McGill University, ISBN 978-2-9550187-0-5, 2014 [doi] [hal]
Résumé (cliquer pour lire). Ribonucleic acids (RNAs) play key roles in various aspects of the gene transcription and regulation processes, and are the focus of an ever-increasing interest in all areas of molecular biology. Deciphering the function of a non-protein coding RNA requires an intimate knowledge of its structure, motivating the development of structure-centric methods. This volume contains original peer-reviewed contributions and invited communications presented at "CMSR'14: Computational Methods for Structural RNAs" held on September 7, 2014 in Strasbourg. This event was hosted as an official workshop of ECCB'14: 13th European Conference on Computational Biology.Proceedings are freely downloadable at http://cmsr14.cs.mcgill.ca/
• Divers & Soumis
• C. Chauve, Y. Ponty, and J. Zanetti
[hal]
Résumé (cliquer pour lire). Reconstruction of the evolutionary history of genomic characters along a given species tree is a long-standing problem in computational biology, with efficient algorithms to compute parsimonious scenarios for many types of characters, as in the cases of genes and genomes sequences, gene content, and gene family evolution. Recently, Bérard et al. extended the corpus of such results to syntenic characters, as they introduced the notion of adjacency forest, that models the evolution of gene adjacencies within a phylogeny, and described an efficient dynamic programming (DP) algorithm, called DeCo (Berard et al, ECCB 2012), to compute parsimonious adjacency evolutionary histories. Applying the classical parsimony-based approach of DeCo to a dataset of over 6,000 pairs of mammalian gene trees yielded a significant number of ancestral genes involved in more than two adjacencies, which correspond to syntenic inconsistencies.Recently, more general approaches for parsimony problems have been analyzed, either exploring a wider range of parameters, or considering several alternate histories for a given instance. Our work seeks to address the syntenic inconsistencies of DeCo by extending their DP scheme toward an exploration of the whole solution space of adjacency histories, under the Boltzmann probability distribution, that assigns a probability to each solution defined in terms of its parsimony score. This principle has been applied in several contexts and is sometimes known as the “Boltzmann ensemble approach”.While this Boltzmann ensemble approach has been used for a long time in RNA structure analysis, to the best of our knowledge it is not the case in comparative genomics, where exact probabilistic models have been favored as increasing computational capacities allow them to handle realistic datasets. However, such a probabilistic model does not exist so far for gene adjacencies, which motivates our work.We first show that by sampling adjacencies histories under a Boltzmann distribution that favors co-optimal histories and conserving only frequent ancestral adjacencies, we can reduce significantly the number of syntenic inconsistencies. We also implement a inside/outside variant for DeCo, that computes the actual probabilities of each individual adjacency considered under the Boltzmann distribution.
• 2013
• Revues avec comité de lecture
• A. Lorenz, and Y. Ponty
Theoretical Computer Science, Elsevier, 502:177-194 pp, 2013 [doi] [hal]
Résumé (cliquer pour lire). We address the non-redundant random generation of $k$ words of length $n$ in a context-free language. Additionally, we want to avoid a predefined set of words. We study a rejection-based approach, whose worst-case time complexity is shown to grow exponentially with $k$ for some specifications and in the limit case of a coupon collector. We propose two algorithms respectively based on the recursive method and on an unranking approach. We show how careful implementations of these algorithms allow for a non-redundant generation of $k$ words of length $n$ in $\mathcal{O}(k\cdot n\cdot \log{n})$ arithmetic operations, after a precomputation of $\Theta(n)$ numbers. The overall complexity is therefore dominated by the generation of $k$ words, and the non-redundancy comes at a negligible cost.
• V. Reinharz, Y. Ponty, and J. Waldispühl
A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.
Bioinformatics, Oxford University Press (OUP), 29(13):i308-15 pp, 2013 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). MOTIVATIONS: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. RESULTS: In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. AVAILABILITY: IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
• V. Reinharz, Y. Ponty, and J. Waldispühl
Journal of Computational Biology, Mary Ann Liebert, 20(11):905-19 pp, 2013 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). The analysis of the sequence-structure relationship in RNA molecules is not only essential for evolutionary studies but also for concrete applications such as error-correction in next generation sequencing (NGS) technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. In this article, we address the correction of NGS errors by calculating which mutations most increase the likelihood of a sequence to a given structure and RNA family. We introduce RNApyro, an efficient, linear time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base-pair energies to novel isostericity scores and apply our techniques to correct pointwise errors in 5s and 16s rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.
• Y. Zhang, Y. Ponty, M. Blanchette, E. Lecuyer, and J. Waldispühl
Nucleic Acids Research, Oxford University Press, 41(Web Server issue):W480-5 pp, 2013 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). More than a simple carrier of the genetic information, messenger RNA (mRNA) coding regions can also harbor functional elements that evolved to control different post-transcriptional processes, such as mRNA splicing, localization and translation. Functional elements in RNA molecules are often encoded by secondary structure elements. In this aticle, we introduce Structural Profile Assignment of RNA Coding Sequences (SPARCS), an efficient method to analyze the (secondary) structure profile of protein-coding regions in mRNAs. First, we develop a novel algorithm that enables us to sample uniformly the sequence landscape preserving the dinucleotide frequency and the encoded amino acid sequence of the input mRNA. Then, we use this algorithm to generate a set of artificial sequences that is used to estimate the Z-score of classical structural metrics such as the sum of base pairing probabilities and the base pairing entropy. Finally, we use these metrics to predict structured and unstructured regions in the input mRNA sequence. We applied our methods to study the structural profile of the ASH1 genes and recovered key structural elements. A web server implementing this discovery pipeline is available at http://csb.cs.mcgill.ca/sparcs together with the source code of the sampling algorithm.
• A. Lopes, S. Sacquin-Mora, V. Dimitrova, E. Laine, Y. Ponty, and A. Carbone
Protein-protein interactions in a crowded environment: an analysis via cross-docking simulations and evolutionary information
PLoS Computational Biology, Public Library of Science, 9(12):e1003369 pp, 2013 [doi] [hal]
Résumé (cliquer pour lire). Protein-protein interactions (PPI) are at the heart of the molecular processes governing life and constitute an increasingly important target for drug design. Given their importance, it is vital to determine which protein interactions have functional relevance and to characterize the protein competition inherent to crowded environments, as the cytoplasm or the cellular organelles. We show that combining coarse-grain molecular cross-docking simulations and binding site predictions based on evolutionary sequence analysis is a viable route to identify true interacting partners for hundreds of proteins with a variate set of protein structures and interfaces. Also, we realize a large-scale analysis of protein binding promiscuity and provide a numerical characterization of partner competition and level of interaction strength for about 28000 false-partner interactions. Finally, we demonstrate that binding site prediction is useful to discriminate native partners, but also to scale up the approach to thousands of protein interactions. This study is based on the large computational effort made by thousands of internautes helping World Community Grid over a period of 7 months. The complete dataset issued by the computation and the analysis is released to the scientific community.
• Conférences avec comité de lecture et actes
• V. Reinharz, Y. Ponty, and J. Waldispühl
Proceedings of RECOMB - 17th Annual International Conference on Research in Computational Molecular Biology - 2013, China, 2013 [hal]
Résumé (cliquer pour lire). Analysis of the sequence-structure relationship in RNA molecules are essential to evolutionary studies but also to concrete applications such as error-correction methodologies in sequencing technologies. The prohibitive sizes of the mutational and conformational landscapes combined with the volume of data to proceed require e cient algorithms to compute sequence-structure properties. More speci cally, here we aim to calculate which mutations increase the most the likelihood of a sequence to a given structure and RNA family. In this paper, we introduce RNApyro, an e cient linear-time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base pair energies to novel isostericity scales, and apply our techniques to correct point-wise errors in 5s rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.
• V. Reinharz, Y. Ponty, and J. Waldispühl
Proceedings of ISMB/ECCB - 21st Annual international conference on Intelligent Systems for Molecular Biology/12th European Conference on Computational Biology - 2013, Germany, 2013 [hal]
Résumé (cliquer pour lire). Motivations: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. Results: In this paper, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seed-less (we remove the bias of the seed in local search heuristics), and successfully generates highquality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop an hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC content and target structure. Availability: IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/
• E. Senter, S. Sheikh, I. Dotu, Y. Ponty, and P. Clote
Proceedings of RECOMB - 17th Annual International Conference on Research in Computational Molecular Biology - 2013, China, 2013 [hal]
Résumé (cliquer pour lire). We describe the broad outline of a new thermodynamics-based algorithm, FFTbor, that uses the fast Fourier transform to perform polynomial interpolation to compute the Boltzmann probability that secondary structures di er by k base pairs from an arbitrary reference structure of a given RNA sequence. The algorithm, which runs in quartic time O(n4) and quadratic space O(n2), is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape,and to predict the location of riboswitch expression platform candidates. The full paper appears in PLoS ONE (2012) 19 Dec 2012. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/.
• Y. Zhou, Y. Ponty, S. Vialette, J. Waldispühl, Y. Zhang, and A. Denise
Proceedings of ACM-BCB - ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics - 2013, United States, 2013 [hal]
Résumé (cliquer pour lire). The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, a given structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence/absence of regulatory motifs, either at a specific location or anywhere in the sequence. In this study, we investigate the design of RNA sequences from their targeted secondary structure, given these additional sequence constraints. To this purpose, we develop a general framework based on concepts of language theory, namely context-free grammars and finite automata. We efficiently combine a comprehensive set of constraints into a unifying context-free grammar of moderate size. From there, we use generic generic algorithms to perform a (weighted) random generation, or an exhaustive enumeration, of candidate sequences. The resulting method, whose complexity scales linearly with the length of the RNA, was implemented as a standalone program. The resulting software was embedded into a publicly available dedicated web server. The applicability demonstrated of the method on a concrete case study dedicated to Exon Splicing Enhancers, in which our approach was successfully used in the design of \emph{in vitro} experiments.
• Chapitres d'ouvrages et éditoriaux
• S. Schirmer, Y. Ponty, and R. Giegerich
Introduction to RNA secondary structure comparison.
RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods, Methods in Molecular Biology -Clifton then Totowa- Editor(s) Gorodkin, Jan and Ruzzo, Walter L., Methods in molecular biology, Humana Press (Springer Imprint), 247-73 pp, 2013 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). Many methods have been proposed for RNA secondary structure comparison, and new ones are still being developed. In this chapter, we first consider structure representations and discuss their suitability for structure comparison. Then, we take a look at the more commonly used methods, restricting ourselves to structures without pseudo-knots. For comparing structures of the same sequence, we study base pair distances. For structures of different sequences (and of different length), we study variants of the tree edit model. We name some of the available tools and give pointers to the literature. We end with a short review on comparing structures with pseudo-knots as an unsolved problem and topic of active research.
• 2012
• Revues avec comité de lecture
• P. Clote, Y. Ponty, and J.-M. Steyaert
Journal of Mathematical Biology, Springer Verlag (Germany), 65(3):581-99 pp, 2012 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). In "The ends of a large RNA molecule are necessarily close", Yoffe et al. (Nucleic Acids Res 39(1):292-299, 2011) used the programs RNAfold [resp. RNAsubopt] from Vienna RNA Package to calculate the distance between 5' and 3' ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5'-3' distance is defined to be the length of the shortest path from 5' node to 3' node in the undirected graph, whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and using a heuristic theoretical argument, Yoffe et al. conclude that the 5'-3' distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous, mathematical framework to study the expected distance from 5' to 3' ends of an RNA sequence. We present recurrence relations that precisely define the expected distance from 5' to 3' ends of an RNA sequence, both for the Turner nearest neighbor energy model, as well as for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of Vienna RNA Package) the expected distance between 5' and 3' ends of a given RNA sequence, with respect to the Turner energy model. Using methods of analytical combinatorics, that depend on complex analysis, we prove that the asymptotic expected 5'-3' distance of length n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5'-3' distance for secondary structures from the STRAND database, and conclude that the 5'-3' distance is correlated with RNA sequence length.
• E. Senter, S. Sheikh, I. Dotu, Y. Ponty, and P. Clote
Using the Fast Fourier Transform to Accelerate the Computational Search for RNA Conformational Switches *.
PLoS ONE, Public Library of Science, 7(12):e50506 pp, 2012 [doi] [hal]
Résumé (cliquer pour lire). Using complex roots of unity and the Fast Fourier Transform, we design a new thermodynamics-based algorithm, FFTbor, that computes the Boltzmann probability that secondary structures differ by k base pairs from an arbitrary initial structure of a given RNA sequence. The algorithm, which runs in quartic time O(n^4) and quadratic space O(n^2), is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape, and to predict the location of riboswitch expression platform candidates. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/
• A. Levin, M. Lis, Y. Ponty, C. O'Donnell, S. Devadas, B. Berger, and J. Waldispühl
A global sampling approach to designing and reengineering RNA secondary structures.
Nucleic Acids Research, Oxford University Press, 40(20):10041-52 pp, 2012 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
• Conférences avec comité de lecture et actes
• J. Du Boisberranger, D. Gardy, and Y. Ponty
Proceedings of AOFA - 23rd International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms - 2012, Canada, DMTCS, DMTCS Proceedings vol. AQ, 23rd Intern. Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12):243--264 pp, 2012 [hal]
Résumé (cliquer pour lire). We consider the word collector problem, i.e. the expected number of calls to a random weighted generator before all the words of a given length in a language are generated. The originality of this instance of the non-uniform coupon collector lies in the, potentially large, multiplicity of the words/coupons of a given probability/composition. We obtain a general theorem that gives an asymptotic equivalent for the expected waiting time of a general version of the Coupon Collector. This theorem is especially well-suited for classes of coupons featuring high multiplicities. Its application to a given language essentially necessitates knowledge on the number of words of a given composition/probability. We illustrate the application of our theorem, in a step-by-step fashion, on four exemplary languages, whose analyses reveal a large diversity of asymptotic waiting times, generally expressible as $\kappa \cdot m^p \cdot (\log{m})^q \cdot (\log \log{m})^r$, for $m$ the number of words, and $p, q, r$ some positive real numbers.
• S. Sheikh, R. Backofen, and Y. Ponty
Proceedings of CPM - 23rd Annual Symposium on Combinatorial Pattern Matching - 2012, Finland, Combinatorial Pattern Matching, Springer, 7354:321--333 pp, 2012 [doi] [hal]
Résumé (cliquer pour lire). La prédiction du repliement, avec pseudonoeuds généraux, d'une séquence d'ARN de taille $n$ est équivalent à la recherche d'un couplage d'énergie libre minimale. Dans un modèle d'énergie simple, où chaque paire de base contribue indépendamment à l'énergie, ce problème peut être résolu en temps $\Theta(n^3)$ grâce à une variante d'un algorithme de couplage pondéré maximal. Cependant, le même problème a été démontré NP-difficile dans le modèle d'énergie dit des plus proches voisins. Dans ce travail, nous étudions les propriétés du problème sous un modèle d'empilements, constituant un modèle intermédiaire entre ceux d'appariement et des plus proches voisins. Nous démontrons tout d'abord que le repliement avec pseudo-noeuds de l'ARN reste NP-difficile dans de nombreuses valuations du modèle d'énergie. . Par ailleurs, nous montrons que ce problème est approximable, en proposant un algorithme polynomial garantissant une $1/5$-approximation. Ce résultat illustre une différence essentielle entre ce modèle et celui des plus proches voisins, pour lequel nous montrons qu'il ne peut être approché à aucun ratio positif par un algorithme en temps polynomial sauf si $N=NP$.
• C. Banderier, O. Bodini, Y. Ponty, and H. Tafat
Proceedings of ANALCO - 12th Meeting on Analytic Algorithmics and Combinatorics - 2012, Japan, Omnipress, 107--116 pp, 2012 [hal]
Résumé (cliquer pour lire). It is well known that, under some aperiodicity and irreducibility conditions, the number of occurrences of local patterns within a Markov chain (and, more generally, within the languages generated by weighted regular expressions/automata) follows a Gaussian distribu- tion with both variance and mean in (n). By contrast, when these conditions no longer hold, it has been denoted that the limiting distribution may follow a whole diversity of distributions, including the uniform, power-law or even multimodal distribution, arising as tradeo s between structural properties of the regular expression and the weight/probabilities associated with its transitions/letters. However these cases only partially cover the full diversity of behaviors induced within regular expressions, and a characterization of attainable distributions remained to be provided. In this article, we constructively show that the limiting distribution of the simplest foresee- able motif (a single letter!) may already follow an arbitrarily complex continuous distribution (or cadlag process). We also give applications in random generation (Boltzmann sampling) and bioinformatics (parsimonious segmentation of DNA).
• P. Rinaudo, Y. Ponty, D. Barth, and A. Denise
Proceedings of WABI - 12th Workshop on Algorithms in Bioinformatics - 2012, Slovenia, tba, 2012 [hal]
Résumé (cliquer pour lire). Nous présentons un cadre général pour la comparaison structure/séquence d'une très large classe de structures d'ARN, qui généralise de nombreux travaux récents consacrés à des familles spécifiques. Notre approche, basée sur une décomposition arborescente des structures, permet d'obtenir un algorithme de complexité paramétrée, dont la complexité asymptotique, exponentielle dans le cas général, dépend de la famille de structures considérée. Pour chacune des familles considérées par les travaux antérieurs, notre algorithme se spécialise automatiquement, et permet d'obtenir une complexité égale à celle obtenue jusqu'alors.
• Chapitres d'ouvrages et éditoriaux
• Y. Ponty, and J. Jongwane
Mieux comprendre certaines molécules biologiques grâce à l’informatique
Interstices INRIA, 2012 [hal]
Résumé (cliquer pour lire). Yann Ponty s’intéresse à l’algorithmique du repliement d’ARN. Il nous en parle dans cet épisode du podcast audio.
• Conférences sans comité de lecture ou sans actes
• S. Janssen, L. Paulevé, Y. Ponty, B. Raman, and M. Zytnicki
Can Probabilistic Model Checking Explore Ribo-Nucleic Acid Folding Space?
Proceedings of IWBDA - 4th International Workshop on Bio-Design Automation - 2012, United States, 2012 [hal]
Résumé (cliquer pour lire). The folding path of a ribo-nucleic acid from its free form to its complete structure is of significant interest to structural biologists. There are algorithms for simulating kinetics of the ribo-nucleic acid from its unfolded state to its stable structure (minimum free-energy structure). These computational techniques are expensive as the total number of structures in the folding space is exponentially large. We propose to use time-efficient algorithms from the field of probabilistic model checking for verifying certain hypothesis concerning ribo-nucleic acid folding landscape. First, we explain how thermodynamic models of ribo-nucleic acid can be used to generate a Markov chain of structures, whose transitions within the chain are based on the energy difference w.r.t to their neighbours. Then we present a process algebra model to compactly represent the dynamics of ribo-nucleic acid folding. Both these approaches essentially generate input models for probabilistic model checking. Finally, we discuss if statistical model checking techniques can be applied for the problem of aligning ribo-nucleic acid sequences and structures.
• Exposés invités
• Y. Ponty
Visualizing 2D & 3D Structures of RNA
Proceedings of VIZBI - 3rd international meeting on Visualizing Biological Data - 2012, Germany, 2012 [hal]
Résumé (cliquer pour lire). Broad overview of the existing visualization techniques/methods/tools and many challenges faced by the RNA community.
• Divers & Soumis
• Y. Ponty
[hal]
Résumé (cliquer pour lire). Two formalisms, both based on context-free grammars, have recently been proposed as a basis for a non-uniform random generation of combinatorial objects. The former, introduced by Denise et al, associates weights with letters, while the latter, recently explored by Weinberg et al in the context of random generation, associates weights to transitions. In this short note, we use a simple modification of the Greibach Normal Form transformation algorithm, due to Blum and Koch, to show the equivalent expressivities, in term of their induced distributions, of these two formalisms.
• 2011
• Revues avec comité de lecture
• J. Waldispühl, and Y. Ponty
An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure. *.
Journal of Computational Biology, Mary Ann Liebert, 18(11):1465-79 pp, 2011 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). The analysis of the relationship between sequences and structures (i.e., how mutations affect structures and reciprocally how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from the sequence data only, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed) to compute the Boltzmann-weighted ensembles of mutants with exactly k mutations, and sample mutations from these ensembles. However, this approach suffers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend of the free energy of the structures, RNAmutants has difficulties to sample mutant sequences with low G+C-contents. In this article, we introduce an unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C-contents. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are difficult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights offered by these techniques on studies of complete RNA sequence structures maps of sizes up to 40 nucleotides. Our results indicate that the G+C-content has a strong influence on the size and shape of the evolutionary accessible sequence and structural spaces. In particular, we show that low G+C-contents favor the apparition of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C-contents significantly reduce the size of the evolutionary accessible mutational landscapes.
• Conférences avec comité de lecture et actes
• J. Waldispühl, and Y. Ponty
Proceedings of RECOMB - 15th Annual International Conference on Research in Computational Molecular Biology - 2011, Canada, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 6577:501-515 pp, 2011 [doi] [hal]
Résumé (cliquer pour lire). The analysis of the relationship between sequences and structures (i.e. how mutations aect structures and reciprocally how structures in uence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases or to develop bioengineering applications such as the design of articial molecules. Because their structures can be predicted from the sequence data only, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows, for the rst time, a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed) to compute the Boltzmann weighted ensembles of mutants with exactly k mutations, and sample mutations from these ensembles. However, this approach suers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend of the free energy of the structures, RNAmutants has diculties to sample mutant sequences with low G+C-contents. In this paper we introduce a novel unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C-contents. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are dif- cult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights oered by these techniques on studies of complete RNA sequence structures maps of sizes up to 40 nucleotides. Our results indicate that the G+C-content has a strong in uence on the size and shape of the evolutionary accessible sequence and structural spaces. In particular, we show that low G+C-contents favor the apparition of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C-contents signicantly reduce the size of the evolutionary accessible mutational landscapes.
• Y. Ponty, and C. Saule
Proceedings of WABI - 11th Workshop on Algorithms in Bioinformatics - 2011, Germany, 2011 [hal]
Résumé (cliquer pour lire). We extend an hypergraph representation, introduced by Finkelstein and Roytberg, to unify dynamic programming algorithms in the context of RNA folding with pseudoknots. Classic applications of RNA dynamic programming energy minimization, partition function, base-pair probabilities\ldots) are reformulated within this framework, giving rise to very simple algorithms. This reformulation allows one to conceptually detach the conformation space/energy model -- captured by the hypergraph model -- from the specific application, assuming unambiguity of the decomposition. To ensure the latter property, we propose a new combinatorial methodology based on generating functions. We extend the set of generic applications by proposing an exact algorithm for extracting generalized moments in weighted distribution, generalizing a prior contribution by Miklos and al. Finally, we illustrate our full-fledged programme on three exemplary conformation spaces (secondary structures, Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets of algorithms that are either novel or have complexity comparable to classic implementations for minimization and Boltzmann ensemble applications of dynamic programming.
• Conférences sans comité de lecture ou sans actes
• S. Janssen, Y. Ponty, B. Raman, S. Sheikh, J.-M. Steyaert, and P. Clote
Proceedings of Fifth Indo-French Bioinformatics Meeting, India, 2011 [hal]
Résumé (cliquer pour lire). In this short note, we investigate the responsibility of pseudoknotted (PK) foldings in the observed incapacity of an MFE folding method (RNAfold) to predict the structure of certain families of RNA within RFam. We considered the difference Delta in predicted free-energy obtained by respectively considering and omitting PK conformations, which we showed could be reliably used as an indicator for the presence of pseudoknots (84.5% AUC, while 88.3% of PK families have positive values). We furthered our analysis, investigating the top 15 families associated with positive values of Delta associated with non-PK consensus structures in RFAM, and found evidence of the presence of PK in the literature for at least 11 of them. However a large proportion of poorly predicted families remain associated with low Delta values, and additional explanations need be explored for their poor predictions.
• 2010
• Revues avec comité de lecture
• A. Denise, Y. Ponty, and M. Termier
Theoretical Computer Science, Elsevier, 411(40-42):3527-3552 pp, 2010 [doi] [hal]
Résumé (cliquer pour lire). Consider a class of decomposable combinatorial structures, using different types of atoms ${\cal Z} = \{z_1,\ldots ,z_{|{\cal Z}|}\}$. We address the random generation of such structures with respect to a size $n$ and a targeted distribution in $k$ of its \emph{distinguished} atoms. We consider two variations on this problem. In the first alternative, the targeted distribution is given by $k$ real numbers $\mu_1, \ldots, \mu_k$ such that $0 < \mu_i < 1$ for all $i$ and $\mu_1+\cdots+\mu_k \leq 1$. We aim to generate random structures among the whole set of structures of a given size $n$, in such a way that the {\em expected} frequency of any distinguished atom $z_i$ equals $\mu_i$. We address this problem by weighting the atoms with a $k$-tuple $\pi$ of real-valued weights, inducing a weighted distribution over the set of structures of size $n$. We first adapt the classical recursive random generation scheme into an algorithm taking ${\cal O}(n^{1+o(1)}+mn\log{n})$ arithmetic operations to draw $m$ structures from the $\pi$-weighted distribution. Secondly, we address the analytical computation of weights such that the targeted frequencies are achieved asymptotically, i. e. for large values of $n$. We derive systems of functional equations whose resolution gives an explicit relationship between $\pi$ and $\mu_1, \ldots, \mu_k$. Lastly, we give an algorithm in ${\cal O}(k n^4)$ for the inverse problem, {\it i.e.} computing the frequencies associated with a given $k$-tuple $\pi$ of weights, and an optimized version in ${\cal O}(k n^2)$ in the case of context-free languages. This allows for a heuristic resolution of the weights/frequencies relationship suitable for complex specifications. In the second alternative, the targeted distribution is given by a $k$ natural numbers $n_1, \ldots, n_k$ such that $n_1+\cdots+n_k+r=n$ where $r \geq 0$ is the number of undistinguished atoms. The structures must be generated uniformly among the set of structures of size $n$ that contain {\em exactly} $n_i$ atoms $z_i$ ($1 \leq i \leq k$). We give a ${\cal O}(r^2\prod_{i=1}^k n_i^2 +m n k \log n)$ algorithm for generating $m$ structures, which simplifies into a ${\cal O}(r\prod_{i=1}^k n_i +m n)$ for regular specifications.
• Conférences avec comité de lecture et actes
• D. Gardy, and Y. Ponty
Proceedings of GASCOM - 8th conference on random generation of combinatorial structures - 2010, Canada, 14pp pp, 2010 [hal]
Résumé (cliquer pour lire). The present work analyzes the redundancy of sets of combinatorial objects produced by a weighted random generation algorithm proposed by Denise et al. This scheme associates weights to the terminals symbols of a weighted context-free grammar, extends this weight definition multiplicatively on words, and draws words of length $n$ with probability proportional their weight. We investigate the level of redundancy within a sample of $k$ word, the proportion of the total probability covered by $k$ words (coverage), the time (number of generations) of the first collision, and the time of the full collection. For these four questions, we use an analytic urn analogy to derive asymptotic estimates and/or polynomially computable exact forms. We illustrate these tools by an analysis of an RNA secondary structure statistical sampling algorithm introduced by Ding et al.
• O. Bodini, and Y. Ponty
Proceedings of 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), Austria, DMTCS Proceedings, Discrete Mathematics and Theoretical Computer Science, DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10):49-64 pp, 2010 [hal]
Résumé (cliquer pour lire). We address the uniform random generation of words from a context-free language (over an alphabet of size $k$), while constraining every letter to a targeted frequency of occurrence. Our approach consists in a multidimensional extension of Boltzmann samplers. We show that, under mostly $\textit{strong-connectivity}$ hypotheses, our samplers return a word of size in $[(1- \epsilon)n, (1+ \epsilon)n]$ and exact frequency in $\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept tolerance intervals of width in $\Omega (\sqrt{n})$ for the number of occurrences of each letters, our samplers perform an approximate-size generation of words in expected $\mathcal{O}(n)$ time. We illustrate our approach on the generation of Tetris tessellations with uniform statistics in the different types of tetraminoes.
• 2009
• Revues avec comité de lecture
• K. Darty, A. Denise, and Y. Ponty
Bioinformatics, Oxford University Press (OUP), 25(15):1974-5 pp, 2009 [doi] [hal] [pubmed]
Résumé (cliquer pour lire). DESCRIPTION: VARNA is a tool for the automated drawing, visualization and annotation of the secondary structure of RNA, designed as a companion software for web servers and databases. FEATURES: VARNA implements four drawing algorithms, supports input/output using the classic formats dbn, ct, bpseq and RNAML and exports the drawing as five picture formats, either pixel-based (JPEG, PNG) or vector-based (SVG, EPS and XFIG). It also allows manual modification and structural annotation of the resulting drawing using either an interactive point and click approach, within a web server or through command-line arguments. AVAILABILITY: VARNA is a free software, released under the terms of the GPLv3.0 license and available at http://varna.lri.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
• 2008
• Revues avec comité de lecture
• Y. Ponty
Efficient sampling of RNA secondary structures from the Boltzmann ensemble of low-energy: The boustrophedon method
Journal of Mathematical Biology, Springer Verlag (Germany), 56(1-2):107--127 pp, 2008 [doi] [hal]
Résumé (cliquer pour lire). We adapt here a surprising technique, the boustrophedon method, to speed up the sampling of RNA secondary structures from the Boltzmann low-energy ensemble. This technique is simple and its implementation straight-forward, as it only requires a permutation in the order of some operations already performed in the stochastic traceback stage of these algorithms. It nevertheless greatly improves their worst-case complexity from O(n^2) to O(n log(n)), for n the size of the original sequence. Moreover the average-case complexity of the generation is shown to be improved from O(n√n) to O(n log(n)) in an Boltzmann-weighted homopolymer model based on the Nussinov–Jacobson free-energy model. These results are extended to the more realistic Turner free-energy model through experiments performed on both structured (Drosophilia melanogaster mRNA 5S) and hybrid (Staphylococcus aureus RNAIII) RNA sequences, using a boustrophedon modified version of the popular software UnaFold. This improvement allows for the sampling of greater and more significant sets of structures in a given time.
• M. Bousquet-Mélou, and Y. Ponty
Discrete Mathematics and Theoretical Computer Science, DMTCS, 10(2):125--152 pp, 2008 [hal]
Résumé (cliquer pour lire). Let a and b be two positive integers. A culminating path is a path of Z^2 that starts from (0,0), consists of steps (1,a) and (1,-b), stays above the x-axis and ends at the highest ordinate it ever reaches. These paths were first encountered in bioinformatics, in the analysis of similarity search algorithms. They are also related to certain models of Lorentzian gravity in theoretical physics. We first show that the language on a two letter alphabet that naturally encodes culminating paths is not context-free. Then, we focus on the enumeration of culminating paths. A step by step approach, combined with the kernel method, provides a closed form expression for the generating fucntion of culminating paths ending at a (generic) height k. In the case a=b, we derive from this expression the asymptotic behaviour of the number of culminating paths of length n. When a>b, we obtain the asymptotic behaviour by a simpler argument. When a= b, with no precomputation stage nor non-linear storage required. The choice of the best algorithm is not as clear when a
• W. Lorenz, P. Clote, and Y. Ponty
Asymptotics of RNA shapes
Journal of Computational Biology, Mary Ann Liebert, 15(1):31--63 pp, 2008 [doi] [hal]
Résumé (cliquer pour lire). RNA shapes, introduced by Giegerich et al., provide a useful classification of the branching complexity for RNA secondary structures. In this paper, we derive an exact value for the asymptotic number of RNA shapes, by relying on an elegant relation between non-ambiguous, context-free grammars and generating functions. Our results provide a theoretical upper bound on the length of RNA sequences amenable to probabilistic shape analysis, under the assumption that any base can basepair with any other base. Since the relation between context-free grammars and asymptotic enumeration is simple yet not well-known in bioinformatics, we give a self-contained presentation with illustrative examples. Additionally, we prove a surprising 1-to-1 correspondence between Pi-shapes and Motzkin numbers
• Y. Ponty, R. Istrate, E. Porcelli, and P. Clote
LocalMove: computing on-lattice fits for biopolymers
Nucleic Acids Research, Oxford University Press, 36(2):W216-W222 pp, 2008 [doi] [hal]
Résumé (cliquer pour lire). Given an input Protein Data Bank file (PDB) for a protein or RNA molecule, LocalMove is a web server that determines an on-lattice representation for the input biomolecule. The web server implements a Markov Chain Monte-Carlo algorithm with simulated annealing to compute an approximate fit for either the coarse-grain model or backbone model on either the cubic or face-centered cubic lattice. LocalMove returns a PDB file as output, as well as dynamic movie of 3D images of intermediate conformations during the computation. The LocalMove server is publicly available at http://bioinformatics.bc.edu/clotelab/localmove/.
• Conférences avec comité de lecture et actes
• Y. Ponty
Proceedings of GASCOM - 7th conference on random generation of combinatorial structures - 2008, Italy, 12pp pp, 2008 [hal]
Résumé (cliquer pour lire). We address the non-redundant random generation of k words of length n from a context-free language. Additionally, we want to avoid a prede¯ned set of words. We study the limits of a rejection-based approach, whose time complexity is shown to grow exponentially in k in some cases. We propose an alternative recursive algorithm, whose careful implementation allows for a non-redundant generation of k words of size n in O(kn log n) arithmetic operations after the precomputation of O(n) numbers. The overall complexity is therefore dominated by the generation of k words, and the non-redundancy comes at a negligible cost.
• 2007
• Revues avec comité de lecture
• F. Fabrizzio, Y. Ponty, W. Lorenz, and P. Clote
DIAL: A web server for the pairwise alignment of two RNA 3-dimensional structures using nucleotide, dihedral angle and base pairing similarities
Nucleic Acids Research, Oxford University Press, 35(Web server issue):W659--668 pp, 2007 [doi] [hal]
Résumé (cliquer pour lire). DIAL (dihedral alignment) is a web server that provides public access to a new dynamic programming algorithm for pairwise 3D structural alignment of RNA. DIAL achieves quadratic time by performing an alignment that accounts for (i) pseudo-dihedral and/or dihedral angle similarity, (ii) nucleotide sequence similarity and (iii) nucleotide base-pairing similarity. DIAL provides access to three alignment algorithms: global (Needleman–Wunsch), local (Smith–Waterman) and semiglobal modified to yield motif search). Suboptimal alignments are optionally returned, and also the Boltzmann pair probabilities for aligned positions within the optimal alignment. If a non-zero suboptimal alignment score ratio is entered, then the semiglobal alignment algorithm may be used to detect structurally similar occurrences of a user-specified 3D motif. The query motif may be contiguous in the linear chain or fragmented in a number of noncontiguous regions. The DIAL web server provides graphical output which allows the user to view, rotate and enlarge the 3D superposition for the optimal (and suboptimal) alignment of query to target. Although graphical output is available for all three algorithms, the semiglobal motif search may be of most interest in attempts to identify RNA motifs. DIAL is available at http://bioinformatics.bc.edu/clotelab/DIAL.
• 2006
• Revues avec comité de lecture
• Y. Ponty, M. Termier, and A. Denise
Bioinformatics, Oxford University Press (OUP), 22(12):1534--1535 pp, 2006 [doi] [hal]
Résumé (cliquer pour lire). GenRGenS is a software tool dedicated to randomly generating genomic sequences and structures. It handles several classes of models useful for sequence analysis, such as Markov chains, hidden Markov models, weighted context-free grammars, regular expressions and PROSITE expressions. GenRGenS is the only program that can handle weighted context-free grammars, thus allowing the user to model and to generate structured objects (such as RNA secondary structures) of any given desired size. GenRGenS also allows the user to combine several of these different models at the same time. Availability: Source and executable files of GenRGenS (in Java) and the complete user's manual are freely available at http://www.lri.fr/bio/GenRGenS
• Thèses
• Y. Ponty
2006 [hal]
Résumé (cliquer pour lire). La mise en évidence des mécanismes de sélection agissant sur les données génomiques structurées (ARN, Protéines, ADN...) nécessite l'élaboration de modèles de séquences. Une fois un tel modèle élaboré, il est possible, au prix d'une analyse mathématique parfois complexe ou par le biais de la
génération aléatoire, d'évaluer la significativité d'un phénomène observé. Tout d'abord, nous nous intéressons aux propriétés des grammaires pondérées, un formalisme particulièrement adapté à la modélisation de la structure des ARN, dérivant des algorithmes de génération aléatoire efficaces implémentés au sein du prototype GenRGenS. Nous abordons le calcul automatique des pondérations réalisant des valeurs observées pour les paramètres du modèle, ainsi qu'une implémentation basée sur une approche optimisation. Dans un second temps, nous abordons la modélisation de la structure secondaire d'ARN. Après quelques rappels de biologie moléculaire, nous proposons plusieurs modèles basés sur des grammaires pondérées permettant la génération de structures d'ARN réalistes. L'utilisation d'un algorithme d'optimisation permet le calculer des pondérations correspondant à certaines familles d'ARN. Nous proposons enfin un algorithme d'extraction de structure secondaire maximale dans une structure générale, qui permet de profiter des données récentes issues de la cristallographie. Le dernier chapitre de cette thèse s'intéresse à l'analyse d'un algorithme de recherche de similarité heuristique, dont la sensibilité s'avère étroitement liée à la probabilité de présence d'un motif au sein de marches aléatoires particulières, les chemins culminants. Ces marches restent positives, et atteignent une altitude maximale en leur dernier pas. Nous proposons un algorithme récursif de génération aléatoire pour ces chemins. En combinant des techniques issues de la combinatoire énumérative, l'analyse asymptotique et la théorie des langages, nous dérivons des algorithmes de génération aléatoire par rejet linéaires dans de nombreux cas.
• 2004
• Conférences avec comité de lecture et actes
• G. Kucherov, L. Noé, and Y. Ponty
Proceedings of 4th Symposium on Bioinformatics and bioengineering - BIBE'2004, Taiwan, Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04), IEEE Computer Society, 387-394 pp, 2004 [doi] [hal]
Résumé (cliquer pour lire). We address the problem of estimating the sensitivity of seed-based similarity search algorithms. In contrast to approaches based on Markov models [18, 6, 3, 4, 10], we study the estimation based on homogeneous alignments. We describe an algorithm for counting and random generation of those alignments and an algorithm for exact computation of the sensitivity for a broad class of seed strategies. We provide experimental results demonstrating a bias introduced by ignoring the homogeneousness condition.
• 2003
• Divers & Soumis
• Y. Ponty
[hal]
Résumé (cliquer pour lire). Ce mémoire résume un travail opéré sous la direction d'Alain Denise de Mars à Juin 2003. Il consisteen une étude des propriétés combinatoires des structures secondaires d'ARN, en rapport avec le problème de la génération aléatoire de structures secondaires d'ARN réalistes.Dans une première partie, nous tenterons de familiariser le lecteur avec le contexte biologique sous-jacent aux problèmes étudiés.Nous présenterons ensuite un état de l'art de l'étude combinatoire des structures secondaire d'ARN. Pour cela, nous introduirons brièvement une série d'outils théoriques, comme les séries génératrices, et leurs comportements asymptotiques ou les grammaires non contextuelles. Enfin, nous évoquerons divers mécanismes de génération aléatoire et présenterons une contribution à l'étude des paramètres des structures secondaires d'ARN.Nous proposerons en annexe une description d'un logiciel dédié à la génération aléatoire de séquences génomiques, GenRGenS, dont l'auteur assure le développement depuis la version 1.0. Nous présenterons aussi un algorithme polynomial de génération aléatoire de chemins culminants, structures combinatoires non algébriques qui apparaissent lors de l'étude d'algorithmes d'alignements de séquence.
• G. Kucherov, L. Noé, and Y. Ponty
[hal]
Résumé (cliquer pour lire). We address the problem of measuring the sensitivity of seed-based similarity search algorithms. In contrast to approaches based on Markov models, we study the measurement based on homogeneous alignments. We describe an algorithm for counting and random generation of those alignments and an algorithm for exact computation of the sensitivity for a broad class of seed strategies. We provide experimental results demonstrating a bias introduced by ignoring the homogeneousness condition.
Copyleft Yann Ponty 2015