Yann Ponty

CNRS research Director · Head of AMIBio team
Faculty member of LIX (Comp. Sci. Dept), École Polytechnique
Office 2005 · LIX/Bat. Turing · 1 rue Estienne d'Orves · 91120 Palaiseau · France
+33 1 77 57 80 95 · No Javascript = no email
I am a tenured CNRS research scientist, based at the Computer Science Department (LIX) of Ecole Polytechnique (Institut Polytechnique de Paris, France). I design Bioinformatics methods, usually relying strongly on discrete algorithmic techniques, to expand our common knowledge of Ribo-Nucleic Acids (RNA) and exploit their potential in biotechnology.
My research interests include:
  • RNA folding, design, and evolution
  • RNA/RNA, RNA/Proteins, and Proteins/Proteins interactions
  • Random generation and enumerative combinatorics
  • Discrete Algorithms (Dynamic programming!)
  • RNA visualization
News

Bertrand Marchand defended his PhD (Oct 2023), soon to be followed by Taher Yacoub (no pressure... ;))

Théo Boury and Alain Azede embark on their PhD journey in the Fall of 2023

I will be chairing the proceedings tracks of ISMB 2024 with Prof. Tijana Milenkovic (Univ. Notre Dame, USA)

Leading (2022-2025) the search committee of the SIF/Gilles Kahn prize for best French PhD in Computer Science


Research

Over the past decade, my research interests have been at the intersection of Computer Science, Mathematics and Molecular Biology. On the applied level, I mainly design analytic approaches, efficient algorithms and tools to answer key questions in Bioinformatics, with a special focus on RNA biology. These questions include, but are not limited to:

  • How to predict RNA structure in the presence of pseudoknots?
  • What is the prevalence of kinetics within the RNA folding process?
  • What is the interplay between RNA structure and evolution?
  • How can a knowledge of the structure of RNA help in the analysis of experiments?
    Conversely, how to use coarse-grain experimental data to perform an accurate RNA structure prediction?
  • How to design RNA sequences that perform predefined functions in vivo?

Some of these questions are related to universal properties of biopolymers. In such cases, they do not necessarily require considering a specific sequence, or overly sophisticated (and complex) energy models. Provided they can be accurately rephrased at such an abstract level, I typically strive to provide (asymptotical) analytic results using standard tools and techniques (generating functions, singularity analysis...) borrowed from the fields of enumerative combinatorics and analytic combinatorics.

More complicated questions may still lend themselves quite nicely to exact resolution through polynomial time/space algorithms, usually based on dynamic programming. The concepts and design principles underlying such tools can sometimes be generalized to some other application contexts in bioinformatics, such as comparative genomics.

Sometimes, the problem turns out to be computationally intractable, or provably hard in the well-defined meaning given to the term by the field of computational complexity theory. In these situations, I try to establish what makes the problem hard, and how to possibly work around the hardness result either by adopting a classic parameterized complexity approach, or by simplifying the model in order to achieve an acceptable tradeoff between expressivity and tractability.

In the (increasingly common) cases where the problem is hard to analyze, or increasingly as a first approach to test exploratory hypotheses, I tend to adopt a probabilistic perspective based on random sampling within adequately controlled distributions, such as the uniform distribution or the Boltzmann distribution.


Publications

By type
By topic
Text

This list is loaded from HAL, and formatted by bibtex-js.
In case something goes wrong: Access static version


Software

RNA Bioinformatics

VARNA

Drawing and editing the RNA secondary structure. Accepts a wide range of documented and illustrated options, and offers rich user interaction.

Collab.: A. Denise@Paris-Saclay Univ.

VARNA

RNANR

Non-Redundant sampling of RNAs secondary structures. Allows for the generation of key landmarks (locally optimal structures) of kinetics landscapes.

Collab.: H. Touzet@Univ. Lille

RNANR

SPARCS

SPARCS is a program to analyze structured and unstructured regions in coding RNA sequences. Uses a random model preserving both the amino acid sequence and the dinucleotide content, enabling a computation of accurate z-score values.

Collab.: J. Waldispühl@Univ. McGill

SPARCS

RNA Design

IncaRNAtion

Design of RNA sequences folding into a target secondary structures with nucleotide distribution. Implements a global sampling approach, which provides more relevant seed sequences.

Collab.: J. Waldispühl@Univ. McGill

IncaRNAtion

RNARedPrint

Design of RNAs with multiple conformations. Uses an FPT dynamic programming scheme to sample sequences under expressive constraints to solve the positive design of RNA.

Collab.: S. Will@TBI Vienna

RNARedPrint

IncaRNAfbInv

Fragment-based design of RNA sequences, combining a (constrained) global sampling strategy for the generation of seed sequences with a flexible local search optimization.

Collab.: J. Waldispühl@Univ. McGill · D. Barash@Ben Gurion Univ.

IncaRNAfbInv

Sequence analysis

GenRGenS

A toolkit for the random generation of sequences. Supports different classes of models, including weighted context-free grammars, Markov models, ProSITE patterns...

Collab.: A. Denise@Paris-Saclay Univ.

GenRGenS

RNAPyro

Error-correction of RNA sequencing data using secondary structure information.

Collab.: J. Waldispühl@Univ. McGill

RNAPyro

RNA Structural Bioinformatics

DIAL

A web server for the 3D alignment and motifs search of experimentally-resolved RNAs. Captures similarities in sequence content, secondary structure (using RNAView), and 3D rotameric properties (Dihedral angles).

Collab.: P. Clote@Boston College

DIAL

LocalMove

LocalMove solves the discretization (aka best on-lattice fit) for 3D macromolecules. Fits the polymer backbone to a discrete regular set of points (lattice), using a local search (MCMC) based on local moves.

Collab.: P. Clote@Boston College

LocalMove


Personal trajectory

LIX · Ecole Polytechnique · France

CNRS Research Director· Tenured
RNA Bioinformatics · Random Generation · Applied Analytic Combinatorics
October 2020 - Current date

Département d'Informatique · Université Paris-Saclay · France

Habilitation in Computer Science
RNA Bioinformatics · Dynamic Programming · Random Generation · Analytic Combinatorics
May 2020

LIX · Ecole Polytechnique · France

Head of AMIBio Team
RNA Bioinformatics · Random Generation · Applied Analytic Combinatorics
January 2016 - Current date

UMI PIMS/Maths Dept · Simon Fraser University · Canada

CNRS Researcher · Visiting scientist
Comparative Genomics · RNA Bioinformatics · Random Generation
Main local collaborations with Cédric Chauve and Marni Mishna
Septembre 2013 - Septembre 2015

LIX · Ecole Polytechnique · France

CNRS Researcher · Tenured
RNA Design · Random Generation · RNA Bioinformatics
Member of AMIBio team
November 2009 - September 2020

IRIF · Université Paris Diderot · France

ANR Postdoctoral Fellow
Random Generation · Enumerative Combinatorics · Analytic Combinatorics
Supervised by Dominique Rossin and Michèle Soria (LIP6)
November 2008 - October 2009

LIP6 · Sorbonne université · France

Decrypthon Postdoctoral Fellow
Structural Bioinformatics · Protein-Protein interactions
Supervised by Alessandra Carbone (lIP6)
April 2008 - November 2008

Biology Department · Boston College · USA

NSF Postdoctoral Fellow
RNA Bioinformatics · Structural Bioinformatics
Supervised by Peter Clote
October 2006 - April 2008

LRI · Université Paris-Saclay · France

PhD in Computer Science
Bioinformatics · Enumerative Combinatorics · Analysis and Design of Algorithms
Bioinfo team at LRI · Supervised by Alain Denise · Reviewed by Philippe Flajolet and Eric Rivals
October 2003 - October 2006

School of Computer Science · Université Paris-Saclay · France

Master of Science · DEA Algorithmique
Theoretical Computer Science · Enumerative Combinatorics · Analysis and Design of Algorithms
October 2000 - July 2003

Université Paris-Saclay · France

Bachelor of Science
Computer Science · Mathematics · Physics
October 1997 - July 2000

Main service

Some of my activities besides scientific research and teaching: I have also kept busy in the past by being:

Editorial and reviewing activities


Teaching

While my main activities are in scientific research, I regularly teach at the graduate and, less recurrently, undergraduate levels.

Université Paris-Saclay · Palaiseau, France

AMI2B Master Program · Second year (M2)
Combinatorial Optimization · RNA Bioinformatics
Since 2009
Infos/resources for the ongoing academic year:
  • Lecture 1 - Alignments [pdf]
  • Lecture 3 - Graphs and assembly [pdf]
    Lab assigment - Eulerian paths and k-mers assembly [pdf]
  • RNA structure prediction [pdf]
    Enoncé TP - RNA parsing, counting and folding
    1. Alkan C, Karakoç E, Nadeau JH, Sahinalp SC and Zhang K (2006), "RNA–RNA Interaction Prediction and Antisense RNA Target Search", Journal of Computational Biology., mar 2006. Vol. 13(2), pp. 267-282. Mary Ann Liebert Inc. DOI
    2. Chikhi R, Limasset A, Jackman S, Simpson JT and Medvedev P (2015), "On the Representation of de Bruijn Graphs", Journal of Computational Biology., may 2015. Vol. 22(5), pp. 336-352. Mary Ann Liebert Inc. DOI
    3. Dondi R, Lafond M and Scornavacca C (2019), "Reconciling multiple genes trees via segmental duplications and losses", Algorithms for Molecular Biology., mar, 2019. Vol. 14(1) Springer Science and Business Media LLC. DOI
    4. Ferragina P and Manzini G (2005), "Indexing compressed text", Journal of the ACM., Jul 2005. Vol. 52(4), pp. 552-581. Association for Computing Machinery. DOI
    5. Hammer S, Wang W, Will S and Ponty Y (2019), "Fixed-parameter tractable sampling for RNA design with multiple target structures.", BMC bioinformatics., April 2019. Vol. 20, pp. 209. DOI
    6. Hoffmann S, Otto C, Kurtz S, Sharma CM, Khaitovich P, Vogel J, Stadler PF and Hackermifmmodeuelseüfiller J (2009), "Fast mapping of short sequences with mismatches, insertions and deletions using index structures", PLoS Computational Biology., Sep 2009. Vol. 5(9), pp. e1000502. DOI
    7. Limasset A, Cazaux B, Rivals E and Peterlongo P (2016), "Read mapping on de Bruijn graphs", BMC Bioinformatics., jun 2016. Vol. 17(1) Springer Science and Business Media LLC. DOI
    8. Medvedev P, Pham S, Chaisson M, Tesler G and Pevzner P (2011), "Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers", Journal of Computational Biology., Nov 2011. Vol. 18(11), pp. 1625-1634. Journal of Computational Biology. DOI
    9. Miklos I, Meyer I and Nagy B (2005), "Moments of the Boltzmann distribution for RNA secondary structures", Bulletin of Mathematical Biology., sep 2005. Vol. 67(5), pp. 1031-1047. Springer Science and Business Media LLC. DOI
    10. Myers G (2013), "What's Behind Blast", In Models and Algorithms for Genome Evolution. , pp. 3-15. Springer London. DOI
    11. Reidys CM, Huang FWD, Andersen JE, Penner RC, Stadler PF and Nebel ME (2011), "Topology and prediction of RNA pseudoknots", Bioinformatics., feb 2011. Vol. 27(8), pp. 1076-1085. Oxford University Press (OUP). DOI
    12. Strothmann D (2007), "The affix array data structure and its applications to RNA secondary structure analysis", Theoretical Computer Science., dec 2007. Vol. 389(1-2), pp. 278-294. Elsevier BV. DOI
    13. Xu J and Berger B (2006), "Fast and Accurate Algorithms for Protein Side-Chain Packing", J. ACM. New York, NY, USA, jul 2006. Vol. 53(4), pp. 533–557. Association for Computing Machinery. DOI

Sorbonne Universités · France

BIM Master Program · Second year (M2)
RNA Bioinformatics · Structural Bioinformatics
Since 2009
Infos/resources for the ongoing academic year:

Ecole Polytechnique · France

Engineering Program · BSc/MSc
Algorithms and Programming
2009-2015

Fun* stuffs and ramblings

*Well, fun for me, but probably an acquired taste (at best) for you

Neil Gaiman's best tips for survival in an artistic academic career

Even though there are some difference between art and the academia, it is quite striking how some of Neil Gaiman's advise to young artists mirror what I would tell my students (with at least one notable exception... :) ).

The secret to keeping fruitful collaborations [skip to quote]

You get work however you get work, but people keep working [..] because their work is good, and because they're easy to get along with, and because they deliver the work on time. And you don't even need all three... two out of three is fine!

  • People will tolerate how unpleasant you are if the work is good and you deliver it on time.
  • People will forgive the lateness of your work if it's good and they like you.
  • And you don't have to be as good as everyone else if you're on time and it's always a pleasure to hear from you.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

Of course, we should aim for the three out of three, but the observation of two being sufficient is spot on in my experience (and represents a parsimonious and flattering explanation to my continued academic employement despite my chronic tardiness :) ).

On impostor syndrom [skip to quote]

The first problem of [..] success is the unshakable conviction that you are getting away with something and that, any moment now, they will discover you.
[..] I was convinced there would be a knock on the door, and a man with a clipboard [..] would be there to tell me it was all over and they'd caught up with me and now I would have to go and get a real job, one that didn't consist of making things up and writing them down, and reading books I wanted to read...
And then I would go away quietly and get the kind of job where I would have to get up early in the morning and wear a tie and not make things up anymore.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

I guess most academics can relate to that (generally undeservedly), especially under the grind of our recurrent evaluations (institutions, employers, funding bodies, peers...). Although some of us already wear a tie to start with...

On emails [skip to quote]

There was a day when I looked up and realized that I had become someone who profesionnally replied to emails and who wrote as a hobby.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

An increasing number of my colleagues adopt quite extreme measures when it comes to email management (eg checking their emails only twice a day, or only within certain time slices...). I think it is one of the great challenge faced by humanity in the information era, and something we'll have to teach our children (academic), to find ways to limit the flow of information requiring active sprocessing (haven't found an acceptable solution myself... still looking).

On cheating on your CV (not an endorsement! :) ) [skip to quote]

People get hired because, somehow, they get hired... In my case I did something which these days would be easy to check and would get me into a lot of trouble and when I started out in those pre-internet days seemed like a sensible career strategy. When I was asked by editors who I'd written for, I lied. I listed a handful of magazines that sounded likely and I sounded confident and I got jobs. I then made it a point of honor to have written something for each of the magazines I've listed to get that first job, so that I hadn't actually lied, I'd just been chronologically challenged.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

This one I do not condone, of course, but working hard to deserve in retrospect whatever undeserved advantage you get at some point is a great attitude to adopt (cf impostor syndrom).

On luck [skip to quote]

Luck is useful. Often you'll discover that the harder you work, and the more wisely that you work, the luckier you will get but there is luck, and it helps.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

Parting words [skip to quote]

Be wise, because the world needs more wisdom, and if you cannot be wise pretend to be someone who is wise and then just behave like they would.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

If the shoe fits... (aka Python's duck typing :) ).


(Canadian) pennies for your thoughts

Background: In 2012, Canada decided to phase out the penny for its coinage system. Product prices may still use arbitrary cents (impractical otherwise since prices in Canada do not typically include taxes) but cash transactions are now rounded up/down to the closest multiple of 5 cents, as shown in the image below:

For instance, if two products A and B are worth 67¢ and 1\$21¢ respectively, one will have to pay 1\$80¢ to buy them together, instead of their accumulated price 1\$78¢. In this case, the seller makes an extra 2¢ in the transaction, thanks to the imperfect round-off. Mathematically speaking, it all depends on the remainder $r$ of the total price in a division by 5¢: If $r$ equals $1$ or $2$, than the seller looses $r$ cents, but if $r$ equals $3$ or $4$, then the seller makes an extra $5-r$ cents in the transaction.

With my wife, we were wondering: Can a supermarket manipulate the prices of its products in such a way that the rounded totals translate, on average, into an extra benefit for the company? We were not overly paranoid about it, but it seemed like a good opportunity to exercise our brain while strolling/bouldering (and, very occasionally, surfing) on the amazing beaches of Oahu's famed North shore...

As a first approximation, we can model the behavior of a shopper (oblivious to the imperfect change issue) as a Markov chain. In other words, given a list of products $A=\{a_i\}_{i=1}^k$, each associated with a price $p_i$, one chooses the $n$-th item $X_n$ with probability $${\mathbb P}(X_n=a\mid X_{n-1}\cdots X_{n-k})$$ which depends on his/her previous $k$ choices. On can then extend this Markov chain into remembering the remainder (modulo 5¢) $R_n$ of the total price after $n$ transactions. In this extended model, the probability of choosing a $n$-th item $X_n$ and ending with a remainder $R_n$, only depends on the previously chosen $k$-products, and the previous remainder $R_{n-1}$. With this in mind, the expected gain $G_n$ of the supermarket can now be written as $$ \mathbb{E}(G_n\mid A) = \sum_{a\in A}\left|\begin{array}{ll}+2\,{\mathbb P}((R_n,X_n)=(3,a)\mid R_{0}=0)\\+{\mathbb P}((R_n,X_n)=(4,a)\mid R_{0}=0)\\-{\mathbb P}((R_n,X_n)=(1,a)\mid R_{0}=0)\\-2\,{\mathbb P}((R_n,X_n)=(2,a)\mid R_{0}=0).\end{array}\right.$$ For small values of $n$, this expectation depends a lot on the list $A$ and its associated prices $p_i$. For instance if the customer buys a single item ($n=1$), the seller could set the remainder of all its prices to $3$ and make an easy additional 2¢ on each transaction.

However, for large values of $n$ (+assuming the ergodicity of the Markov chain), the stationary distribution can be assumed, i.e. the expected gain $\mathbb{E}(G_\infty\mid A)$ no longer depends on the initial state or, equivalently, on $n$. It is then possible to show that, no matter what the consummer preferences (i.e. the probabilities in the Markov chain), or the product list/prices $A$, one has $$\mathbb{E}(G_\infty\mid A)=0.$$ This can be proven thanks to a very elegant symmetry argument by James Martin on MathOverflow. In other words, there is simply no way for the supermarket to rig the game even if the consumer does not actively avoid being short-changed (but assuming that the consumer buys enough products, so the supermarket may and will still win in the end!).


What Feynman saw in a flower...

Richard Feynman - Ode To A Flower from Fraser Davidson on Vimeo.

Because understanding the process does not necessarily make the result any less appealing! (otherwise, would there be any hope left for lasting companionship? ;) )
Beautiful work by Fraser Davidson.


If you haven't seen one of these...


... then you obviously never TA'ed Bioinformatics ;) (Lucky bastard!). Full credits/kudos to XKCD.

On another note (and yes, this will be my last xkcd strip, otherwise my page would simply end up being a mirror):

Full credits again to XKCD.


Best Journal ever!!!

Ever complained about the academic daily routine having become a quest for the best journal that would nevertheless accept one's half-cooked manuscript? Ever felt that citations and impact factors were way too overrated, and that selectivity should prevail? Look no further, as it is my pleasure to share my recent discovery of (arguably) the best journal ever!

With an acceptance rate way lower than 1%, the journal of universal rejection may rightfully pride itself of enforcing the strictest of academic standards (Nature beware!). Created in 2009, the journal solicits submissions in poetry, prose, visual art, and research. Despite such a wide scope, its widely competent editorial board can be trusted to lay an expert eye on your submission, and helpfully motivate its final decision.

Submit Now!


P vs NP: The irrefutable proof ;)

Today is the day I almost died laughing while reading Scott Aaronson's Shtetl-Optimized blog dedicated to Complexity Theory. Ok, I know, this sort of humor alone may be regarded as a conclusive diagnostic by future generations of psychopathologists, but I'd still like to share his beautiful, human-centric, argument for why P != NP. Basically, he is asked the question:

Can you summarize to a (curious/smart) dummy the P vs NP question and its many implications?

He starts by stating the question in four informal ways, one of which is:

Is it harder to solve a math problem yourself than to check a solution by someone else?

and adds

This is where you insert a comment about the delicious irony, that P vs. NP itself is a perfect example of a monstrously-hard problem for which we could nevertheless recognize a solution if we saw one---and hence, part of the explanation for why it's so hard to prove P!=NP is that P!=NP...

I woke up the following morning at the hospital, and must be clinically monitored during any future access to Scott's blog, but I encourage you to check it out for a daily dose of CS wittiness.


Amicable numbers

Amicable numbers are members of the number theory zoology (See their Wikipedia page), which, like trousers and all sorts of useful stuff, come handy in pairs. Formally, one starts by defining the restricted divisor function s(n) to be the sum of all divisors of n, itself excepted.
For instance for n = 220, one finds

s(220) = 1 + 2 + 4 + 5 + 10 + 11 + 20 + 22 + 44 + 55 + 110 = 284.

Then, amicable numbers are pairs of natural numbers (p,q) such that s(p) = q and s(q) = p.
To make this notion explicit, we get back to the example above and find

s(284) = 1 + 2 + 4 + 71 + 142 = 220

and therefore (220,284) are amicable numbers.

Now there is something about this definition that may sound arbitrarily limited to a computer scientist. Indeed, consider the (infinite) directed graph whose vertices are natural numbers, and whose edges are pairs (n,s(n)). Here is a picture, rendered using the allmighty GraphViz (neato mode), showing the resulting network/graph for numbers below 50 (blue/red arcs indicate decreasing/increasing values of s, number 1 omitted for readability, golden nodes are perfect numbers):

And here is the larger graph of numbers below 300 (Full-screen PDF version):

One then easily sees that amicable numbers are in bijection with cycles of length 2 in the graph. Indeed, zooming in on the above graph, one may easily spot the first amicable pair:

This raises the question

Why not consider more general iterations of the s(.) function?

Indeed, one can alternatively, yet equivalently, define amicable numbers as pairs

(n,s(n)) such that s(s(n)) = n and s(n) ≠ n

which naturally generalizes into what is called sociable numbers, i.e. k-tuples of numbers

(n, s(n), s(s(n)), … , sk-1(n)) such that sk(n) = n and sj(n) ≠ n, ∀j∈[1,k-1].

These sequences are also called periodic aliquot sequences. Here is one of the biggest periodic sequence known to date (k=28):

These numbers are currently of some interest in number theory:
  For k=1, sociable numbers are perfect numbers.
  For k=2, sociable numbers are amicable numbers.
  In general, what are aliquot sequence of cycle length k? Do they even exist for large values of k?
Whether there exists a cycle-free infinite aliquot sequence is currently open!