Yann Ponty

CNRS Research Scientist · Head of the AMIBio team
Permanent member of LIX, the Computer Science Laboratory of École Polytechnique
Office 2005 · LIX/Turing building · 1 rue Estienne d'Orves · 91120 Palaiseau · France
+33 1 77 57 80 · No Javascript = no email
I am a CNRS researcher, based in France at École Polytechnique. I design bioinformatics methods with a strong algorithmic component, mainly to better understand and exploit the potential of RiboNucleic Acids (RNA).
Some of my interests:
  • RNA folding prediction, design, and evolution
  • RNA/RNA, RNA/Protein, and Protein/Protein interactions
  • Random generation and enumerative combinatorics
  • Discrete algorithmics (dynamic programming!)
  • RNA visualization

Research

Scientifically, my main interests lie at the interface between Computer Science, Mathematics, and Molecular Biology. The purpose of my work is primarily the design of analytical approaches, efficient algorithms, and end-user tools for the biology of RiboNucleic Acids (RNA).
Some of the questions raised are the following:

  • How can RNA structure be predicted in the presence of pseudoknots?
  • How prevalent are the kinetic phenomena at work during RNA folding?
  • What is the relationship between RNA structure and evolution?
    What does the evolution of RNA sequences tell us about their structure?
  • How can knowledge of an RNA's structure help analyze experimental data?
    Conversely, how can coarse-grained experimental data be leveraged to predict the secondary structure of an RNA?
  • How can one design an RNA sequence performing a desired function in vivo?

Some of these questions concern universal properties of biopolymers, and require neither a specific RNA sequence nor a sophisticated energy model. Provided they can be restated at such an abstract level without sacrificing their essence, I use tools from enumerative and analytic combinatorics to obtain quantitative, (asymptotically) exact answers.

More complex questions sometimes demand a sequence-specific analysis, while still admitting algorithms, often based on dynamic programming, with polynomial time/memory complexity. The concepts and principles underlying these algorithms are sometimes general enough to be transposed to other domains. For instance, I export to comparative genomics ensemble methods originally introduced in the context of thermodynamics, in particular to test the robustness of predictions obtained under a parsimony principle.

It sometimes happens, however, that the problems at hand are provably hard in the sense of complexity theory. In such a case, I try to pinpoint the origin of the problem's hardness in order to devise workaround strategies. For instance, one may consider designing a parameterized-complexity algorithm, or simplifying the model so as to strike an acceptable compromise between expressiveness and tractability.

Finally, in extreme cases where the problem is hard to tackle, or, increasingly, as a preliminary approach to test hypotheses and get acquainted with a problem, I adopt a probabilistic approach, based on constrained random generation in suitable distributions, such as the uniform distribution or the Boltzmann distribution.


Publications

My publication list includes articles published in Bioinformatics, Theoretical Computer Science, and Molecular Biology.

A list generated from HAL can be consulted on this page.


Software

RNA Bioinformatics

VARNA

Drawing and annotation of RNA secondary structures. Supports many documented options, and allows exporting drawings in numerous formats.

Collab.: A. Denise@Paris-Saclay Univ.


RNANR

Non-redundant random generation of RNA secondary structures. Enables both more reliable statistics and the generation of locally optimal structures, essential to kinetic analysis.

Collab.: H. Touzet@Univ. Lille


SPARCS

SPARCS analyzes structured (or unstructured) regions within RNA coding sequences. It assesses the significance of structure within a null model that preserves the protein sequence and the dinucleotide content.

Collab.: J. Waldispühl@Univ. McGill


RNA Design

IncaRNAtion

Design of RNA sequences folding into a single secondary structure, under nucleotide-composition constraints, using an original random generation technique.

Collab.: J. Waldispühl@Univ. McGill


RNARedPrint

Design of RNAs adopting multiple conformations (positive design). Uses parameterized dynamic programming algorithms to generate richly constrained sequences (energy, composition...).

Collab.: S. Will@TBI Vienna


IncaRNAfbInv

RNA sequence design based on a decomposition into fragments of variable sizes. Combines constrained random generation with a flexible local-search optimization.

Collab.: J. Waldispühl@Univ. McGill · D. Barash@Ben Gurion Univ.


Sequence Analysis

GenRGenS

Library for the random generation of structured genomic sequences. Supports several model classes: weighted grammars, Markov models (including HMMs), PROSITE patterns, regular expressions...

Collab.: A. Denise@Paris-Saclay Univ.


RNAPyro

Error correction in RNA sequencing data, using RNA secondary structure.

Collab.: J. Waldispühl@Univ. McGill


Structural Bioinformatics

DIAL

Comparison and motif search within RNA 3D models. Expressive similarity criterion, accounting for sequence and secondary structure, but also for local geometry (dihedral angles).

Collab.: P. Clote@Boston College


LocalMove

LocalMove discretizes the conformation of 3D macromolecules. It assigns grid points (hexagonal, cubic...) to the positions of the linear backbone, based on an MCMC local search using local moves.

Collab.: P. Clote@Boston College



Career

LIX · École Polytechnique, France

Head of the AMIBio team · CNRS Researcher
RNA Bioinformatics · Random Generation · Applied Analytic Combinatorics
January 2016 - Present

Maths Dept/UMI PIMS@Simon Fraser University, Canada

CNRS Researcher
Comparative Genomics · RNA Bioinformatics · Random Generation
Main local collaborations with Cédric Chauve and Marni Mishna
September 2013 - September 2015

LIX · École Polytechnique, France

CNRS Researcher
RNA Design · Random Generation · RNA Bioinformatics
Member of the AMIBio team
November 2009 - September 2013

IRIF · Université Paris Diderot, France

ANR Postdoc
Random Generation · Enumerative Combinatorics · Analytic Combinatorics
Co-supervised by Dominique Rossin and Michèle Soria (LIP6)
November 2008 - October 2009

Hôpital de la Pitié-Salpêtrière · Sorbonne Université, France

Decrypthon Postdoc
Structural Bioinformatics · Protein-Protein Interactions
Supervised by Alessandra Carbone (LIP6)
April 2008 - November 2008

Biology Department · Boston College, USA

NSF Postdoc
RNA Bioinformatics · Structural Bioinformatics
Supervised by Peter Clote
October 2006 - April 2008

LRI · Université Paris-Saclay, France

PhD in Computer Science
Bioinformatics · Enumerative Combinatorics · Analysis of Algorithms
Bioinfo team at LRI · Supervised by Alain Denise · Thesis reviewers: Philippe Flajolet and Eric Rivals · Teaching assistantship (monitorat) at IUT d'Orsay
October 2003 - October 2006

School of Computer Science · Université Paris-Saclay, France

Master's degree (Maîtrise d'Informatique · DEA Algorithmique)
Theoretical Computer Science · Enumerative Combinatorics · Analysis and Design of Algorithms
October 2000 - July 2003

Université Paris-Saclay, France

Bachelor's degree (DEUG Maths/Info · Licence d'Informatique)
Computer Science · Mathematics · Physics
October 1997 - July 2000

Administration, community animation, and service

Some of my non-scientific activities:
  • Head of the AMIBio team@LIX · Since 2016
  • Coordinator (with F. Cazals) of the MASIM working group of the Groupe de Recherche en Bioinformatique Moléculaire (GdR BIM) · Since 2014
  • Elected member (2016-2020) of the LIX laboratory council
  • Member of the scientific advisory board of the RNAcentral and Rfam databases, maintained at EMBL-EBI · Since 2019
  • Member of the Gilles Kahn/SIF committee for the best French PhD thesis in Computer Science · Since 2018
  • Member of the scientific council of the DIM RFSI · Since 2019
In the past, I have also devoted my spare moments to various bodies:

Editorial activities and reviewing


Teaching

Although I devote most of my activity to scientific research, I regularly teach in Master's-level courses and, more rarely, at the Bachelor's level.

Université Paris-Saclay · Palaiseau, France

AMI2B Master's program · M2
Combinatorial Optimization · RNA Bioinformatics
Since 2009

Sorbonne Université · France

BIM Master's program · M2
RNA Bioinformatics · Structural Bioinformatics
Since 2009

École Polytechnique · France

Engineering cycle (Cycle Ingénieur) · L2/M1
Algorithmics and Programming
Since 2009

Fun stuffs* and ramblings

*Disclaimer: Acquired taste

Neil Gaiman's best tips for survival in an artistic academic career

Even though there are some differences between art and academia, it is quite striking how some of Neil Gaiman's advice to young artists mirrors what I would tell my students (with at least one notable exception... :) ).

The secret to keeping fruitful collaborations [skip to quote]

You get work however you get work, but people keep working [..] because their work is good, and because they're easy to get along with, and because they deliver the work on time. And you don't even need all three... two out of three is fine!

  • People will tolerate how unpleasant you are if the work is good and you deliver it on time.
  • People will forgive the lateness of your work if it's good and they like you.
  • And you don't have to be as good as everyone else if you're on time and it's always a pleasure to hear from you.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

Of course, we should aim for three out of three, but the observation that two suffice is spot on in my experience (and provides a parsimonious and flattering explanation for my continued academic employment despite my chronic tardiness :) ).

On impostor syndrome [skip to quote]

The first problem of [..] success is the unshakable conviction that you are getting away with something and that, any moment now, they will discover you.
[..] I was convinced there would be a knock on the door, and a man with a clipboard [..] would be there to tell me it was all over and they'd caught up with me and now I would have to go and get a real job, one that didn't consist of making things up and writing them down, and reading books I wanted to read...
And then I would go away quietly and get the kind of job where I would have to get up early in the morning and wear a tie and not make things up anymore.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

I guess most academics can relate to that (generally undeservedly), especially under the grind of our recurrent evaluations (institutions, employers, funding bodies, peers...). Although some of us already wear a tie to start with...

On emails [skip to quote]

There was a day when I looked up and realized that I had become someone who professionally replied to emails and who wrote as a hobby.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

An increasing number of my colleagues adopt quite extreme measures when it comes to email management (e.g. checking their emails only twice a day, or only within certain time slots...). I think it is one of the great challenges faced by humanity in the information era, and something we will have to teach our (academic) children: finding ways to limit the flow of information requiring active processing (I haven't found an acceptable solution myself... still looking).

On cheating on your CV (not an endorsement! :) ) [skip to quote]

People get hired because, somehow, they get hired... In my case I did something which these days would be easy to check and would get me into a lot of trouble and when I started out in those pre-internet days seemed like a sensible career strategy. When I was asked by editors who I'd written for, I lied. I listed a handful of magazines that sounded likely and I sounded confident and I got jobs. I then made it a point of honor to have written something for each of the magazines I've listed to get that first job, so that I hadn't actually lied, I'd just been chronologically challenged.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

This one I do not condone, of course, but working hard to retroactively deserve whatever undeserved advantage you got at some point is a great attitude to adopt (cf. impostor syndrome).

On luck [skip to quote]

Luck is useful. Often you'll discover that the harder you work, and the more wisely that you work, the luckier you will get but there is luck, and it helps.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

Parting words [skip to quote]

Be wise, because the world needs more wisdom, and if you cannot be wise pretend to be someone who is wise and then just behave like they would.

Neil Gaiman, commencement speech at the University of the Arts Class of 2012

If the shoe fits... (aka Python's duck typing :) ).


(Canadian) pennies for your thoughts

Background: In 2012, Canada decided to phase out the penny from its coinage system. Product prices may still use arbitrary cents (impractical otherwise, since prices in Canada typically do not include taxes), but cash transactions are now rounded up/down to the closest multiple of 5 cents, as shown in the image below:

For instance, if two products A and B are worth 67¢ and \$1.21 respectively, one will have to pay \$1.90 to buy them together, instead of their accumulated price of \$1.88. In this case, the seller makes an extra 2¢ in the transaction, thanks to the imperfect round-off. Mathematically speaking, it all depends on the remainder $r$ of the total price in a division by 5¢: if $r$ equals $1$ or $2$, then the seller loses $r$ cents, but if $r$ equals $3$ or $4$, then the seller makes an extra $5-r$ cents in the transaction.
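The round-off rule can be sketched in a few lines of Python (a small illustration of the arithmetic above, not part of the original discussion):

```python
# Seller's gain (in cents) when a cash total is rounded to the
# nearest multiple of 5 cents.
def rounding_gain(total_cents):
    r = total_cents % 5
    if r in (1, 2):
        return -r       # rounded down: the seller loses r cents
    if r in (3, 4):
        return 5 - r    # rounded up: the seller gains 5 - r cents
    return 0            # already a multiple of 5: exact change

# The example above: 67¢ + 121¢ = 188¢, which rounds up to 190¢
print(rounding_gain(67 + 121))  # -> 2
```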

With my wife, we were wondering: Can a supermarket manipulate the prices of its products in such a way that the rounded totals translate, on average, into an extra benefit for the company? We were not overly paranoid about it, but it seemed like a good opportunity to exercise our brain while strolling/bouldering (and, very occasionally, surfing) on the amazing beaches of Oahu's famed North shore...

As a first approximation, we can model the behavior of a shopper (oblivious to the imperfect change issue) as a Markov chain. In other words, given a list of products $A=\{a_i\}_{i=1}^k$, each associated with a price $p_i$, one chooses the $n$-th item $X_n$ with probability $${\mathbb P}(X_n=a\mid X_{n-1}\cdots X_{n-k})$$ which depends on the previous $k$ choices. One can then extend this Markov chain to also remember the remainder (modulo 5¢) $R_n$ of the total price after $n$ transactions. In this extended model, the probability of choosing an $n$-th item $X_n$ and ending with a remainder $R_n$ only depends on the $k$ previously chosen products and the previous remainder $R_{n-1}$. With this in mind, the expected gain $G_n$ of the supermarket can now be written as $$ \mathbb{E}(G_n\mid A) = \sum_{a\in A}\left|\begin{array}{ll}+2\,{\mathbb P}((R_n,X_n)=(3,a)\mid R_{0}=0)\\+{\mathbb P}((R_n,X_n)=(4,a)\mid R_{0}=0)\\-{\mathbb P}((R_n,X_n)=(1,a)\mid R_{0}=0)\\-2\,{\mathbb P}((R_n,X_n)=(2,a)\mid R_{0}=0).\end{array}\right.$$ For small values of $n$, this expectation depends heavily on the list $A$ and its associated prices $p_i$. For instance, if the customer buys a single item ($n=1$), the seller could set the remainder of all its prices to $3$ and make an easy additional 2¢ on each transaction.

However, for large values of $n$ (and assuming the ergodicity of the Markov chain), the stationary distribution can be assumed, i.e. the expected gain $\mathbb{E}(G_\infty\mid A)$ no longer depends on the initial state or, equivalently, on $n$. It is then possible to show that, no matter the consumer's preferences (i.e. the probabilities in the Markov chain) or the product list/prices $A$, one has $$\mathbb{E}(G_\infty\mid A)=0.$$ This can be proven thanks to a very elegant symmetry argument by James Martin on MathOverflow. In other words, there is simply no way for the supermarket to rig the game, even if the consumer does not actively avoid being short-changed (assuming, however, that the consumer buys enough products; on short shopping lists the supermarket may, and will, still win in the end!).
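For intuition, the vanishing expectation is easy to probe numerically. The sketch below assumes the simplest, memoryless shopper ($k=0$, i.i.d. picks) and a hypothetical random price list; it is a quick Monte Carlo check, not the proof:

```python
import random

# Seller's gain (in cents) when a total is rounded to a multiple of 5.
def rounding_gain(total_cents):
    r = total_cents % 5
    return (5 - r) if r >= 3 else (-r if r >= 1 else 0)

random.seed(42)
prices = [random.randrange(1, 500) for _ in range(20)]  # arbitrary catalog

# Average rounding gain over many simulated checkouts of n_items picks.
def expected_gain(n_items, trials=20000):
    gain = sum(rounding_gain(sum(random.choice(prices) for _ in range(n_items)))
               for _ in range(trials))
    return gain / trials

print(expected_gain(1))   # a single item: possibly biased
print(expected_gain(50))  # a long shopping list: close to 0
```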


What Feynman saw in a flower...

Richard Feynman - Ode To A Flower from Fraser Davidson on Vimeo.

Because understanding the process does not necessarily make the result any less appealing! (otherwise, would there be any hope left for lasting companionship? ;) )
Beautiful work by Fraser Davidson.


If you haven't seen one of these...


... then you obviously never TA'ed Bioinformatics ;) (Lucky bastard!). Full credits/kudos to XKCD.

On another note (and yes, this will be my last xkcd strip, otherwise my page would simply end up being a mirror):

Full credits again to XKCD.


Best Journal ever!!!

Ever complained about the academic daily routine having become a quest for the best journal that would nevertheless accept one's half-cooked manuscript? Ever felt that citations and impact factors were way too overrated, and that selectivity should prevail? Look no further, as it is my pleasure to share my recent discovery of (arguably) the best journal ever!

With an acceptance rate way lower than 1%, the Journal of Universal Rejection may rightfully pride itself on enforcing the strictest of academic standards (Nature beware!). Created in 2009, the journal solicits submissions in poetry, prose, visual art, and research. Despite such a wide scope, its widely competent editorial board can be trusted to lay an expert eye on your submission, and to helpfully motivate its final decision.

Submit Now!


P vs NP: The irrefutable proof ;)

Today is the day I almost died laughing while reading Scott Aaronson's Shtetl-Optimized blog, dedicated to Complexity Theory. Ok, I know, this sort of humor alone may be regarded as a conclusive diagnostic by future generations of psychopathologists, but I'd still like to share his beautiful, human-centric argument for why P != NP. Basically, he is asked the question:

Can you summarize to a (curious/smart) dummy the P vs NP question and its many implications?

He starts by stating the question in four informal ways, one of which is:

Is it harder to solve a math problem yourself than to check a solution by someone else?

and adds

This is where you insert a comment about the delicious irony, that P vs. NP itself is a perfect example of a monstrously-hard problem for which we could nevertheless recognize a solution if we saw one---and hence, part of the explanation for why it's so hard to prove P!=NP is that P!=NP...

I woke up the following morning at the hospital, and must be clinically monitored during any future access to Scott's blog, but I encourage you to check it out for a daily dose of CS wittiness.


Amicable numbers

Amicable numbers are members of the number theory zoology (see their Wikipedia page) which, like trousers and all sorts of useful stuff, come in handy in pairs. Formally, one starts by defining the restricted divisor function s(n) as the sum of all divisors of n, itself excepted.
For instance for n = 220, one finds

s(220) = 1 + 2 + 4 + 5 + 10 + 11 + 20 + 22 + 44 + 55 + 110 = 284.

Then, amicable numbers are pairs of natural numbers (p,q) such that s(p) = q and s(q) = p.
To make this notion explicit, we get back to the example above and find

s(284) = 1 + 2 + 4 + 71 + 142 = 220

and therefore (220,284) are amicable numbers.
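These definitions translate directly into a couple of lines of Python, which one may use to double-check the pair (a naive trial-division sketch, fine for small numbers only):

```python
# Restricted divisor function: sum of the divisors of n, n itself excepted.
def s(n):
    return sum(d for d in range(1, n) if n % d == 0)

# Amicable pair: s(p) = q and s(q) = p, with p != q (excluding perfect numbers).
def is_amicable(p, q):
    return p != q and s(p) == q and s(q) == p

print(s(220), s(284))         # -> 284 220
print(is_amicable(220, 284))  # -> True
```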

Now there is something about this definition that may sound arbitrarily limited to a computer scientist. Indeed, consider the (infinite) directed graph whose vertices are natural numbers, and whose edges are the pairs (n,s(n)). Here is a picture, rendered using the almighty GraphViz (neato mode), showing the resulting network/graph for numbers below 50 (blue/red arcs indicate decreasing/increasing values of s, number 1 omitted for readability, golden nodes are perfect numbers):

And here is the larger graph of numbers below 300 (Full-screen PDF version):

One then easily sees that amicable numbers are in bijection with cycles of length 2 in the graph. Indeed, zooming in on the above graph, one may easily spot the first amicable pair:

This raises the question

Why not consider more general iterations of the s(.) function?

Indeed, one can alternatively, yet equivalently, define amicable numbers as pairs

(n,s(n)) such that s(s(n)) = n and s(n) ≠ n

which naturally generalizes into what is called sociable numbers, i.e. k-tuples of numbers

$(n, s(n), s(s(n)), \ldots, s^{k-1}(n))$ such that $s^k(n) = n$ and $s^j(n) \neq n$, $\forall j\in[1,k-1]$.

These sequences are also called periodic aliquot sequences. Here is one of the biggest periodic sequences known to date (k=28):

These numbers are currently of some interest in number theory:
  • For k=1, sociable numbers are perfect numbers.
  • For k=2, sociable numbers are amicable numbers.
  • In general, what are aliquot sequences of cycle length k? Do they even exist for large values of k?
Whether there exists a cycle-free infinite aliquot sequence is currently open!
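The iteration of s(.) described above is equally easy to script. The naive sketch below (hypothetical helper names, and far too slow for record-sized cycles like the k=28 one) detects the aliquot cycle through a given number, if any:

```python
# Sum of proper divisors (naive trial division).
def s(n):
    return sum(d for d in range(1, n) if n % d == 0)

# Follow n -> s(n) -> s(s(n)) -> ...; return the cycle if it comes back to n.
def aliquot_cycle(n, max_steps=30):
    seq, m = [n], s(n)
    while m != n and m > 1 and len(seq) < max_steps:
        seq.append(m)
        m = s(m)
    return seq if m == n else None

print(aliquot_cycle(6))      # perfect number, k=1: [6]
print(aliquot_cycle(220))    # amicable pair, k=2: [220, 284]
print(aliquot_cycle(12496))  # sociable numbers, k=5
```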