Stage proposals 2016-17

We have the following stage(s) available:

Bayesian update as a fixed-point transformation

Supervisors

Kostas Chatzikokolakis (CNRS, Ecole Polytechnique) and Catuscia Palamidessi (INRIA, Ecole Polytechnique)

Description

The most distinguishing feature of Bayesian probability theory is that it provides a mathematical framework for performing probabilistic inference. It is most often used to judge the relative validity of hypotheses in the face of noisy, sparse, or uncertain data, or to adjust the parameters of a specific model. To evaluate the probability of the hypothesis, or of the parameter, the Bayesian method needs to assume a prior probability distribution, which is then updated to a posterior probability in the light of new, relevant data. The update of the probability is calculated using Bayes' theorem.

The foundations of Bayesian probability theory were laid down by Thomas Bayes (1702–1761), who formulated a special case of what is now called Bayes' theorem, and by Pierre-Simon Laplace (1749–1827), who introduced the general version. Although it had been almost abandoned in the last century in favor of a collection of techniques known as “frequentist statistics”, in the 1980s there was a sudden comeback of interest in the Bayesian method (the so-called “Bayesian revolution”), mostly attributed to the discovery of Markov chain Monte Carlo methods, which allow the update to be performed efficiently. Since then, there has been a dramatic growth in research on and applications of Bayesian probability, and it is now commonly employed in many scientific disciplines, from astrophysics to neuroscience. In particular, in Computer Science it is used in Machine Learning, in Probabilistic Programming, in Quantitative Information Flow, and in Privacy, just to mention a few areas.

Bayes' theorem is formulated as follows:

    P(H | E) = P(E | H) · P(H) / P(E)

where:

  • P(H), the prior probability, is the estimate of the probability of the hypothesis H before the data E, the current evidence, is observed.
  • P(H | E), the posterior probability, is the probability of H given E, i.e., after E is observed.
  • P(E | H) is the probability of observing E given H. As a function of H with E fixed, this is the likelihood – it indicates the compatibility of the evidence with the given hypothesis.
  • P(E) is the marginal likelihood, i.e., the cumulative probability of E over all possible hypotheses.
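As a concrete illustration, the following is a minimal Python sketch of a single Bayesian update on a hypothetical two-hypothesis example (a coin that is either fair or biased); the example and all names in the code are ours, not part of the stage:

    # One Bayesian update over a finite set of hypotheses, computed directly
    # from the formula above. Hypothetical toy example: is a coin fair or biased?

    def bayes_update(prior, likelihood):
        """prior: dict H -> P(H); likelihood: dict H -> P(E | H).
        Returns the posterior, i.e. the dict H -> P(H | E)."""
        # Marginal likelihood P(E) = sum over H of P(E | H) * P(H)
        marginal = sum(likelihood[h] * prior[h] for h in prior)
        return {h: likelihood[h] * prior[h] / marginal for h in prior}

    prior = {"fair": 0.5, "biased": 0.5}
    likelihood_heads = {"fair": 0.5, "biased": 0.9}   # P(heads | H)
    posterior = bayes_update(prior, likelihood_heads)
    print(posterior)   # {'fair': 0.357..., 'biased': 0.642...}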
The main criticism of Bayesian inference concerns the choice of the prior distribution. In some cases there may be some prior knowledge coming from concrete facts, but in many cases prior knowledge is either vague or non-existent. Furthermore, it is usually subjective: different people, having different opinions, may suggest different priors. Fortunately, the Bernstein-von Mises theorem ensures that, under certain conditions, and if there are sufficient data, the likelihood will dominate over the prior. More precisely, the theorem states that, under certain conditions on the prior and on the domain of the distributions, if we apply the Bayesian update a large number of times with independent trials, the posterior will converge to a Gaussian distribution, progressively more concentrated around the value of the true hypothesis, and eventually to the point distribution on the true hypothesis. Furthermore, and most importantly, this convergence is independent of whatever prior we start from.

Goals of the stage

The asymptotic behavior of the Bayesian update has been investigated by Joseph L. Doob [1948], and by David A. Freedman [1963, 1965], who provided general conditions for convergence in the case of Bayes estimators. However, much remains to be done, in particular the study of convergence conditions specific to various applications, and the study of the convergence speed: how to measure it (according to various criteria dictated by the application), which parameters influence it, etc. In particular, we aim at studying three main application domains:
  • Differential Privacy
  • Bayesian Machine Learning
  • Quantitative Information Flow (aka Language-Based Security)
(Each stage will probably focus on only one of these domains)

Methodology

We will formalize the Bayesian update as a probabilistic transformation on probability distributions, and we will study the convergence conditions and the convergence speed of this transformation towards a fixed point, which will represent the value of the true hypothesis. We will use the know-how from the theory of (deterministic) programming languages, where such operators are used to provide semantics to recursive programs. A typical condition for the existence of a fixed point is the contraction mapping principle: let (X, d) be a metric space. Then a map T : X → X is called a contraction mapping on X if there exists q ∈ [0, 1) such that

    d(T(x), T(y)) ≤ q · d(x, y)   for all x, y in X.

The Banach fixed-point theorem ensures that, if X is complete, then such a T admits a unique fixed point x* in X (i.e., T(x*) = x*). Furthermore, x* can be found as follows: start with an arbitrary element x0 in X and define the sequence {xn} by xn = T(xn−1); then xn → x*.
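For illustration, here is a minimal Python sketch of the fixed-point iteration described by the theorem, on a simple contraction (T(x) = cos(x) on [0, 1], whose derivative is bounded in absolute value by sin(1) < 1); the example is ours and is only meant to show the mechanism:

    import math

    def fixed_point(T, x0, tol=1e-12, max_iter=10_000):
        """Iterate x_n = T(x_{n-1}) until successive values are within tol."""
        x = x0
        for _ in range(max_iter):
            x_next = T(x)
            if abs(x_next - x) < tol:
                return x_next
            x = x_next
        return x

    x_star = fixed_point(math.cos, x0=0.3)
    print(x_star)                           # ~0.7390851332
    print(abs(math.cos(x_star) - x_star))   # ~0, i.e. T(x*) = x*

The smaller the contraction factor q, the faster the sequence converges, which is exactly why estimating q for the Bayesian update is informative about its convergence speed.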

We will first study metrics on distributions suited to our applications (they will depend on the criteria according to which we measure the quality of a probability estimation, which may vary depending on the application), and then adapt the Banach fixed-point theorem to the case of probabilistic transformations on the metric space of interest. Then we will study how the various characteristics of the application under consideration (for instance, the epsilon parameter representing the level of privacy in differential privacy, the generality measure in machine learning, or the amount of leakage in quantitative information flow) determine whether the Bayesian update is a contraction or not, and how they influence the contraction factor q.
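As a very rough illustration of this methodology, the following Python sketch (a hypothetical toy model: a finite family of coin biases, i.i.d. observations drawn under the true bias, and total variation distance as the metric; the actual metrics and models are precisely what the stage will investigate) iterates the Bayesian update and measures how fast the posterior approaches the point distribution on the true hypothesis:

    import random

    hypotheses = [0.2, 0.5, 0.8]        # candidate values of P(heads)
    true_theta = 0.8                    # the true hypothesis

    def update(prior, observation):
        """One Bayesian update step on a distribution over `hypotheses`."""
        lik = {th: (th if observation == "heads" else 1 - th) for th in hypotheses}
        marginal = sum(lik[th] * prior[th] for th in hypotheses)
        return {th: lik[th] * prior[th] / marginal for th in hypotheses}

    def tv_distance_to_truth(dist):
        """Total variation distance to the point mass on the true hypothesis."""
        point = {th: 1.0 if th == true_theta else 0.0 for th in hypotheses}
        return 0.5 * sum(abs(dist[th] - point[th]) for th in hypotheses)

    random.seed(0)
    posterior = {th: 1 / len(hypotheses) for th in hypotheses}   # uniform prior
    for n in range(1, 201):
        obs = "heads" if random.random() < true_theta else "tails"
        posterior = update(posterior, obs)
        if n % 50 == 0:
            print(n, round(tv_distance_to_truth(posterior), 6))

The distance shrinks towards 0, illustrating the convergence that the stage will study; the open questions are for which metrics and application parameters the update is an actual contraction, and with which factor q.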

Bibliography

[1] Subhashis Ghosal. A Review of Consistency and Convergence of Posterior Distributions.

[2] M.S. Alvim, K. Chatzikokolakis, C. Palamidessi, and G. Smith. Measuring Information Leakage using Generalized Gain Functions. Proc. of CSF, 265-279, 2012.

[3] Cynthia Dwork. A Firm Foundation for Private Data Analysis. Communications of the ACM, Vol. 54, No. 1, Pages 86-95, 2011.

Salary

Standard INRIA Internship Gratification

Continuation as PhD

The internship is expected to result in the candidate's master thesis. There will be the possibility of continuing to work with the team as a PhD student. Preference will be given to students who are willing to continue as PhD students.

Candidate’s required level

Master’s student (2nd year).

Proposed Duration

4-6 months

Prerequisites

Aptitude for formal and mathematical reasoning
Some knowledge of Probability Theory would be welcome

Contact people

Kostas Chatzikokolakis or Catuscia Palamidessi

Some success stories

  • Nicolás E. Bordenabe, ex PhD student in Comète under the direction of Catuscia Palamidessi and Kostas Chatzikokolakis, has received the ACM SIGSAC Doctoral Dissertation Award for Outstanding PhD Thesis in Computer and Information Security, 2014. He has also won the Prix de Thèse de l’Ecole Polytechnique for the best PhD thesis in Computer Science, 2014.

  • Sophia Knight, ex PhD student in Comète under the direction of Catuscia Palamidessi and Frank Valencia, has won the Prix de Thèse de l’Ecole Polytechnique for the best PhD thesis in Computer Science, 2013.

  • Mario Alvim, ex PhD student in Comète under the direction of Catuscia Palamidessi, was a finalist for the Prix de Thèse Paris-Tech for the best PhD thesis in Computer Science, 2012.

  • Konstantinos Chatzikokolakis, ex PhD student in Comète under the direction of Catuscia Palamidessi, received the Prix de Thèse Specif / Gilles Kahn for the best PhD thesis in Computer Science.