Laboratoire d'informatique de l'École polytechnique

PhD Defense of Tien-Duc CAO: «Toward Automatic Fact-Checking of Statistic Claims»

Speaker: Tien-Duc Cao
Location: Room Gilles Kahn, LIX
Date: Thu, 26 Sep 2019, 14:30-16:30

Tien-Duc Cao, member of the Cedar team, will defend his PhD thesis, entitled: "Toward Automatic Fact-Checking of Statistic Claims".

The defense will take place on Thursday, September 26 at 2:30pm, in the conference room Gilles Kahn at INRIA Saclay (1 Rue Honoré d'Estienne d'Orves, 91120 Palaiseau).


Data journalism and journalistic fact-checking are areas of growing interest, both within the journalism community and among the public at large, given the recent attention to misinformation, manipulation through the media, and journalistic efforts to prevent and debunk such attempts.

This thesis has been developed within a collaboration between several research laboratories and Les Décodeurs, the fact-checking team of the Le Monde newspaper.

The thesis proposes an end-to-end approach toward the automated fact-checking of statistic claims on a topic covered by a reference (trusted) database. Specifically, we first devised an approach for extracting Linked Open Data from the Web publications of INSEE, the leading French statistics institute. Second, we developed an original search algorithm which, given a set of keywords such as "unemployment rate France 2018", returns the datasets (and, if possible, the exact values within the datasets) deemed most relevant to the user keywords. Third, we developed an approach for automatically identifying, in a text written in French, mentions of statistic entities, together with the values the text associates with these entities, and other context terms (e.g., time or place) attached to the statistic claim. Together, these enable a semi-automated statistic claim verification pipeline, in which claims are extracted automatically from text and a query is sent to our data retrieval algorithm, which returns the reference information closest to the given query. A human user, e.g., a journalist, can then compare the returned data with the claimed value and interpret the result as part of a fact-checking effort.
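
To make the shape of this pipeline concrete, here is a minimal toy sketch in Python. It is not the method developed in the thesis: the dataset titles, the placeholder values, the regular expressions, and the function names (`extract_claim`, `search`) are all assumptions made for illustration, and the example sentence is in English rather than French for simplicity. It only shows the three roles described above: a reference store, claim extraction, and keyword-based lookup, with the final comparison left to a human.

```python
import re

# Hypothetical toy "reference database": dataset title -> {year: value}.
# Titles and values are placeholders, not real INSEE figures.
DATASETS = {
    "unemployment rate France": {2017: 1.0, 2018: 2.0},
    "population France": {2017: 3.0, 2018: 4.0},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def extract_claim(sentence):
    """Extract a percentage value, a four-digit year, and keywords from a sentence."""
    value = re.search(r"(\d+(?:\.\d+)?)\s*%", sentence)
    year = re.search(r"\b(?:19|20)\d{2}\b", sentence)
    return {
        "value": float(value.group(1)) if value else None,
        "year": int(year.group(0)) if year else None,
        "keywords": tokenize(sentence),
    }


def search(keywords, year):
    """Pick the dataset whose title shares the most words with the keywords.

    Returns the dataset title and its value for the given year (or None).
    """
    best = max(DATASETS, key=lambda title: len(keywords & tokenize(title)))
    return best, DATASETS[best].get(year)


claim = extract_claim("The unemployment rate in France was 2.5% in 2018.")
title, reference = search(claim["keywords"], claim["year"])

# The sketch stops here on purpose: a human compares claim["value"] with
# the reference value and decides how to interpret any difference.
print(claim["value"], title, reference)
```

Word overlap is the simplest possible relevance score; a real system would need to handle French morphology, synonyms, and table structure, none of which this sketch attempts.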

This thesis was carried out within the ANR ContentCheck project, which focuses on models, algorithms and tools for data journalism and journalistic fact-checking (