Laboratoire d'informatique de l'École polytechnique

Maria Rossi PhD defense: «Graph Mining for Influence Maximization in Social Networks»

Speaker: Maria Rossi
Location: Room Gilles Kahn, Alan Turing building
Date: Fri, 17 Nov 2017, 15:00-17:00

Maria Rossi will defend her phd thesis on Friday, November 17 at 15:00, in the Gilles Kahn room of the Alan Turing building.


Modern science of graphs has emerged the last few years as a field of interest and has been bringing significant advances to our knowledge about networks. Until recently the existing data mining algorithms were destined for structured/relational data while many datasets exist that require graph representation such as social networks, networks generated by textual data, 3D protein structures and chemical compounds. It has become therefore of crucial importance to be able to extract in an efficient and effective way meaningful infor- mation from that kind of data and towards this end graph mining and analysis methods have been proven essential. The goal of this thesis is to study problems in the area of graph mining focusing espe- cially on designing new algorithms and tools related to information spreading and specifi- cally on how to locate influential entities in real-world social networks. This task is crucial in many applications such as information diffusion, epidemic control and viral marketing.

In the first part of the thesis, we have studied spreading processes in social networks focusing on finding topological characteristics that rank entities in the network based on their influential capabilities. We have specifically focused on the K-truss decomposition which is an extension of the k-core decomposition of the graph. Both methods partition a graph into subgraphs whose nodes and/or edges have some common characteristics. For the case of the K-truss, the edges belonging to such subgraph are contained to at least K-2 triangles. After extensive experimental analysis in real-world networks, we showed that the nodes that belong to the maximal K-truss subgraph show a better spreading behavior when compared to baseline criteria such as degree and k-core centralities. Such spreaders have the ability to influence a greater part of the network during the first steps of a spreading process but also the total fraction of the influenced nodes at the end of the epidemic is greater. We have also observed that node members of such dense subgraphs are those achieving the optimal spreading in the network.

In the second part of the thesis, we focused on identifying a group of nodes that by acting all together maximize the expected number of influenced nodes at the end of the spreading process, formally called Influence Maximization. The Influence Maximization problem is actually NP-hard though there exist approximation guarantees for efficient algorithms that can solve the problem while obtaining a solution within the 63% of op- timal classes of models. As those guarantees propose a greedy approximation which is computationally expensive especially for large graphs, we proposed the MATI algorithm which succeeds in locating the group of users that maximize the influence while also be- ing scalable. The algorithm takes advantage of the possible paths created in each node’s neighborhood and precalculates each node’s potential influence and achieves to produce competitive results in quality compared to those of baseline algorithms.

In the last part of the thesis, we study the privacy point of view of sharing such metrics that are good influential indicators in a social network. We have focused on designing an algorithm that addresses the problem of computing through an efficient, correct, se- cure, and privacy-preserving algorithm the k-core metric which measures the influence of each node of the network. We have specifically adopted a decentralization approach where the social network is considered as a Peer-to-peer (P2P) system. The algorithm is built based on the constraint that it should not be possible for a node to reconstruct partially or entirely the graph using the information they obtain during its execution. While a distributed algorithm that computes once and for all the nodes’ coreness is already pro- posed, networks that evolve over time are not taken into account. Our main contribution is an incremental algorithm that efficiently solves the core maintenance problem in P2P while limiting the number of messages exchanged and computations. Through extensive experiments we provide a security and privacy analysis of the solution regarding network de-anonymization. We showed how it relates to previously defined attack models and discuss countermeasures.