The next Comète seminar will take place next Thursday October 31st at 14h in Salle Grace Hopper (LIX - Alan Turing building, École Polytechnique). Grigorios Loukides (Assistant Professor in the Department of Informatics, King's College London) will talk about "String sanitization: a combinatorial approach".
Abstract: String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user's location history). In this talk, I will examine the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility. First, I will present a time-optimal algorithm, TFS-ALGO, to construct the shortest string preserving the order of appearance and the frequency of all non-sensitive patterns. Such a string allows accurately performing tasks based on the sequential nature and pattern frequencies of the string. Second, I will present a time-optimal algorithm, PFS-ALGO, which preserves a partial order of appearance of non-sensitive patterns but produces a much shorter string that can be analyzed more efficiently. The strings produced by either of these algorithms may reveal the location of sensitive patterns. In response, I will present a heuristic, MCSR-ALGO, which replaces letters in these strings with carefully selected letters, so that sensitive patterns are not reinstated and occurrences of spurious patterns are prevented. Last, I will present experiments showing that our sanitization approach that applies TFS-ALGO, PFS-ALGO and then MCSR-ALGO is effective and efficient.