Laboratoire d'informatique de l'École polytechnique

«Streaming k-mismatch and applications», talk by Tatiana Starikovskaia

Speaker: Tatiana Starikovskaia
Location: Room Philippe Flajolet
Date: Thu, 6 Oct 2016, 14:30-15:30

This Thursday, October 6th, 14:30 at LIX (Philippe FLAJOLET room, 2nd floor), AMIB will welcome Tatiana STARIKOVSKAIA (IRIF, Paris VII) to the joint LIX/LRI bioinfo seminar. Tatiana will describe some of her recent contributions in string combinatorics and pattern matching algorithms, with exciting new extensions to streaming (aka “pattern matching for Big Data”).

Abstract: In the talk we will revisit the k-mismatch problem, one of the most basic problems in pattern matching, and its application, the weighted pattern matching problem. In the k-mismatch problem we are given a pattern of length m and a text of length n, and the task is to find all m-length substrings of the text that are at the Hamming distance at most k from the pattern. In the first half of the talk we will discuss a new streaming algorithm by Clifford et al. in SODA’16 and a variant of this algorithm enhanced with an important new feature called Data Recovery.

In the second half of the talk we will discuss an application of the k-mismatch with Data Recovery, the first streaming algorithms for pattern matching on weighted strings. Unlike regular strings, a weighted string is not a sequence of symbols but rather a sequence of probability distributions on the alphabet. In the weighted pattern matching problem we are given a text and a pattern, both of which can be either weighted or regular strings, and must find all alignments of the text and of the pattern where they match with probability above a given threshold.

Links to the papers:

  1. https://arxiv.org/abs/1508.00731 (published in SODA’16)
  2. http://arxiv.org/abs/1607.05626