# «Streaming k-mismatch and applications», talk by Tatiana Starikovskaia

**Speaker:**Tatiana Starikovskaia

**Location:**Room Philippe Flajolet

**Date:**Thu, 6 Oct 2016, 14:30-15:30

This Thursday, October 6th, 14:30 at LIX (Philippe FLAJOLET room, 2nd floor), AMIB will welcome Tatiana STARIKOVSKAIA (IRIF, Paris VII) to the joint LIX/LRI bioinfo seminar. Tatiana will describe some of her recent contributions in string combinatorics and pattern matching algorithms, with exciting new extensions to streaming (aka "pattern matching for Big Data").

**Abstract:** In the talk we will revisit the k-mismatch problem, one of the most basic problems in pattern matching, and its application, the weighted pattern matching problem. In the k-mismatch problem we are given a pattern of length m and a text of length n, and the task is to find all m-length substrings of the text that are at the Hamming distance at most k from the pattern. In the first half of the talk we will discuss a new streaming algorithm by Clifford et al. in SODA'16 and a variant of this algorithm enhanced with an important new feature called Data Recovery.

In the second half of the talk we will discuss an application of the k-mismatch with Data Recovery, the first streaming algorithms for pattern matching on weighted strings. Unlike regular strings, a weighted string is not a sequence of symbols but rather a sequence of probability distributions on the alphabet. In the weighted pattern matching problem we are given a text and a pattern, both of which can be either weighted or regular strings, and must find all alignments of the text and of the pattern where they match with probability above a given threshold.

Links to the papers:

- https://arxiv.org/abs/1508.00731 (published in SODA'16)
- http://arxiv.org/abs/1607.05626