Clustering in Hilbert simplex geometry

by Frank Nielsen and Ke Sun

Clustering categorical distributions in the probability simplex is a fundamental primitive often met in applications dealing with histograms or mixtures of multinomials. Traditionally, the differential-geometric structure of the probability simplex has been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the information-geometric structure induced by a smooth dissimilarity measure, called a divergence. In this paper, we introduce a novel, computationally friendly non-Riemannian framework for modeling the probability simplex: Hilbert simplex geometry. We discuss the pros and cons of these three statistical models, and compare them experimentally on clustering tasks.
Keywords: Fisher-Rao geometry, information geometry, Hilbert simplex geometry, center-based clustering.
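
To make the three geometries concrete, here is a minimal sketch of one dissimilarity per model between two categorical distributions p and q. It is an illustration only, not the code distributed with the paper: the function names are hypothetical, the Kullback-Leibler divergence is chosen here as one example of a smooth divergence, and both inputs are assumed to lie strictly inside the simplex (all entries positive, summing to 1).

```python
import math


def hilbert_distance(p, q):
    """Hilbert metric distance on the open simplex.

    Uses the closed form log(max_i(p_i/q_i) / min_i(p_i/q_i)),
    which assumes every entry of p and q is strictly positive.
    """
    ratios = [pi / qi for pi, qi in zip(p, q)]
    return math.log(max(ratios) / min(ratios))


def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between categorical distributions.

    Equals 2 * arccos of the Bhattacharyya coefficient sum_i sqrt(p_i * q_i).
    The coefficient is clamped to [0, 1] to guard against rounding error.
    """
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p : q); asymmetric, not a metric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    print("Hilbert    :", hilbert_distance(p, q))
    print("Fisher-Rao :", fisher_rao_distance(p, q))
    print("KL(p : q)  :", kl_divergence(p, q))
    print("KL(q : p)  :", kl_divergence(q, p))
```

The Hilbert distance needs only one pass over the coordinates to find the largest and smallest ratio, which gives a sense of why the framework is described as computationally friendly. Note that the Hilbert and Fisher-Rao distances are symmetric in p and q, whereas the Kullback-Leibler divergence is not.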

Python source code

Download the Python code for reproducible research:
Last updated: April 2017.