Centroids and statistical centroids (and other centers: medians, circumcenters, etc.)
In general, let us define a centroid as the minimizer of the average distance of a center object to a given collection of objects.
For example, the Euclidean centroid minimizes the average squared Euclidean distance to a point set (that is, it minimizes the variance of the set with respect to a center),
and is well-known to be the center of mass, the arithmetic mean of the points.
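As a small illustration of this fact, the following sketch computes the arithmetic mean of a toy point set and compares its average squared distance with that of a slightly shifted center. The function names and the sample points are hypothetical, chosen only for this example.

```python
def euclidean_centroid(points):
    """Arithmetic mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(dim))


def average_squared_distance(center, points):
    """Average squared Euclidean distance from `center` to `points`."""
    total = 0.0
    for p in points:
        total += sum((ck - pk) ** 2 for ck, pk in zip(center, p))
    return total / len(points)


# Toy data (an assumption of this example, not taken from any paper).
points = [(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]
centroid = euclidean_centroid(points)

# Any other candidate center should have a larger average squared distance.
shifted = (centroid[0] + 0.1, centroid[1])
cost_at_centroid = average_squared_distance(centroid, points)
cost_at_shifted = average_squared_distance(shifted, points)
```

Here `cost_at_centroid` is the variance of the point set, and moving the center away from the mean can only increase the cost.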
When the collection of objects is a set of distributions and the distance a statistical distance, we get a statistical centroid (a probability distribution).
In statistics, distances can be metric (e.g., the total variation or Wasserstein distances)
or non-metric and thrice differentiable, in which case they are called divergences (e.g., the Kullback-Leibler divergence or the Bhattacharyya divergence, and more generally f-divergences).
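Because a divergence is generally asymmetric, the centroid depends on which argument holds the center. A minimal sketch for the Kullback-Leibler divergence between discrete distributions, assuming strictly positive probability vectors of equal length: when the center is the second argument, the minimizer of the average divergence is the arithmetic mean; when the center is the first argument, it is the normalized geometric mean. The function names below are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for positive discrete distributions."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))


def kl_centroid_center_second(dists):
    """Minimizer c of the average KL(p_i || c): the arithmetic mean."""
    n = len(dists)
    return [sum(d[k] for d in dists) / n for k in range(len(dists[0]))]


def kl_centroid_center_first(dists):
    """Minimizer c of the average KL(c || p_i): the normalized geometric mean."""
    n = len(dists)
    geo = [
        math.exp(sum(math.log(d[k]) for d in dists) / n)
        for k in range(len(dists[0]))
    ]
    z = sum(geo)  # renormalize so the result sums to one
    return [g / z for g in geo]


# Toy input (an assumption of this example).
dists = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
arith = kl_centroid_center_second(dists)
geom = kl_centroid_center_first(dists)


def avg_kl_center_second(c):
    return sum(kl(p, c) for p in dists) / len(dists)


def avg_kl_center_first(c):
    return sum(kl(c, p) for p in dists) / len(dists)
```

Each of the two centers is optimal only for its own objective: `arith` beats `geom` on `avg_kl_center_second`, and `geom` beats `arith` on `avg_kl_center_first`.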
Computing centroids is essential for center-based clustering à la k-means (or variational k-means).
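To show where the centroid computation enters center-based clustering, here is a minimal sketch of the classical Lloyd batch heuristic for k-means with the squared Euclidean distance. It is a textbook illustration under simplifying assumptions (initial centers supplied by the caller, empty clusters keep their previous center), not the variational k-means mentioned above; all names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Euclidean centroid (arithmetic mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def lloyd_kmeans(points, initial_centers, max_iterations=100):
    """Alternate nearest-center assignment and centroid relocation."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its closest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Relocation step: each center moves to the centroid of its cluster.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else old)
        if new_centers == centers:  # stop when no center moved
            break
        centers = new_centers
    return centers, labels


# Toy data with two well-separated groups (an assumption of this example).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = lloyd_kmeans(points, initial_centers=[(0, 0), (10, 10)])
```

Replacing `squared_distance` and `mean` by another divergence and its matching centroid gives the corresponding center-based clustering, provided that centroid can be computed.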
Here is a collection of centroids that we have studied:
Disclaimer: Papers are copyrighted by their respective owners and provided online as a courtesy to publishers.
Please check the appropriate copyright information and license agreements of documents.
If you do not comply with copyright/license terms, you are not allowed to download any material.
When we define new statistical distances, we investigate how to compute them efficiently (and can then consider computing the corresponding centroids).
So far we have considered sets of independent random variables.
Let us now consider dependent random variables like correlated stochastic processes (random walks, etc) estimated from time-series datasets.
Now instead of using the average divergence (or squared metric distance), we may consider minimizing the maximum distance of the representative center
to the input set: This is the definition of the minmax center, 1-center, or circumcenter of the smallest enclosing ball.
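For Euclidean points, the minmax center can be approximated by a simple iterative scheme in the style of Badoiu and Clarkson: start at any input point and repeatedly move the current center a shrinking fraction of the way toward the farthest input point. The sketch below is one possible illustration under that assumption, with hypothetical names and a toy point set; it is not presented as the method of any specific paper listed here.

```python
import math


def distance(p, q):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approximate_minmax_center(points, iterations=1000):
    """Approximate the center and radius of the smallest enclosing ball."""
    center = list(points[0])
    for t in range(1, iterations + 1):
        # Find the input point farthest from the current center.
        farthest = max(points, key=lambda p: distance(center, p))
        # Move toward it with step size 1 / (t + 1).
        step = 1.0 / (t + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    radius = max(distance(center, p) for p in points)
    return tuple(center), radius


# Toy data (an assumption of this example): the two extreme points are
# 4 apart, so no enclosing ball can have a radius below 2.
points = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0)]
center, radius = approximate_minmax_center(points)
```

Unlike the centroid of these three points, which is pulled toward `(2.0, 1.0)`, the minmax center depends only on the points that end up on the boundary of the enclosing ball.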