Title: Max-sum diversity via convex programming and local search
|Affiliation:||Ecole Polytechnique Federale De Lausanna EPFL|
Diversity maximization is an important concept in information retrieval, computational geometry and operations research. Usually, it is a variant of the following problem: Given a ground set, constraints, and a function f(⋅) that measures diversity of a subset, the task is to select a feasible subset S such that f(S) is maximized. The sum-dispersion function f(S) is the sum of the pairwise distances in S, is in this context a prominent diversification measure. The corresponding diversity maximization is the max-sum or sum-sum diversification. Already for one cardinality constraint, max-sum diversification cannot be approximated better than by a factor of 1/2 if the distances are metric. We present a PTAS for the max-sum diversification problem for distances d(⋅,⋅) of negative type for various constraint settings, including matroid constraints and matroid intersection constraints. Distances of negative type are, for example, metric distances stemming from the ℓ2 and ℓ1 norm, as well as the cosine or spherical, or Jaccard distance which are popular similarity metrics in web and image search.
Joint work with Alfonso Cevallos and Rico Zenklusen