Submodular optimization with constraints and fairness: Algorithmic advances and applications
Date
2025-05
Authors
Zhu, Yanhui
Major Professor
Advisor
Aduri, Pavan
Basu, Samik
Quinn, Christopher
Huai, Mengdi
Huang, Xiaoqiu
Committee Member
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Submodular function optimization is a fundamental tool for modeling complex interactions in machine learning and graph mining. In this thesis, we investigate constrained submodular maximization problems, their variants, and $k$-submodular maximization problems with size constraints and fairness considerations. We develop efficient approximation algorithms with provable guarantees, employing evolutionary, deterministic, and randomized approaches.
First, we address the problem of Submodular Maximization under Cost constraints (SMC). We introduce an evolutionary algorithm, evo-SMC, which achieves a $1/2$-approximation with very high probability $1-1/n$ within $O(n^2 K)$ iterations, where $K$ denotes the maximal feasible solution size under cost constraint $\beta$. To enhance efficiency, we refine evo-SMC and propose a stochastic variant, st-evo-SMC, which significantly reduces computational complexity while maintaining the same approximation ratio of $1/2$ with probability $1-\varepsilon$. The number of required iterations is reduced to $O(nK\log{(1/\varepsilon)}/p)$, where $p \in (0,1]$ is a user-defined stochasticity parameter, and $\varepsilon \in (0,1]$ represents the error threshold.
While SMC focuses on maximizing a submodular function under budget constraints, in some applications, budget limitations may not be a concern. To address this, we examine a variant of SMC that maximizes functions of the form $h = f-c$, where $f$ is a monotone, non-negative, weakly submodular function, and $c$ is a modular function. We propose a deterministic approximation algorithm that runs in $O(\frac{n}{\varepsilon}\log \frac{n}{\gamma \varepsilon})$ oracle calls to $h$ and produces a solution $S'$ satisfying $h(S') \geq \gamma(1-\varepsilon)f(OPT)-c(OPT)-\frac{c(OPT)}{\gamma(1-\varepsilon)}\log\frac{f(OPT)}{c(OPT)}$, where $\gamma$ is the submodularity ratio of $f$. Our algorithm offers an improved approximation guarantee compared to existing methods while achieving a significantly lower computational complexity. Furthermore, we extend our analysis to cases where only an approximate oracle of $f$ is available.
Although submodular functions effectively model a wide range of problems, they may not sufficiently capture certain complex structures. To address this limitation, we further explore a generalized class of functions — k-submodular functions. For the problem of maximizing a k-submodular function, we develop and analyze threshold-greedy algorithms under total size and individual size constraints. We demonstrate that our proposed algorithms achieve the best-known approximation ratios for both constraint settings, offering a tunable balance between computational efficiency and solution quality.
However, size-constrained algorithms could produce unfair solutions where certain groups are underrepresented or overrepresented. To mitigate this issue, we initiate the study of fair $k$-submodular maximization and propose a $1/3$-approximation greedy algorithm with a runtime of $O(knB)$. Our theoretical results match the best-known guarantees for non-fair k-submodular maximization. Furthermore, we introduce a more efficient threshold-based algorithm that achieves a $(1/3 - \varepsilon)$-approximation in $O(\frac{kn}{\varepsilon} \log \frac{B}{\varepsilon})$ function evaluations. For both algorithms, we establish approximation guarantees under settings where the k-submodular function is accessible only via approximate oracles.
Across all studied problems, we extensively validate our theoretical findings through empirical experiments. Our algorithms consistently exhibit substantial improvements in both objective values and computational efficiency. Moreover, our experimental results highlight the practical implications of fairness constraints, demonstrating that they do not significantly degrade solution quality.
Series Number
Journal Issue
Is Version Of
Versions
Series
Academic or Administrative Unit
Type
dissertation