{"title": "Subset Selection under Noise", "book": "Advances in Neural Information Processing Systems", "page_first": 3560, "page_last": 3570, "abstract": "The problem of selecting the best $k$-element subset from a universe is involved in many applications. While previous studies assumed a noise-free environment or a noisy monotone submodular objective function, this paper considers a more realistic and general situation where the evaluation of a subset is a noisy monotone function (not necessarily submodular), with both multiplicative and additive noises. To understand the impact of the noise, we firstly show the approximation ratio of the greedy algorithm and POSS, two powerful algorithms for noise-free subset selection, in the noisy environments. We then propose to incorporate a noise-aware strategy into POSS, resulting in the new PONSS algorithm. We prove that PONSS can achieve a better approximation ratio under some assumption such as i.i.d. noise distribution. The empirical results on influence maximization and sparse regression problems show the superior performance of PONSS.", "full_text": "Subset Selection under Noise\n\nChao Qian1\n\nJing-Cheng Shi2\n\nYang Yu2\n\nKe Tang3,1\n\nZhi-Hua Zhou2\n\n1Anhui Province Key Lab of Big Data Analysis and Application, USTC, China\n2National Key Lab for Novel Software Technology, Nanjing University, China\n\n3Shenzhen Key Lab of Computational Intelligence, SUSTech, China\n\nchaoqian@ustc.edu.cn\n\ntangk3@sustc.edu.cn\n\n{shijc,yuy,zhouzh}@lamda.nju.edu.cn\n\nAbstract\n\nThe problem of selecting the best k-element subset from a universe is involved\nin many applications. 
While previous studies assumed a noise-free environment\nor a noisy monotone submodular objective function, this paper considers a more\nrealistic and general situation where the evaluation of a subset is a noisy monotone\nfunction (not necessarily submodular), with both multiplicative and additive noises.\nTo understand the impact of the noise, we \ufb01rstly show the approximation ratio of\nthe greedy algorithm and POSS, two powerful algorithms for noise-free subset\nselection, in the noisy environments. We then propose to incorporate a noise-aware\nstrategy into POSS, resulting in the new PONSS algorithm. We prove that PONSS\ncan achieve a better approximation ratio under some assumption such as i.i.d. noise\ndistribution. The empirical results on in\ufb02uence maximization and sparse regression\nproblems show the superior performance of PONSS.\n\n1\n\nIntroduction\n\nSubset selection is to select a subset of size at most k from a total set of n items for optimizing some\nobjective function f, which arises in many applications, such as maximum coverage [10], in\ufb02uence\nmaximization [16], sparse regression [17], ensemble pruning [23], etc. Since it is generally NP-\nhard [7], much effort has been devoted to the design of polynomial-time approximation algorithms.\nThe greedy algorithm is most favored for its simplicity, which iteratively chooses one item with\nthe largest immediate bene\ufb01t. Despite the greedy nature, it can perform well in many cases. For a\nmonotone submodular objective function f, it achieves the (1 \u2212 1/e)-approximation ratio, which is\noptimal in general [18]; for sparse regression where f can be non-submodular, it has the best-so-far\napproximation bound 1 \u2212 e\u2212\u03b3 [6], where \u03b3 is the submodularity ratio.\nRecently, a new approach Pareto Optimization for Subset Selection (POSS) has been shown superior\nto the greedy algorithm [21, 24]. 
It reformulates subset selection with two simultaneous objectives, i.e., optimizing the given objective and minimizing the subset size, and employs a randomized iterative algorithm to solve this bi-objective problem. POSS is proved to achieve the same general approximation guarantee as the greedy algorithm, and is shown better on some subclasses [5]. The Pareto optimization method has also been successfully applied to solve subset selection with general cost constraints [20] as well as ratio optimization of monotone set functions [22].

Most of the previous studies assumed that the objective function is noise-free. However, in many realistic applications we can only obtain a noisy evaluation. For example, for influence maximization, computing the influence spread objective is #P-hard [2], and thus it is often estimated by simulating the random diffusion process [16], which brings noise; for sparse regression, only a limited set of data can be used for evaluation, which makes the evaluation noisy; further examples include maximizing information gain in graphical models [4], crowdsourced image collection summarization [26], etc.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

To the best of our knowledge, only a few studies addressing noisy subset selection have been reported, and they assumed monotone submodular objective functions. Under the general multiplicative noise model (i.e., the noisy objective value F(X) is in the range of (1 ± ε)f(X)), it was proved that no polynomial-time algorithm can achieve a constant approximation ratio for any ε > 1/√n, while the greedy algorithm can achieve a (1 − 1/e − 16δ)-approximation ratio for ε = δ/k as long as δ < 1 [14].
By assuming that F(X) is a random variable (i.e., random noise) whose expectation is the true value f(X), it was shown that the greedy algorithm can achieve nearly a (1 − 1/e)-approximation guarantee via uniform sampling [16] or adaptive sampling [26]. Recently, Hassidim and Singer [13] considered the consistent random noise model, where for each subset X, only the first evaluation is a random draw from the distribution of F(X) and all further evaluations return the same value. For some classes of noise distribution, they provided polynomial-time algorithms with constant approximation ratios.

In this paper, we consider a more general situation, i.e., noisy subset selection with a monotone objective f (not necessarily submodular), under both the multiplicative noise model and the additive noise model (i.e., F(X) is in the range of f(X) ± ε). The main results are:

• Firstly, we extend the approximation ratio of the greedy algorithm from the submodular case [14] to the general situation (Theorems 1, 2), and also slightly improve it.

• Secondly, we prove that the approximation ratio of POSS is nearly the same as that of the greedy algorithm (Theorems 3, 4). Moreover, on two maximum coverage cases, we show that POSS has a better ability to avoid the misleading search direction caused by the noise (Propositions 1, 2).

• Thirdly, we introduce a noise-aware comparison strategy into POSS, and propose the new PONSS algorithm for noisy subset selection. When comparing two solutions with close noisy objective values, POSS selects the solution with the better observed value, while PONSS keeps both of them, so that the risk of deleting a good solution is reduced. Under some assumption such as an i.i.d. noise distribution, we prove that PONSS can obtain a \(\frac{1-\epsilon}{1+\epsilon}(1-e^{-\gamma})\)-approximation ratio under multiplicative noise (Theorem 5).
Particularly for the submodular case (i.e., γ = 1) and ε being a constant, PONSS has a constant approximation ratio. Note that the greedy algorithm and POSS under general multiplicative noise only guarantee a Θ(1/k) approximation ratio. We also prove the approximation ratio of PONSS under additive noise (Theorem 6).

We have conducted experiments on influence maximization and sparse regression, two typical subset selection applications with the objective function being submodular and non-submodular, respectively. The results on real-world data sets show that POSS is better than the greedy algorithm in most cases, and PONSS clearly outperforms both POSS and the greedy algorithm.

We start the rest of the paper by introducing the noisy subset selection problem. We then present in three subsequent sections the theoretical analyses for the greedy, POSS and PONSS algorithms, respectively. We further empirically compare these algorithms. The final section concludes this paper.

2 Noisy Subset Selection

Given a finite nonempty set V = {v_1, ..., v_n}, we study functions f : 2^V → R defined on subsets of V. The subset selection problem as presented in Definition 1 is to select a subset X of V such that a given objective f is maximized subject to the constraint |X| ≤ k, where |·| denotes the size of a set. Note that we only consider maximization, since minimizing f is equivalent to maximizing −f.

Definition 1 (Subset Selection). Given all items V = {v_1, ..., v_n}, an objective function f and a budget k, it is to find a subset of at most k items maximizing f, i.e.,

\[ \arg\max\nolimits_{X \subseteq V} f(X) \quad \text{s.t.} \quad |X| \le k. \qquad (1) \]

A set function f : 2^V → R is monotone if for any X ⊆ Y, f(X) ≤ f(Y). In this paper, we consider monotone functions and assume that they are normalized, i.e., f(∅) = 0. A set function f : 2^V → R is submodular if for any X ⊆ Y, \(f(Y) - f(X) \le \sum_{v \in Y \setminus X} (f(X \cup \{v\}) - f(X))\) [19]. The submodularity ratio in Definition 2 characterizes how close a set function f is to submodularity. It is easy to see that f is submodular iff γ_{X,k}(f) = 1 for any X and k. For some concrete non-submodular applications, bounds on γ_{X,k}(f) have been derived [1, 9]. When f is clear, we write γ_{X,k} for short.

Definition 2 (Submodularity Ratio [6]). Let f be a non-negative set function. The submodularity ratio of f with respect to a set X and a parameter k ≥ 1 is

\[ \gamma_{X,k}(f) = \min_{L \subseteq X,\; S : |S| \le k,\; S \cap L = \emptyset} \frac{\sum_{v \in S} \left( f(L \cup \{v\}) - f(L) \right)}{f(L \cup S) - f(L)}. \]

Algorithm 1 Greedy Algorithm
Input: all items V = {v_1, ..., v_n}, a noisy objective function F, and a budget k
Output: a subset of V with k items
Process:
1: Let i = 0 and X_i = ∅.
2: repeat
3:   Let v* = arg max_{v ∈ V \ X_i} F(X_i ∪ {v}).
4:   Let X_{i+1} = X_i ∪ {v*}, and i = i + 1.
5: until i = k
6: return X_k

In many applications of subset selection, we cannot obtain the exact objective value f(X), but rather only a noisy one F(X). In this paper, we will study the multiplicative noise model, i.e.,

\[ (1-\epsilon) f(X) \le F(X) \le (1+\epsilon) f(X), \qquad (2) \]

as well as the additive noise model, i.e.,

\[ f(X) - \epsilon \le F(X) \le f(X) + \epsilon. \qquad (3) \]

3 The Greedy Algorithm

The greedy algorithm as shown in Algorithm 1 iteratively adds the item with the largest F improvement until k items are selected. It achieves the best approximation ratio for many subset selection problems without noise [6, 18]. However, its performance for noisy subset selection was not theoretically analyzed until recently.
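The greedy procedure of Algorithm 1 is short enough to sketch directly. The following is an illustrative Python version (not the authors' implementation), assuming the noisy objective F is supplied as a callable on frozensets:

```python
def greedy(V, F, k):
    """Algorithm 1: iteratively add the item with the largest noisy value F.

    V: list of items; F: noisy set-valued objective, F(frozenset) -> float;
    k: budget. Returns a k-element subset as a frozenset.
    """
    X = frozenset()
    for _ in range(k):
        # pick the item whose addition maximizes the noisy evaluation F
        v_star = max((v for v in V if v not in X), key=lambda v: F(X | {v}))
        X = X | {v_star}
    return X
```

With a noise-free modular F this reduces to picking the k largest weights, which makes it easy to sanity-check.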
Let OPT = max_{X:|X|≤k} f(X) denote the optimal function value of Eq. (1). Horel and Singer [14] proved that for subset selection with submodular objective functions under the multiplicative noise model, the greedy algorithm finds a subset X with

\[ f(X) \ge \frac{\frac{1-\epsilon}{1+\epsilon}}{1+\frac{4k\epsilon}{(1-\epsilon)^2}} \left( 1 - \left( \frac{1-\epsilon}{1+\epsilon} \right)^{2k} \left( 1 - \frac{1}{k} \right)^{k} \right) \cdot OPT. \qquad (4) \]

Note that their original bound in Theorem 5 of [14] is w.r.t. F(X), and we have switched to f(X) by multiplying a factor of (1−ε)/(1+ε) according to Eq. (2).

By extending their analysis with the submodularity ratio, we prove in Theorem 1 the approximation bound of the greedy algorithm for an objective f that is not necessarily submodular. Note that their analysis is based on an inductive inequality on F, while we directly use one on f, which brings a slight improvement. For the submodular case, γ_{X,k} = 1 and the bound in Theorem 1 becomes

\[ f(X) \ge \frac{\frac{1-\epsilon}{1+\epsilon} \cdot \frac{1}{k}}{1 - \frac{1-\epsilon}{1+\epsilon}\left(1-\frac{1}{k}\right)} \left( 1 - \left( \frac{1-\epsilon}{1+\epsilon} \right)^{k} \left( 1 - \frac{1}{k} \right)^{k} \right) \cdot OPT. \]

Comparing with the bound of [14] (i.e., Eq. (4)), our bound is tighter, since

\[ \frac{1 - \left(\frac{1-\epsilon}{1+\epsilon}\right)^{k}\left(1-\frac{1}{k}\right)^{k}}{1 - \frac{1-\epsilon}{1+\epsilon}\left(1-\frac{1}{k}\right)} = \sum_{i=0}^{k-1} \left( \frac{1-\epsilon}{1+\epsilon}\left(1-\frac{1}{k}\right) \right)^{i} \ge \sum_{i=0}^{k-1} \left( \left(\frac{1-\epsilon}{1+\epsilon}\right)^{2} \left(1-\frac{1}{k}\right) \right)^{i} \ge \frac{1 - \left(\frac{1-\epsilon}{1+\epsilon}\right)^{2k}\left(1-\frac{1}{k}\right)^{k}}{1+\frac{4k\epsilon}{(1-\epsilon)^2}} \cdot k. \]

Due to space limitation, the proof of Theorem 1 is provided in the supplementary material. We also show in Theorem 2 the approximation ratio under additive noise.
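The submodularity ratio of Definition 2, which appears in all of the bounds above, can be computed exactly on tiny ground sets by brute-force enumeration. This is handy for checking the theorems on toy instances; the helper below is our own illustrative sketch (ratios with a zero denominator impose no constraint and are skipped):

```python
from itertools import combinations

def submodularity_ratio(f, V, X, k):
    """Brute-force gamma_{X,k}(f) of Definition 2 on a tiny ground set V.

    Minimizes, over L subset of X and disjoint S with 1 <= |S| <= k, the ratio
    sum_{v in S}(f(L | {v}) - f(L)) / (f(L | S) - f(L)).
    """
    gamma = float('inf')
    X = frozenset(X)
    for r in range(len(X) + 1):
        for L in map(frozenset, combinations(X, r)):
            rest = [v for v in V if v not in L]  # candidates disjoint from L
            for s in range(1, k + 1):
                for S in map(frozenset, combinations(rest, s)):
                    denom = f(L | S) - f(L)
                    if denom <= 0:
                        continue  # no constraint from this pair
                    num = sum(f(L | {v}) - f(L) for v in S)
                    gamma = min(gamma, num / denom)
    return gamma
```

For a modular f (e.g., the set-size function), every ratio equals 1, matching the fact that submodular functions have γ_{X,k} = 1.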
The proof is similar to that of Theorem 1, except that Eq. (3) is used instead of Eq. (2) for comparing f(X) with F(X).

Theorem 1. For subset selection under multiplicative noise, the greedy algorithm finds a subset X with

\[ f(X) \ge \frac{\frac{1-\epsilon}{1+\epsilon} \cdot \frac{\gamma_{X,k}}{k}}{1 - \frac{1-\epsilon}{1+\epsilon}\left(1-\frac{\gamma_{X,k}}{k}\right)} \left( 1 - \left( \frac{1-\epsilon}{1+\epsilon} \right)^{k} \left( 1 - \frac{\gamma_{X,k}}{k} \right)^{k} \right) \cdot OPT. \]

Theorem 2. For subset selection under additive noise, the greedy algorithm finds a subset X with

\[ f(X) \ge \left( 1 - \left( 1 - \frac{\gamma_{X,k}}{k} \right)^{k} \right) \left( OPT - \frac{2k\epsilon}{\gamma_{X,k}} \right). \]

Algorithm 2 POSS Algorithm
Input: all items V = {v_1, ..., v_n}, a noisy objective function F, and a budget k
Parameter: the number T of iterations
Output: a subset of V with at most k items
Process:
1: Let x = {0}^n, P = {x}, and let t = 0.
2: while t < T do
3:   Select x from P uniformly at random.
4:   Generate x' by flipping each bit of x with probability 1/n.
5:   if ∄z ∈ P such that z ≻ x' then
6:     P = (P \ {z ∈ P | x' ⪰ z}) ∪ {x'}.
7:   end if
8:   t = t + 1.
9: end while
10: return arg max_{x ∈ P, |x| ≤ k} F(x)

4 The POSS Algorithm

Let a Boolean vector x ∈ {0,1}^n represent a subset X of V, where x_i = 1 if v_i ∈ X and x_i = 0 otherwise. The Pareto Optimization method for Subset Selection (POSS) [24] reformulates the original problem Eq.
(1) as a bi-objective maximization problem:

\[ \arg\max\nolimits_{x \in \{0,1\}^n} \; (f_1(x), f_2(x)), \quad \text{where} \quad f_1(x) = \begin{cases} -\infty, & |x| \ge 2k \\ F(x), & \text{otherwise} \end{cases}, \qquad f_2(x) = -|x|. \]

That is, POSS maximizes the original objective and minimizes the subset size simultaneously. Note that setting f_1 to −∞ is to exclude overly infeasible solutions. For convenience, we will not distinguish between x ∈ {0,1}^n and its corresponding subset.

In the bi-objective setting, the domination relationship as presented in Definition 3 is used to compare two solutions. For |x| < 2k and |y| ≥ 2k, it trivially holds that x ⪰ y. For |x|, |y| < 2k, x ⪰ y if F(x) ≥ F(y) ∧ |x| ≤ |y|; x ≻ y if x ⪰ y and F(x) > F(y) ∨ |x| < |y|.

Definition 3 (Domination). For two solutions x and y,
• x weakly dominates y (denoted as x ⪰ y) if f_1(x) ≥ f_1(y) ∧ f_2(x) ≥ f_2(y);
• x dominates y (denoted as x ≻ y) if x ⪰ y and f_1(x) > f_1(y) ∨ f_2(x) > f_2(y).

POSS as described in Algorithm 2 uses a randomized iterative procedure to optimize the bi-objective problem. It starts from the empty set {0}^n (line 1). In each iteration, a new solution x' is generated by randomly flipping bits of an archived solution x selected from the current P (lines 3-4); if x' is not dominated by any previously archived solution (line 5), it is added into P, and meanwhile those solutions weakly dominated by x' are removed (line 6). After T iterations, the solution in P with the largest F value satisfying the size constraint is selected (line 10).

In [21, 24], POSS using E[T] ≤ 2ek^2 n was proved to achieve the same approximation ratio as the greedy algorithm for subset selection without noise, where E[T] denotes the expected number of iterations. However, its approximation performance under noise was previously unknown. Let γ_min = min_{X:|X|=k−1} γ_{X,k}.
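Algorithm 2 can be sketched as follows. This is an illustrative Python version (not the authors' code); for simplicity it re-queries the noisy F on every comparison, whereas a faithful implementation would cache one noisy value per archived solution:

```python
import random

def poss(n, F, k, T):
    """Algorithm 2 (POSS): bi-objective Pareto optimization for subset selection.

    n: number of items; F: noisy objective on frozensets of item indices;
    k: budget; T: iteration budget. f1 = F (or -inf for |x| >= 2k), f2 = -|x|.
    """
    def f1(x):
        return float('-inf') if len(x) >= 2 * k else F(x)

    def dominates(x, y):  # x strictly dominates y
        return (f1(x) >= f1(y) and len(x) <= len(y)
                and (f1(x) > f1(y) or len(x) < len(y)))

    P = [frozenset()]                       # line 1: start from the empty set
    for _ in range(T):
        x = random.choice(P)                # line 3: uniform selection
        # line 4: flip each bit independently with probability 1/n
        xp = frozenset(i for i in range(n)
                       if (i in x) != (random.random() < 1.0 / n))
        if not any(dominates(z, xp) for z in P):   # line 5
            # line 6: keep xp, drop everything it weakly dominates
            P = [z for z in P
                 if not (f1(xp) >= f1(z) and len(xp) <= len(z))] + [xp]
    return max((x for x in P if len(x) <= k), key=F)   # line 10
```

On a small modular objective (where the optimum is just the k largest weights), a few thousand iterations are ample for this sketch to hit the optimum.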
We first show in Theorem 3 the approximation ratio of POSS under multiplicative noise. The proof is provided in the supplementary material due to space limitation. The approximation ratio of POSS under additive noise is shown in Theorem 4, the proof of which is similar to that of Theorem 3 except that Eq. (3) is used instead of Eq. (2).

Figure 1: Two examples of the maximum coverage problem ((a) is from [13]).

Theorem 3. For subset selection under multiplicative noise, POSS using E[T] ≤ 2ek^2 n finds a subset X with |X| ≤ k and

\[ f(X) \ge \frac{\frac{1-\epsilon}{1+\epsilon} \cdot \frac{\gamma_{\min}}{k}}{1 - \frac{1-\epsilon}{1+\epsilon}\left(1-\frac{\gamma_{\min}}{k}\right)} \left( 1 - \left( \frac{1-\epsilon}{1+\epsilon} \right)^{k} \left( 1 - \frac{\gamma_{\min}}{k} \right)^{k} \right) \cdot OPT. \]

Theorem 4. For subset selection under additive noise, POSS using E[T] ≤ 2ek^2 n finds a subset X with |X| ≤ k and

\[ f(X) \ge \left( 1 - \left( 1 - \frac{\gamma_{\min}}{k} \right)^{k} \right) \left( OPT - \frac{2k\epsilon}{\gamma_{\min}} \right) - \left( 1 - \frac{\gamma_{\min}}{k} \right)^{k} \epsilon. \]

By comparing Theorem 1 with Theorem 3, we find that the approximation bounds of POSS and the greedy algorithm under multiplicative noise are nearly the same. Particularly, for the submodular case (where γ_{X,k} = 1 for any X and k), they are exactly the same. Under additive noise, their approximation bounds (i.e., Theorems 2 and 4) are also nearly the same, since the additional term (1 − γ_min/k)^k ε in Theorem 4 is almost negligible compared with the other terms.

To further investigate the performance of the greedy algorithm and POSS, we compare them on two maximum coverage examples with noise. Maximum coverage as in Definition 4 is a classic subset selection problem.
Given a family of sets that cover a universe of elements, the goal is to select at most k sets whose union is maximal. For Examples 1 and 2, the greedy algorithm easily finds an optimal solution in the absence of noise, but under noise it can only guarantee nearly a 2/k- and 3/4-approximation, respectively. We prove in Propositions 1 and 2 that POSS can avoid the misleading search direction caused by the noise through multi-bit search and backward search, respectively, and find an optimal solution. Note that the greedy algorithm can only perform single-bit forward search. Due to space limitation, the proofs are provided in the supplementary material.

Definition 4 (Maximum Coverage). Given a ground set U, a collection V = {S_1, S_2, ..., S_n} of subsets of U, and a budget k, it is to find a subset of V (represented by x ∈ {0,1}^n) such that

\[ \arg\max\nolimits_{x \in \{0,1\}^n} \; f(x) = \Big| \bigcup\nolimits_{i : x_i = 1} S_i \Big| \quad \text{s.t.} \quad |x| \le k. \]

Example 1. [13] As shown in Figure 1(a), V contains n = 2l subsets {S_1, ..., S_{2l}}, where ∀i ≤ l, S_i covers the same two elements, and ∀i > l, S_i covers one unique element. The objective evaluation is exact except that ∀∅ ⊂ X ⊆ {S_1, ..., S_l}, i > l: F(X) = 2 + δ and F(X ∪ {S_i}) = 2, where 0 < δ < 1. The budget satisfies 2 < k ≤ l.

Proposition 1. For Example 1, POSS using E[T] = O(kn log n) finds an optimal solution, while the greedy algorithm cannot.

Example 2. As shown in Figure 1(b), V contains n = 4l subsets {S_1, ..., S_{4l}}, where ∀i ≤ 4l−3: |S_i| = 1, |S_{4l−2}| = 2l−1, and |S_{4l−1}| = |S_{4l}| = 2l−2. The objective evaluation is exact except that F({S_{4l}}) = 2l. The budget k = 2.

Proposition 2.
For Example 2, POSS using E[T] = O(n) finds the optimal solution {S_{4l−2}, S_{4l−1}}, while the greedy algorithm cannot.

5 The PONSS Algorithm

POSS compares two solutions based on the domination relation in Definition 3. This may not be robust to noise, because a worse solution can appear to have a better F value and then survive to replace the truly better solution. Inspired by the noise handling strategy of threshold selection [25], we modify POSS by replacing domination with θ-domination, where x is better than y only if F(x) is larger than F(y) by at least a threshold. Under θ-domination, solutions with close F values are kept in P, rather than only the one with the best F value; thus the risk of removing a good solution is reduced. This modified algorithm, called PONSS (Pareto Optimization for Noisy Subset Selection), is presented in Algorithm 3. However, using θ-domination may also make the size of P very large and thereby reduce efficiency. We therefore introduce a parameter B to limit the number of solutions in P for each possible subset size: if the number of solutions with the same size in P exceeds B, one of them is deleted. As shown in lines 7-15, the better of two solutions randomly selected from Q is kept; this process is repeated B times, and the solution remaining in Q is deleted.

Algorithm 3 PONSS Algorithm
Input: all items V = {v_1, ..., v_n}, a noisy objective function F, and a budget k
Parameter: T, θ and B
Output: a subset of V with at most k items
Process:
1: Let x = {0}^n, P = {x}, and let t = 0.
2: while t < T do
3:   Select x from P uniformly at random.
4:   Generate x' by flipping each bit of x with probability 1/n.
5:   if ∄z ∈ P such that z ≻_θ x' then
6:     P = (P \ {z ∈ P | x' ⪰_θ z}) ∪ {x'}.
7:     Q = {z ∈ P | |z| = |x'|}.
8:     if |Q| = B + 1 then
9:       P = P \ Q and let j = 0.
10:      while j < B do
11:        Select two solutions z_1, z_2 from Q uniformly at random without replacement.
12:        Evaluate F(z_1), F(z_2); let ẑ = arg max_{z ∈ {z_1, z_2}} F(z) (breaking ties randomly).
13:        P = P ∪ {ẑ}, Q = Q \ {ẑ}, and j = j + 1.
14:      end while
15:    end if
16:  end if
17:  t = t + 1.
18: end while
19: return arg max_{x ∈ P, |x| ≤ k} F(x)

For the analysis of PONSS, we consider random noise, i.e., F(x) is a random variable, and assume that the probability of F(x) > F(y) is at least 0.5 + δ whenever f(x) > f(y), i.e.,

\[ \Pr(F(x) > F(y)) \ge 0.5 + \delta \quad \text{if} \quad f(x) > f(y), \qquad (5) \]

where δ ∈ [0, 0.5). This assumption is satisfied in many noisy settings, e.g., when the noise distribution is i.i.d. for each x (as explained in the supplementary material).
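The two ingredients PONSS adds to POSS — the θ-dominance test of line 5 and the size-B pruning of lines 8-15 — can be sketched in isolation. The helpers below are our own illustrative sketch (with ties in line 12 broken toward z1 rather than randomly):

```python
import random

def theta_dominates(Fx, size_x, Fy, size_y, theta):
    """Multiplicative theta-domination (Definition 5): x beats y only if its
    noisy value exceeds F(y) by the (1+theta)/(1-theta) margin."""
    margin = (1 + theta) / (1 - theta) * Fy
    weak = Fx >= margin and size_x <= size_y
    return weak and (Fx > margin or size_x < size_y)

def prune(Q, F, B):
    """Lines 8-15 of Algorithm 3: when B+1 solutions share a size, run B random
    pairwise comparisons with fresh noisy evaluations; the one solution left in
    Q at the end is discarded.

    Q: list of B+1 equal-size solutions; F: noisy objective, re-evaluated
    independently on each call. Returns the B survivors."""
    Q = list(Q)
    kept = []
    for _ in range(B):
        z1, z2 = random.sample(Q, 2)           # line 11: without replacement
        winner = z1 if F(z1) >= F(z2) else z2  # line 12: fresh draws; ties -> z1
        kept.append(winner)                    # line 13: move winner back to P
        Q.remove(winner)
    return kept
```

With an exact F, the worst solution can never win a comparison, so it is always the one discarded; under noise it merely loses with probability at least 0.5 + δ per comparison, which is what the proof of Theorem 5 exploits.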
Note that when comparing two solutions selected from Q in line 12 of PONSS, we reevaluate their noisy objective values F independently, i.e., each evaluation is a new independent random draw from the noise distribution.

For the multiplicative noise model, we use the multiplicative θ-domination relation as presented in Definition 5. That is, x ⪰_θ y if F(x) ≥ \(\frac{1+\theta}{1-\theta}\) · F(y) and |x| ≤ |y|. The approximation ratio of PONSS under the assumption Eq. (5) is shown in Theorem 5, which is better than that of POSS under general multiplicative noise (i.e., Theorem 3), because

\[ \frac{1 - \left(\frac{1-\epsilon}{1+\epsilon}\right)^{k}\left(1-\frac{\gamma_{\min}}{k}\right)^{k}}{1 - \frac{1-\epsilon}{1+\epsilon}\left(1-\frac{\gamma_{\min}}{k}\right)} = \sum_{i=0}^{k-1} \left( \frac{1-\epsilon}{1+\epsilon} \left( 1 - \frac{\gamma_{\min}}{k} \right) \right)^{i} \le \sum_{i=0}^{k-1} \left( 1 - \frac{\gamma_{\min}}{k} \right)^{i} = \frac{1 - \left(1-\frac{\gamma_{\min}}{k}\right)^{k}}{\frac{\gamma_{\min}}{k}}. \]

Particularly for the submodular case, where γ_min = 1, PONSS under the assumption Eq. (5) can achieve a constant approximation ratio even when ε is a constant, while the greedy algorithm and POSS under general multiplicative noise only guarantee a Θ(1/k) approximation ratio. Note that when δ is a constant, the approximation guarantee of PONSS holds with a constant probability by using a polynomially large B, and thus the number of iterations of PONSS is polynomial in expectation.

Definition 5 (Multiplicative θ-Domination). For two solutions x and y,
• x weakly dominates y (denoted as x ⪰_θ y) if f_1(x) ≥ \(\frac{1+\theta}{1-\theta}\) · f_1(y) ∧ f_2(x) ≥ f_2(y);
• x dominates y (denoted as x ≻_θ y) if x ⪰_θ y and f_1(x) > \(\frac{1+\theta}{1-\theta}\) · f_1(y) ∨ f_2(x) > f_2(y).

Lemma 1. [21] For any X ⊆ V, there exists one item v̂ ∈ V \ X such that

\[ f(X \cup \{\hat{v}\}) - f(X) \ge \frac{\gamma_{X,k}}{k} \left( OPT - f(X) \right). \]

Theorem 5. For subset selection under multiplicative noise with the assumption Eq. (5), with probability at least \(\frac{1}{2}\left(1 - \frac{12 n k^2 \log 2k}{B^{2\delta}}\right)\), PONSS using θ ≥ ε and T = 2eBnk^2 log 2k finds a subset X with |X| ≤ k and

\[ f(X) \ge \frac{1-\epsilon}{1+\epsilon} \left( 1 - \left( 1 - \frac{\gamma_{\min}}{k} \right)^{k} \right) \cdot OPT. \]

Proof. Let J_max denote the maximum value of j ∈ [0, k] such that P contains a solution x with |x| ≤ j and f(x) ≥ (1 − (1 − γ_min/k)^j) · OPT. Note that J_max = k implies that there exists one solution x* in P satisfying |x*| ≤ k and f(x*) ≥ (1 − (1 − γ_min/k)^k) · OPT. Since the finally selected solution x from P has the largest F value (i.e., line 19 of Algorithm 3), we have

\[ f(x) \ge \frac{1}{1+\epsilon} F(x) \ge \frac{1}{1+\epsilon} F(x^*) \ge \frac{1-\epsilon}{1+\epsilon} f(x^*). \]

That is, the desired approximation bound is reached. Thus, we only need to analyze the probability of J_max = k after running T = 2eBnk^2 log 2k iterations.

Assume that in the run of PONSS, a solution with the best f value in Q is always kept after each execution of lines 8-15. We then show that J_max can reach k with probability at least 0.5 after 2eBnk^2 log 2k iterations.
J_max is initially 0, since PONSS starts from {0}^n; assume that currently J_max = i < k. Let x be a corresponding solution with the value i, i.e., |x| ≤ i and

\[ f(x) \ge \left( 1 - \left( 1 - \frac{\gamma_{\min}}{k} \right)^{i} \right) \cdot OPT. \qquad (6) \]

First, J_max will not decrease. If x is not deleted, this obviously holds. For x to be deleted, there are two possible cases. If x is deleted in line 6, the newly included solution x' satisfies x' ⪰_θ x, which implies |x'| ≤ |x| ≤ i and

\[ f(x') \ge \frac{1}{1+\epsilon} F(x') \ge \frac{1}{1+\epsilon} \cdot \frac{1+\theta}{1-\theta} F(x) \ge \frac{1}{1+\epsilon} \cdot \frac{1+\epsilon}{1-\epsilon} F(x) \ge f(x), \]

where the third inequality is by θ ≥ ε. If x is deleted in lines 8-15, there must exist one solution z* in P with |z*| = |x| and f(z*) ≥ f(x), because we assume that a solution with the best f value in Q is kept. Second, J_max can increase in each iteration with some probability. From Lemma 1, we know that a new solution x' can be produced by flipping one specific 0 bit of x (i.e., adding a specific item) such that |x'| = |x| + 1 ≤ i + 1 and

\[ f(x') \ge \left( 1 - \frac{\gamma_{x,k}}{k} \right) f(x) + \frac{\gamma_{x,k}}{k} \cdot OPT \ge \left( 1 - \left( 1 - \frac{\gamma_{\min}}{k} \right)^{i+1} \right) \cdot OPT, \]

where the second inequality is by Eq. (6) and γ_{x,k} ≥ γ_min (since |x| < k and γ_{x,k} is non-increasing in x). Note that x' will be added into P; otherwise, there must exist one solution in P dominating x' (line 5 of Algorithm 3), which implies that J_max is already larger than i, contradicting the assumption J_max = i. After including x', J_max ≥ i + 1. Since P contains at most B solutions for each possible size in {0, ..., 2k − 1}, we have |P| ≤ 2Bk.
Thus, J_max can increase by at least 1 in one iteration with probability at least \(\frac{1}{|P|} \cdot \frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1} \ge \frac{1}{2eBnk}\), where 1/|P| is the probability of selecting x in line 3 of Algorithm 3 due to uniform selection, and \(\frac{1}{n}(1-\frac{1}{n})^{n-1}\) is the probability of flipping only one specific bit of x in line 4. We divide the 2eBnk^2 log 2k iterations into k phases of equal length. For reaching J_max = k, it suffices that J_max increases at least once in each phase. Thus, we have

\[ \Pr(J_{\max} = k) \ge \left( 1 - \left(1 - \frac{1}{2eBnk}\right)^{2eBnk \log 2k} \right)^{k} \ge \left( 1 - \frac{1}{2k} \right)^{k} \ge \frac{1}{2}. \]

It remains to investigate our assumption that, throughout the 2eBnk^2 log 2k iterations, a solution with the best f value in Q is always kept when executing lines 8-15. Let R = {z* ∈ arg max_{z ∈ Q} f(z)}. If |R| > 1, the assumption trivially holds, since only one solution from Q is deleted. If |R| = 1, deleting the solution z* with the best f value implies that z* is never included into P when executing lines 11-13 of Algorithm 3, which are repeated for B iterations. In the j-th iteration (where 0 ≤ j ≤ B − 1), |Q| = B + 1 − j. Conditioned on z* not having been included into P in iterations 0 through j − 1, the probability that z* is selected in line 11 is \((B-j)/\binom{B+1-j}{2} = 2/(B+1-j)\). We know from Eq. (5) that F(z*) wins the comparison in line 12 with probability at least 0.5 + δ. Thus, the probability of not including z* into P in the j-th iteration is at most \(1 - \frac{2}{B+1-j}(0.5+\delta)\). Then, the probability of deleting the solution with the best f value in Q when executing lines 8-15 is at most \(\prod_{j=0}^{B-1}\left(1 - \frac{1+2\delta}{B+1-j}\right)\).
Taking the logarithm, we get

\[ \sum_{j=0}^{B-1} \log\left( 1 - \frac{1+2\delta}{B+1-j} \right) = \sum_{j=0}^{B-1} \log\left( \frac{B-j-2\delta}{B+1-j} \right) = \sum_{j=1}^{B} \log\left( \frac{j-2\delta}{j+1} \right) \le \int_{1}^{B+1} \log\left( \frac{j-2\delta}{j+1} \right) \mathrm{d}j = \log\left( \frac{(B+1-2\delta)^{B+1-2\delta}}{(B+2)^{B+2}} \right) - \log\left( \frac{(1-2\delta)^{1-2\delta}}{2^2} \right), \]

where the inequality holds since \(\log\frac{j-2\delta}{j+1}\) is increasing in j, and the last equality holds since the derivative of \(\log\frac{(j-2\delta)^{j-2\delta}}{(j+1)^{j+1}}\) with respect to j is \(\log\frac{j-2\delta}{j+1}\). Thus, we have

\[ \prod_{j=0}^{B-1}\left( 1 - \frac{1+2\delta}{B+1-j} \right) \le \left( \frac{B+1-2\delta}{B+2} \right)^{B+2} \cdot \frac{1}{(B+1-2\delta)^{1+2\delta}} \cdot \frac{4}{(1-2\delta)^{1-2\delta}} \le \frac{4}{e^{1-1/e} B^{1+2\delta}}, \]

where the last inequality is by 0 < 1 − 2δ ≤ 1 and (1 − 2δ)^{1−2δ} ≥ e^{−1/e}. By the union bound, our assumption holds with probability at least 1 − (12nk^2 log 2k)/B^{2δ}. Thus, the theorem holds.

For the additive noise model, we use the additive θ-domination relation as presented in Definition 6. That is, x ⪰_θ y if F(x) ≥ F(y) + 2θ and |x| ≤ |y|. By applying Eq. (3) and additive θ-domination to the proof procedure of Theorem 5, we can prove the approximation ratio of PONSS under additive noise with the assumption Eq. (5), as shown in Theorem 6. Compared with the approximation ratio of POSS under general additive noise (i.e., Theorem 4), PONSS achieves a better one.
This can be easily verified since $(1-(1-\frac{\gamma_{\min}}{k})^k)\cdot\frac{2k\epsilon}{\gamma_{\min}} \ge 2\epsilon$, where the inequality is derived by $\gamma_{\min} \in [0,1]$.

Definition 6 (Additive $\theta$-Domination). For two solutions $x$ and $y$,
• $x$ weakly dominates $y$ (denoted as $x \succeq_\theta y$) if $f_1(x) \ge f_1(y)+2\theta \wedge f_2(x) \ge f_2(y)$;
• $x$ dominates $y$ (denoted as $x \succ_\theta y$) if $x \succeq_\theta y$ and $f_1(x) > f_1(y)+2\theta \vee f_2(x) > f_2(y)$.

Theorem 6. For subset selection under additive noise with the assumption Eq. (5), with probability at least $\frac{1}{2}(1-\frac{12nk^2\log 2k}{B^{2\delta}})$, PONSS using $\theta \ge \epsilon$ and $T=2eBnk^2\log 2k$ finds a subset $X$ with $|X| \le k$ and
\[
f(X) \ge \Big(1-\big(1-\frac{\gamma_{\min}}{k}\big)^k\Big)\cdot OPT - 2\epsilon.
\]

6 Empirical Study

We conducted experiments on two typical subset selection problems: influence maximization and sparse regression, where the former has a submodular objective function and the latter has a non-submodular one. The number $T$ of iterations in POSS is set to $2ek^2n$ as suggested by Theorem 3. For PONSS, $B$ is set to $k$, and $\theta$ is set to 1, which is obviously not smaller than $\epsilon$. Note that POSS needs one objective evaluation for the newly generated solution $x'$ in each iteration, while PONSS needs 1 or $1+2B$ evaluations, depending on whether the condition in line 8 of Algorithm 3 is satisfied. For fairness of comparison, PONSS is terminated when its total number of evaluations reaches that of POSS, i.e., $2ek^2n$. Note that in the run of each algorithm, only a noisy objective value $F$ can be obtained; for the final output solution, we report its accurately estimated $f$ value, obtained by an expensive evaluation, for the assessment of the algorithms.
As POSS and PONSS are randomized algorithms and the behavior of the greedy algorithm is also randomized under random noise, we repeat each run 10 times independently and report the average estimated $f$ values.

Influence Maximization The task is to identify a set of influential users in social networks. Let a directed graph $G(V,E)$ represent a social network, where each node is a user and each edge $(u,v) \in E$ has a probability $p_{u,v}$ representing the influence strength from user $u$ to $v$. Given a budget $k$, influence maximization is to find a subset $X$ of $V$ with $|X| \le k$ such that the expected number of nodes activated by propagating from $X$ (called influence spread) is maximized. The fundamental propagation model Independent Cascade [11] is used. Note that the set of active nodes in the diffusion process is a random variable, and the expectation of its size is monotone and submodular [16].

We use two real-world data sets: ego-Facebook and Weibo. ego-Facebook is downloaded from http://snap.stanford.edu/data/index.html, and Weibo is crawled from Weibo.com, a Chinese microblogging site like Twitter. On each network, the propagation probability of one edge from node $u$ to $v$ is estimated by $\frac{weight(u,v)}{indegree(v)}$, as widely used in [3, 12]. We test the budget $k$ from 5 to 10.

[Figure 2: Influence maximization (influence spread: the larger the better) on (a) ego-Facebook (4,039 #nodes, 88,234 #edges) and (b) Weibo (10,000 #nodes, 162,371 #edges). The right subfigure on each data set: influence spread vs. running time of PONSS and POSS for k = 7.]

[Figure 3: Sparse regression ($R^2$: the larger the better) on (a) protein (24,387 #inst, 357 #feat) and (b) YearPredictionMSD (515,345 #inst, 90 #feat). The right subfigure on each data set: $R^2$ vs. running time of PONSS and POSS for k = 14.]
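The Independent Cascade estimation described above can be sketched as a short Monte Carlo simulation (an illustrative sketch, not the authors' implementation; the toy graph and its edge probabilities are made up):

```python
import random

def influence_spread(graph, seeds, simulations=10):
    """Monte Carlo estimate of expected influence spread under Independent
    Cascade: each newly activated node u gets one chance to activate each
    inactive out-neighbor v with probability p_{u,v}.
    graph: dict mapping node -> list of (neighbor, probability) pairs."""
    total = 0
    for _ in range(simulations):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            new = []
            for u in frontier:
                for v, p in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        new.append(v)
            frontier = new
        total += len(active)
    return total / simulations

# Toy network: in the paper's setting, p_{u,v} would be weight(u,v)/indegree(v).
g = {0: [(1, 0.5), (2, 0.5)], 1: [(2, 1.0)], 2: []}
spread = influence_spread(g, {0}, simulations=2000)
assert 1.0 <= spread <= 3.0  # the seed is always active; at most 3 nodes exist
```

In the experiments, a cheap estimate of this kind (10 simulations) plays the role of the noisy objective $F$, while a much larger number of simulations gives the accurate $f$ used for final assessment.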
For estimating the objective influence spread, we simulate the diffusion process 10 times independently and use the average as an estimation. But for the final output solutions of the algorithms, we average over 10,000 times for accurate estimation.

From the left subfigure on each data set in Figure 2, we can see that POSS is better than the greedy algorithm, and PONSS performs the best. Taking the greedy algorithm as the baseline, we plot in the right subfigures the curve of influence spread over running time for PONSS and POSS with k = 7. Note that the x-axis is in $kn$, the running time order of the greedy algorithm. We can see that PONSS quickly reaches a better performance, which implies that PONSS can be efficient in practice.

Sparse Regression The task is to find a sparse approximation solution to the linear regression problem. Given all observation variables $V=\{v_1,\ldots,v_n\}$, a predictor variable $z$ and a budget $k$, sparse regression is to find a set of at most $k$ variables maximizing the squared multiple correlation $R^2_{z,X} = 1 - \mathrm{MSE}_{z,X}$ [8, 15], where $\mathrm{MSE}_{z,X} = \min_{\alpha\in\mathbb{R}^{|X|}} \mathrm{E}[(z-\sum_{i\in X}\alpha_i v_i)^2]$ denotes the mean squared error. We assume w.l.o.g. that all random variables are normalized to have expectation 0 and variance 1. The objective $R^2_{z,X}$ is monotone increasing, but not necessarily submodular [6].

We use two data sets from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. The budget $k$ is set to $\{10, 12, \ldots, 20\}$. For estimating $R^2$ in the optimization process, we use a random sample of 1,000 instances. But for the final output solutions, we use the whole data set for accurate estimation. The results are plotted in Figure 3. The performances of the three algorithms are similar to those observed for influence maximization, except some losses of POSS over the greedy algorithm (e.g., on YearPredictionMSD with k = 20).

For both tasks, we test PONSS with $\theta \in \{0.1, 0.2, \ldots, 1\}$. The results are provided in the supplementary material due to space limitation; they show that PONSS is always better than POSS and the greedy algorithm. This implies that the performance of PONSS is not sensitive to the value of $\theta$.

7 Conclusion

In this paper, we study the subset selection problem with monotone objective functions under multiplicative and additive noises. We first show that the greedy algorithm and POSS, two powerful algorithms for noise-free subset selection, achieve nearly the same approximation guarantee under noise. Then, we propose a new algorithm PONSS, which can achieve a better approximation ratio under some assumption such as i.i.d. noise distribution. The experimental results on influence maximization and sparse regression exhibit the superior performance of PONSS.

Acknowledgements The authors would like to thank reviewers for their helpful comments and suggestions. C. Qian was supported by NSFC (61603367) and YESS (2016QNRC001). Y. Yu was supported by JiangsuSF (BK20160066, BK20170013). K. Tang was supported by NSFC (61672478) and Royal Society Newton Advanced Fellowship (NA150123). Z.-H.
Zhou was supported by NSFC (61333014) and Collaborative Innovation Center of Novel Software Technology and Industrialization.

References

[1] A. A. Bian, J. M. Buhmann, A. Krause, and S. Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In ICML, pages 498–507, 2017.

[2] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD, pages 1029–1038, 2010.

[3] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD, pages 199–208, 2009.

[4] Y. Chen, H. Hassani, A. Karbasi, and A. Krause. Sequential information maximization: When is greedy near-optimal? In COLT, pages 338–363, 2015.

[5] A. Das and D. Kempe. Algorithms for subset selection in linear regression. In STOC, pages 45–54, 2008.

[6] A. Das and D. Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, pages 1057–1064, 2011.

[7] G. Davis, S. Mallat, and M. Avellaneda. Adaptive greedy approximations. Constructive Approximation, 13(1):57–98, 1997.

[8] G. Diekhoff. Statistics for the Social and Behavioral Sciences: Univariate, Bivariate, Multivariate. William C Brown Pub, 1992.

[9] E. R. Elenberg, R. Khanna, A. G. Dimakis, and S. Negahban. Restricted strong convexity implies weak submodularity. arXiv:1612.00804, 2016.

[10] U. Feige. A threshold of ln n for approximating set cover. JACM, 45(4):634–652, 1998.

[11] J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12(3):211–223, 2001.

[12] A. Goyal, W. Lu, and L. Lakshmanan. Simpath: An efficient algorithm for influence maximization under the linear threshold model. In ICDM, pages 211–220, 2011.

[13] A. Hassidim and Y. Singer. Submodular optimization under noise. In COLT, pages 1069–1122, 2017.

[14] T. Horel and Y. Singer. Maximization of approximately submodular functions. In NIPS, pages 3045–3053, 2016.

[15] R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Pearson, 6th edition, 2007.

[16] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137–146, 2003.

[17] A. Miller. Subset Selection in Regression. Chapman and Hall/CRC, 2nd edition, 2002.

[18] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.

[19] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions – I. Mathematical Programming, 14(1):265–294, 1978.

[20] C. Qian, J.-C. Shi, Y. Yu, and K. Tang. On subset selection with general cost constraints. In IJCAI, pages 2613–2619, 2017.

[21] C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Parallel Pareto optimization for subset selection. In IJCAI, pages 1939–1945, 2016.

[22] C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Optimizing ratio of monotone set functions. In IJCAI, pages 2606–2612, 2017.

[23] C. Qian, Y. Yu, and Z.-H. Zhou. Pareto ensemble pruning. In AAAI, pages 2935–2941, 2015.

[24] C. Qian, Y. Yu, and Z.-H. Zhou. Subset selection by Pareto optimization. In NIPS, pages 1765–1773, 2015.

[25] C. Qian, Y. Yu, and Z.-H. Zhou. Analyzing evolutionary optimization in noisy environments. Evolutionary Computation, 2017.

[26] A. Singla, S. Tschiatschek, and A. Krause. Noisy submodular maximization via adaptive sampling with applications to crowdsourced image collection summarization. In AAAI, pages 2037–2043, 2016.