FOCS 2011
TechTalks from event: FOCS 2011
6B

The Power of Linear Estimators
For a broad class of practically relevant distribution properties, which includes entropy and support size, nearly all of the proposed estimators have an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally the vector of summary statistics (the number of species seen once, twice, etc. in the sample) and output the dot product between these summary statistics and a fixed vector of coefficients. We term such estimators \emph{linear}. This historical proclivity towards linear estimators is slightly perplexing, since, despite many efforts over nearly 60 years, all such proposed estimators have significantly suboptimal convergence. Our main result, in some sense vindicating this insistence on linear estimators, is that for any property in this broad class, there exists a near-optimal linear estimator. Additionally, we give a practical and polynomial-time algorithm for constructing such estimators for any given parameters. While this result does not yield explicit bounds on the sample complexities of these estimation tasks, we leverage the insights it provides to give explicit constructions of linear estimators for three properties: entropy, $L_1$ distance to uniformity, and, for pairs of distributions, $L_1$ distance.
Our entropy estimator, when given $O(\frac{n}{\epsilon \log n})$ independent samples from a distribution of support at most $n$, will estimate the entropy of the distribution to within accuracy $\epsilon$, with probability of failure $o(1/\mathrm{poly}(n))$. From recent lower bounds, this estimator is optimal, up to a constant factor, both in its dependence on $n$ and in its dependence on $\epsilon$. In particular, the inverse-linear convergence rate of this estimator resolves the main open question of [VV11], which left open the possibility that the error decreased only with the square root of the number of samples.
Our distance to uniformity estimator, given $O(\frac{m}{\epsilon^2 \log m})$ independent samples from any distribution, returns an $\epsilon$-accurate estimate of the $L_1$ distance to the uniform distribution of support $m$. This is the first sublinear-sample estimator for this problem, and is constant-factor optimal for constant $\epsilon$. Finally, our framework extends naturally to properties of pairs of distributions, including estimating the $L_1$ distance and KL-divergence between pairs of distributions. We give an explicit linear estimator for estimating $L_1$ distance to accuracy $\epsilon$ using $O(\frac{n}{\epsilon^2 \log n})$ samples from each distribution, which is constant-factor optimal for constant $\epsilon$.
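
The "tally, then dot product" form described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the coefficient vector used is that of the naive plug-in entropy estimator (which is itself linear in this sense), not the near-optimal coefficients constructed in the talk.

```python
from collections import Counter
import math

def fingerprint(samples):
    """Return F where F[i] = number of distinct elements seen exactly i times."""
    counts = Counter(samples)          # element -> multiplicity in the sample
    return Counter(counts.values())    # multiplicity -> number of elements

def linear_estimate(samples, coeff):
    """Dot product of the fingerprint with the coefficients coeff(i, k)."""
    k = len(samples)
    return sum(f_i * coeff(i, k) for i, f_i in fingerprint(samples).items())

def plugin_entropy_coeff(i, k):
    """Coefficient of the naive plug-in entropy estimator, in nats.
    (An assumption for illustration; not the coefficients from the talk.)"""
    p = i / k
    return -p * math.log(p)

# "a" and "d" are seen twice, "b" once, "c" three times.
samples = ["a", "a", "b", "c", "c", "c", "d", "d"]
fp = fingerprint(samples)
estimate = linear_estimate(samples, plugin_entropy_coeff)
```

Any other linear estimator would reuse `linear_estimate` unchanged and differ only in the coefficient function passed to it.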

An algebraic proof of a robust social choice impossibility theorem
Impossibility theorems, such as Arrow's theorem and the Gibbard-Satterthwaite theorem, are an important element of social choice theory; they state that under certain natural constraints, social choice mechanisms are impossible to construct. In recent years, beginning with Kalai, much work has been done on finding \textit{robust} versions of these theorems, showing that impossibility remains even when the constraints are \textit{almost} always satisfied. In this work we present an algebraic approach for producing such results. We demonstrate it on a lesser-known variant of Arrow's theorem, found in the work of Dokow and Holzman.

Planar Graphs: Random Walks and Bipartiteness Testing
We initiate the study of the testability of properties in arbitrary planar graphs. We prove that bipartiteness can be tested in constant time. The previous bound for this class of graphs was $\tilde{O}(\sqrt{n})$, and constant-time testability was only known for planar graphs with bounded degree. Previously used transformations of unbounded-degree sparse graphs into bounded-degree sparse graphs cannot be used to reduce the problem to the testability of bounded-degree planar graphs. Our approach extends to arbitrary minor-free graphs. Our algorithm is based on random walks. The challenge here is to analyze random walks for a class of graphs that has good separators, i.e., bad expansion. Standard techniques that use fast convergence to a uniform distribution do not work in this case. Roughly speaking, our analysis technique self-reduces the problem of finding an odd-length cycle in a multigraph $G$ induced by a collection of cycles to another multigraph $G'$ induced by a set of shorter odd-length cycles, in such a way that when a random walk finds a cycle in $G'$ with probability $p > 0$, then it does so with probability $\lambda(p) > 0$ in $G$. This reduction is applied until the cycles collapse to self-loops that can be easily detected.
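
To make the idea of detecting odd-length cycles with random walks concrete, here is a minimal Python sketch. It is a generic one-sided random-walk check, not the algorithm or analysis from the talk: the function names, the number of walks, and the walk length are all assumptions chosen for illustration. A walk that reaches the same vertex at both an even and an odd step has traced a closed walk of odd length, which certifies an odd cycle, so bipartite graphs are never rejected.

```python
import random

def random_walk_finds_odd_cycle(adj, start, length, rng):
    """Walk `length` steps from `start`; return True if some vertex is
    reached at both an even and an odd step (this implies an odd cycle)."""
    parities = {start: {0}}            # vertex -> set of step parities seen
    v = start
    for step in range(1, length + 1):
        if not adj[v]:                 # isolated vertex: the walk is stuck
            break
        v = rng.choice(adj[v])         # move to a uniformly random neighbour
        seen = parities.setdefault(v, set())
        seen.add(step % 2)
        if len(seen) == 2:
            return True
    return False

def looks_bipartite(adj, walks=50, length=20, seed=0):
    """Accept unless some random walk certifies an odd cycle.
    `walks` and `length` are illustrative defaults, not tuned bounds."""
    rng = random.Random(seed)
    vertices = list(adj)
    for _ in range(walks):
        start = rng.choice(vertices)
        if random_walk_finds_odd_cycle(adj, start, length, rng):
            return False
    return True

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # even cycle
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}            # odd cycle
```

On the 4-cycle `square` the check can never reject, while on `triangle` a single walk of 20 steps already finds the odd cycle with overwhelming probability.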

Testing and Reconstruction of Lipschitz Functions with Applications to Data Privacy
A function f : D -> R has Lipschitz constant c if d_R(f(x), f(y)) <= c d_D(x, y) for all x, y in D, where d_R and d_D denote the distance functions on the range and domain of f, respectively. We say a function is Lipschitz if it has Lipschitz constant 1. (Note that rescaling by a factor of 1/c converts a function with Lipschitz constant c into a Lipschitz function.) In other words, Lipschitz functions are not very sensitive to small changes in the input. We initiate the study of testing and local reconstruction of the Lipschitz property of functions. A property tester has to distinguish functions with the property (in this case, Lipschitz) from functions that are \epsilon-far from having the property, that is, differ from every function with the property on at least an \epsilon fraction of the domain. A local filter reconstructs an arbitrary function f to ensure that the reconstructed function g has the desired property (in this case, is Lipschitz), changing f only when necessary. A local filter is given a function f and a query x and, after looking up the value of f on a small number of points, it has to output g(x) for some function g, which has the desired property and does not depend on x. If f has the property, g must be equal to f. We consider functions over the domains {0,1}^d, {1,...,n} and {1,...,n}^d, equipped with \ell_1 distance. We design efficient testers of the Lipschitz property for functions of the form f : {0,1}^d -> \delta Z, where \delta \in (0,1] and \delta Z is the set of multiples of \delta, and of the form f : {1,...,n} -> R, where R is (discretely) metrically convex. In the first case, the tester runs in time O(d min{d,r}/(\delta\epsilon)), where r is the diameter of the image of f; in the second, in time O((\log n)/\epsilon). We give corresponding lower bounds of Omega(d) and Omega(log n) on the query complexity (in the second case, only for nonadaptive 1-sided error testers).
Our lower bound for functions over {0,1}^d is tight for the case of the {0,1,2} range and constant \epsilon. The first tester implies an algorithm for functions of the form f : {0,1}^d -> R that distinguishes Lipschitz functions from functions that are \epsilon-far from (1+\delta)-Lipschitz. We also present a local filter of the Lipschitz property for functions of the form f : {1,...,n}^d -> R with lookup complexity O((log n+1)^d). For functions over the domain {0,1}^d, we show that every nonadaptive local filter has lookup complexity exponential in d. The testers that we developed have applications to program analysis. The reconstructors have applications to data privacy. For the first application, the Lipschitz property of the function computed by a program corresponds to a notion of robustness to noise in the data. The application to privacy is based on the fact that a function f of entries in a database of sensitive information can be released with noise of magnitude proportional to a Lipschitz constant of f, while preserving the privacy of individuals whose data is stored in the database (Dwork, McSherry, Nissim and Smith, TCC 2006). We give a differentially private mechanism, based on local filters, for releasing a function f when a Lipschitz constant of f is provided by a distrusted client. We show that when no reliable Lipschitz constant of f is given, previously known differentially private mechanisms either have a substantially higher running time or have a higher expected error for a large class of symmetric functions f.
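
The definition of the Lipschitz property on the hypercube can be checked directly, which may help fix ideas. The Python sketch below is an exhaustive check, exponential in d, and is not the sublinear tester or the local filter from the talk. It assumes real-valued f with absolute difference as the range metric, and all names are hypothetical. Under \ell_1 distance on {0,1}^d it is enough to check pairs that differ in one bit, because any two points are joined by a path of single-bit flips and the triangle inequality extends the bound along it.

```python
from itertools import product

def is_lipschitz_on_hypercube(f, d, c=1.0):
    """Return True if |f(x) - f(y)| <= c for every pair x, y in {0,1}^d
    that differs in exactly one bit (equivalent to Lipschitz constant c
    under l1 distance on the hypercube)."""
    for x in product((0, 1), repeat=d):
        for i in range(d):
            if x[i] == 0:                          # visit each edge once
                y = x[:i] + (1,) + x[i + 1:]       # flip bit i from 0 to 1
                if abs(f(x) - f(y)) > c:
                    return False
    return True

def hamming_weight(x):
    """Number of ones; changes by exactly 1 per bit flip, so it is Lipschitz."""
    return sum(x)

def squared_weight(x):
    """Square of the number of ones; can jump by more than 1 per bit flip."""
    return sum(x) ** 2
```

For d = 3, `squared_weight` changes by at most 9 - 4 = 5 across any edge, so it passes the check with c = 5 but fails it with the default c = 1.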