TechTalks from event: Object, functional and structured data: towards next generation kernel-based methods - ICML 2012 Workshop

  • Object-oriented data analysis Authors: Steve Marron
    Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also impacts practice, by providing a language for discussing the many choices needed in modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
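    The principal components analysis of curve populations mentioned above can be sketched as PCA applied to curves discretized on a common grid. This is a minimal NumPy illustration, not the speaker's code; the toy data and all names are made up for the example.

```python
import numpy as np

def functional_pca(curves, n_components=2):
    """PCA for functional data: rows are curves sampled on a common grid."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    # SVD of the centered data matrix yields discretized eigenfunctions
    # (rows of Vt) and per-curve principal component scores.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    components = Vt[:n_components]
    return mean, components, scores

# Toy population: sine curves with random amplitude plus small noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
curves = rng.normal(1.0, 0.3, size=(40, 1)) * np.sin(2 * np.pi * t) \
         + 0.01 * rng.normal(size=(40, 50))
mean, components, scores = functional_pca(curves)
# The first mode captures the dominant (amplitude) variation in the population.
```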
  • Nonlinear relationship and prediction in high dimensional regression setting Authors: Frédéric Ferraty
    The high-dimensional setting is a modern and dynamic research area in statistics. It covers numerous situations where the number of explanatory variables is much larger than the sample size. The last fifteen years have been devoted to developing new methodologies able to handle high-dimensional data, including so-called functional data (which can be viewed as a special case of high-dimensional data with a highly correlated structure). Statistical models involving only linear relationships have been studied almost exclusively. However, it is well known in the nonparametric community that taking nonlinearities into account may significantly improve the predictive power of statistical methods, and may also reveal relevant information that helps to better understand the observed phenomenon. This talk presents recent advances on both issues: a functional nonparametric approach to estimating nonlinear relationships involving functional data (through a functional kernel regression estimator), and a multivariate approach proposing a nonlinear variable selection method in the high-dimensional setting. Simulations and real datasets will illustrate these approaches.
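    A functional kernel regression estimator of the Nadaraya-Watson type can be sketched as follows. This is an illustrative NumPy sketch, not the speaker's implementation: the L2 distance between discretized curves and the Gaussian kernel are one possible semi-metric/kernel choice among many, and all names are made up.

```python
import numpy as np

def functional_kernel_regression(X_train, y_train, X_new, h):
    """Kernel regression with functional (curve) covariates.

    X_train : (n, p) array, each row a curve discretized on p grid points.
    y_train : (n,) scalar responses.
    X_new   : (m, p) new curves to predict at.
    h       : bandwidth.
    """
    # Semi-metric between curves: here, plain L2 distance on the grid.
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    # Gaussian kernel weights; the prediction is a weighted average of y_train.
    w = np.exp(-0.5 * (d / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

# Toy data: response is a nonlinear functional of the curve
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
X = rng.normal(size=(100, 1)) * np.sin(2 * np.pi * t)  # random-amplitude sines
y = np.sin(np.linalg.norm(X, axis=1))                  # nonlinear in the curve
pred = functional_kernel_regression(X, y, X[:5], h=1.0)
```

    Since the estimator is a convex combination of the observed responses, predictions always stay within the range of y; bandwidth selection (e.g. by cross-validation) is the main practical tuning step.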
  • Kernels for vector-valued functions Authors: Neil Lawrence
    In this talk we review kernels for vector-valued functions from the perspective of Gaussian processes. Deriving a multiple-output Gaussian process from the perspective of a linear dynamical system (Kalman filter), we introduce the Intrinsic Coregionalization Model and the Linear Model of Coregionalization. We discuss how they relate to multi-task learning with GPs and to the Semiparametric Latent Factor Model. Finally, we introduce convolution process models from the perspective of the latent force model.
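    The Intrinsic Coregionalization Model builds a multi-output covariance as the Kronecker product of a coregionalization matrix B (capturing between-output correlation) with a single input kernel. A minimal NumPy sketch, assuming an RBF input kernel and a rank-1 plus diagonal parameterization of B; all names and values are illustrative.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix between two input sets."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def icm_covariance(X, W, kappa, lengthscale=1.0):
    """ICM covariance: kron(B, k(X, X)) with B = W W^T + diag(kappa).

    W     : (D, R) coregionalization weights for D outputs, rank R.
    kappa : (D,) per-output diagonal terms (keeps B positive definite).
    """
    B = W @ W.T + np.diag(kappa)
    return np.kron(B, rbf(X, X, lengthscale))

X = np.linspace(0, 1, 10)[:, None]       # 10 shared input locations
W = np.array([[1.0], [0.5]])             # 2 outputs, rank-1 coupling
kappa = np.array([0.1, 0.1])
K = icm_covariance(X, W, kappa)          # (20, 20) joint covariance
```

    Because B and k(X, X) are both positive semi-definite, so is their Kronecker product, so K is a valid GP covariance over the stacked outputs. The Linear Model of Coregionalization generalizes this to a sum of such Kronecker terms, one per latent kernel.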
  • RKHS embeddings of distributions: theory and applications Authors: Arthur Gretton
    In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear learning algorithms from linear ones, by applying the linear algorithms to feature space mappings of the original data. Recently, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics, by mapping probabilities to a suitable reproducing kernel Hilbert space (i.e., the feature space is an RKHS). I will describe how probabilities can be mapped to kernel feature spaces, and how to compute the distance between these mappings. This distance is called the maximum mean discrepancy (MMD), and is a metric on distributions for kernels that satisfy the characteristic property. A measure of dependence between two random variables follows naturally from this distance. The focus will be mainly on the application of the MMD to two-sample and independence testing in high dimensional and structured domains. I will also briefly cover embeddings of conditional distributions and their application in inference.
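    The MMD between two distributions can be estimated directly from samples using only kernel evaluations. A minimal NumPy sketch of the unbiased MMD^2 estimator with a Gaussian kernel (the kernel choice, bandwidth, and toy data are illustrative; a full two-sample test would additionally calibrate a rejection threshold, e.g. by permutation):

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between sample sets X (m, d) and Y (n, d)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between the distributions of X and Y."""
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    np.fill_diagonal(Kxx, 0.0)   # drop i == j terms for unbiasedness
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (m * (m - 1)) + Kyy.sum() / (n * (n - 1)) \
        - 2.0 * Kxy.mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as X
Y_diff = rng.normal(1.5, 1.0, size=(200, 2))   # mean-shifted distribution
mmd_same = mmd2_unbiased(X, Y_same)   # close to zero
mmd_diff = mmd2_unbiased(X, Y_diff)  # clearly positive
```

    For a characteristic kernel (e.g. Gaussian), the population MMD is zero if and only if the two distributions coincide, which is what makes this estimator usable as a two-sample test statistic.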