EPFL Statistics Seminar
Statistics Seminars 2009 – Spring Semester

Prof. Dan Cooley
Colorado State University
January 15, 2009
A New Parametric Model for Extremes and Prediction via an Angular Density Model
Abstract
The dependence structure of a multivariate extreme value distribution can be characterized by its angular (or spectral) measure. We propose a new flexible parametric model for the angular measure. A benefit of this proposed model over existing parametric models is that its parameter values are interpretable. We then use this new model to explore the prediction problem for extremes. Given that observed values are extreme, how can an unobserved value be predicted? Via the angular measure, we present a method for approximating the conditional density of an unobserved component of a max-stable random vector given the other components of the vector. The approximated conditional density can be used for prediction. We perform prediction for multivariate data and also for a spatial process.

Prof. Axel Munk
Universität Göttingen
February 6, 2009
The Estimation of Different Scales in Microstructure Noise Models from the Perspective of a Statistical Inverse Problem
Abstract
In this talk we discuss the problem of estimating the scale function of a Brownian motion which is additionally corrupted by noise. This is motivated by various models in econometrics where so-called microstructure noise has been introduced to model high-frequency financial data. We look at this problem from the perspective of a statistical inverse problem and discuss estimators and lower bounds.
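As a hedged illustration of why the noise is confounding (a sketch not taken from the talk; all parameter values are illustrative): the realized variance of noisy high-frequency observations is dominated by the noise term, of order 2·n·eta², rather than by the scale of the Brownian motion, as the sampling frequency grows.

```python
import math
import random

def simulate_noisy_prices(n, sigma=1.0, eta=0.01, seed=0):
    """Brownian motion on [0, 1] sampled at n points, observed with
    additive i.i.d. microstructure noise of standard deviation eta."""
    rng = random.Random(seed)
    dt = 1.0 / n
    x, obs = 0.0, []
    for _ in range(n):
        x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)   # latent efficient price
        obs.append(x + eta * rng.gauss(0.0, 1.0))          # noisy observation
    return obs

def realized_variance(p):
    """Sum of squared increments; without noise this estimates sigma^2."""
    return sum((p[i + 1] - p[i]) ** 2 for i in range(len(p) - 1))

# Without noise, realized variance -> sigma^2 = 1 as n grows.  With noise it
# is inflated by roughly 2 * n * eta^2, so it diverges as sampling gets finer.
for n in (100, 1000, 10000):
    print(n, round(realized_variance(simulate_noisy_prices(n)), 3))
```

This divergence is what forces the more delicate, inverse-problem treatment the talk describes.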

Prof. Richard Gill
Universiteit Leiden
February 19, 2009
Statistical Coincidence in Court: the Case of Lucia de Berk
Abstract
The case of Lucia de B. is a highly controversial legal case in the Netherlands, in which a statistically significant correlation between the presence of a particular nurse and the occurrence of suspicious medical incidents on her ward played a central role in securing her a life sentence for serial murder. However, recent reinvestigation of the meagre medical evidence for wrongdoing, and reinvestigation of the statistics, makes it very plausible that no murders were committed at all, by anybody. The Dutch supreme court has recently overturned the conviction and a retrial is about to start.
I will discuss various statistical approaches to analysing the data and argue that, given the lack of empirical knowledge about the "normal situation" on comparable hospital wards, statistics should not have been brought into the court at all; at best, they could be seen as an exploratory tool in medical or police investigations.

Prof. Richard Gill
Universiteit Leiden
February 20, 2009
Quantum Statistics: An Introduction
Abstract
Quantum physics is a stochastic theory. When applied at the level of individual microscopic systems (e.g., to observations of a single atom, nowadays a standard laboratory exercise), it allows us only to derive probability distributions of outcomes of measurements. By quantum statistics I mean the statistical analysis of data coming from measurements on a quantum system, with the aim of solving inference problems concerning the state of the system. Now, the family of all possible measurement schemes on a quantum system has a very concise and elegant mathematical description. This allows us to optimize the statistics not just by optimizing the processing of the data, but also by optimizing the actual experiment to be performed. The phenomenon of quantum entanglement leads to some surprising results and challenging open problems. I will explain the mathematical framework and outline some recent developments in this field. I hope also to discuss some other roles for statistics in present-day quantum physics, in particular the design and analysis of experimental proofs of so-called quantum nonlocality.

Prof. Konstantinos Fokianos
University of Cyprus
February 27, 2009
March 6, 2009
March 20, 2009
March 26, 2009
Short Course on Integer-Valued Time Series

Prof. Mike Titterington
University of Glasgow
March 27, 2009
Approximate Inference for Latent Variable Models
Abstract
Likelihood and Bayesian inference are not straightforward for latent variable models, of which mixture models constitute a special case. For instance, in the context of the latter approach, conjugate priors are not available. The talk will consider some approximate methods that have been developed mainly in the machine-learning literature and will attempt to investigate their statistical credentials. In particular, so-called variational methods and the Expectation-Propagation method will be discussed. It will be explained that, in the Bayesian context, variational methods tend to produce approximate posterior distributions that are located in the right place but are too concentrated, whereas the Expectation-Propagation approach sometimes, but not always, also gets the degree of concentration, as measured by posterior variance, right.
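The "right place, too concentrated" behaviour can be seen in a minimal mean-field example (a sketch not from the talk; the target and its parameters are illustrative): coordinate-ascent variational updates applied to a bivariate Gaussian target recover the means exactly, but the factorised approximation's variances are 1/Lambda_ii = 1 - rho², understating the true marginal variance of 1.

```python
# Mean-field variational approximation to a bivariate Gaussian "posterior"
# with unit marginal variances, correlation rho, and mean mu.
rho = 0.9
mu = (1.0, -1.0)
det = 1.0 - rho * rho
L11 = L22 = 1.0 / det          # precision matrix entries of the target
L12 = -rho / det
m1, m2 = 0.0, 0.0              # variational means, arbitrary start
for _ in range(100):           # coordinate-ascent (CAVI) updates
    m1 = mu[0] - (L12 / L11) * (m2 - mu[1])
    m2 = mu[1] - (L12 / L22) * (m1 - mu[0])
v1 = 1.0 / L11                 # variational variance vs. true marginal variance 1.0
print(round(m1, 3), round(m2, 3), round(v1, 3))
```

The stronger the correlation, the more severe the variance understatement.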

Prof. Konstantinos Fokianos
University of Cyprus
April 7, 2009
Linear and Log-linear Poisson Autoregression
Abstract
The talk considers geometric ergodicity and likelihood-based inference for linear and log-linear Poisson autoregressions. In the linear case the conditional mean is linked linearly to its past values as well as the observed values of the Poisson process. This also applies to the conditional variance, implying an interpretation as an integer-valued GARCH process. In a log-linear conditional Poisson model, the conditional mean is a log-linear function of its past values and a nonlinear function of past observations. Under geometric ergodicity the maximum likelihood estimators of the parameters are shown to be asymptotically Gaussian in the linear model. In addition we provide a consistent estimator of the asymptotic covariance, which is used in the simulations and the analysis of some transaction data. Our approach to verifying geometric ergodicity proceeds via Markov theory and irreducibility. Finding transparent conditions for proving ergodicity turns out to be a delicate problem in the original model formulation. This problem is circumvented by allowing a perturbation of the model. We show that, as the perturbations can be chosen to be arbitrarily small, the differences between the perturbed and non-perturbed versions vanish as far as the asymptotic distribution of the parameter estimates is concerned.
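A minimal simulation sketch of the linear model (notation and parameter values are illustrative assumptions, not the speaker's): the conditional mean follows lam_t = d + a·lam_{t-1} + b·Y_{t-1}, with stationary mean d/(1-a-b) when a + b < 1.

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson variate via Knuth's product-of-uniforms method
    (adequate for the moderate means arising here)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_ingarch(n, d=0.5, a=0.4, b=0.3, seed=1):
    """Linear Poisson autoregression (an integer-valued GARCH(1,1)):
    Y_t | past ~ Poisson(lam_t),  lam_t = d + a*lam_{t-1} + b*Y_{t-1}."""
    rng = random.Random(seed)
    lam = d / (1.0 - a - b)            # start at the stationary mean
    y = poisson_draw(lam, rng)
    ys = []
    for _ in range(n):
        lam = d + a * lam + b * y
        y = poisson_draw(lam, rng)
        ys.append(y)
    return ys

ys = simulate_ingarch(20000)
print(round(sum(ys) / len(ys), 2))     # should be near d/(1-a-b) = 1.67
```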

Prof. Robert Staudte
La Trobe University
May 8, 2009
Beyond Contiguity: a Measure of Evidence for Meta-analysis
Abstract
We advocate calibrating statistical evidence for an alternative hypothesis within the normal location family, where it is simple to compute, calibrate and interpret. It has applications in most routine problems in statistics, and leads to more accurate confidence intervals, estimated power and hence sample size calculations than standard asymptotic methods. Such evidence is readily combined when obtained from different studies, whether modeled by fixed or random effects. Connections to other approaches to statistical evidence are given, in particular the Kullback-Leibler divergence and p-values.
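One hedged sketch of how evidence on the normal location scale combines across studies (the probit calibration and the equal-weight combination below are illustrative assumptions, not the speaker's definitions): a one-sided p-value is mapped to T = Phi^{-1}(1 - p), approximately N(tau, 1), and evidence from independent studies of comparable size adds on this scale.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def evidence(p):
    """Probit calibration of a one-sided p-value: T = Phi^{-1}(1 - p).
    Under the normal location model T is approximately N(tau, 1)."""
    return N01.inv_cdf(1.0 - p)

def combine(ts):
    """Equal-weight combination of evidence from independent studies of
    comparable size; the result is again approximately N(., 1)."""
    return sum(ts) / math.sqrt(len(ts))

p_values = [0.04, 0.06, 0.03]            # three individually weak studies
ts = [evidence(p) for p in p_values]
T = combine(ts)
p_combined = 1.0 - N01.cdf(T)
print([round(t, 2) for t in ts], round(T, 2), round(p_combined, 4))
```

Because the evidence scale has standard error 1 regardless of the study, combination reduces to simple arithmetic, unlike combining raw p-values.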

Prof. Ingram Olkin
Stanford University
May 15, 2009
Multivariate Meta-Analysis
Abstract
Meta-analysis is a methodology for combining the results of independent studies. As studies become more complex, some of the outcomes are correlated. In this talk I will give a short history of some of the key features of meta-analysis, and then describe several models that require an examination of the correlations.

Prof. Alexander Ramm
Kansas State University
May 29, 2009
Random Fields Estimation
Abstract
An analytical theory of random fields estimation, by the criterion of minimum variance of the estimation error, is developed. This theory does not assume a Markovian or Gaussian nature of the random field. The data are the covariance functions of the observed random field of the form u(x) = s(x) + n(x), where s(x) is the "useful signal", n(x) is noise, and u(x) is observed in a bounded domain D of an r-dimensional Euclidean space, r >= 1. One wants to estimate a linear operator A acting on s. For example, if A = I, the identity operator, then one has the filtering problem, etc. Estimation theory seeks an optimal linear estimate Lu, a "filter", for which $\overline{|Lu - As|^2} = \min$, where the overline stands for the variance and $Lu := \int_D h(x, y) u(y) \, dy$. For h one gets a multidimensional integral equation of the type (*) $Rh := \int_D R(x, y) h(y) \, dy = f(x)$, $x \in D$. An analytical method for solving the basic integral equation (*) of estimation theory is given; numerical methods for solving this equation are proposed and their efficiency is demonstrated by examples; and a singular perturbation theory for (*) is developed, that is, a study of the limiting behavior of the solution to the equation (**) $(\epsilon I + R) h_\epsilon = f$ as $\epsilon \rightarrow 0$. Statistically this corresponds to the case when the white component of the noise tends to zero. The random fields are not assumed homogeneous. Potential applications include multidimensional image processing problems of seismic exploration, demining, filtering of optical images, and ocean acoustics. The difficulty of the numerical solution of (*) lies in the fact that, in general, the solution to (*) is a distribution, not an integrable function.
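A small numerical illustration of the perturbed equation (**) (a sketch under illustrative assumptions: the kernel R(x, y) = exp(-|x - y|), the right-hand side f = 1, and the grid size are arbitrary choices, not from the abstract): as epsilon decreases, the discretised solution develops large spikes at the boundary, reflecting the distributional (delta-function) components of the limiting solution.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Nystrom discretisation of (eps*I + R) h = f on [0, 1].
m = 40
xs = [i / (m - 1) for i in range(m)]
dx = 1.0 / (m - 1)
f = [1.0] * m
maxima = {}
for eps in (1.0, 0.1, 0.01):
    A = [[(eps if i == j else 0.0) + math.exp(-abs(xs[i] - xs[j])) * dx
          for j in range(m)] for i in range(m)]
    h = solve(A, f)
    maxima[eps] = max(h)
    print(eps, round(maxima[eps], 2))  # solution concentrates at the boundary as eps -> 0
```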

Prof. Alexander Samarov
University of Massachusetts and Massachusetts Institute of Technology
July 24, 2009
Noisy Independent Factor Analysis Model for Density Estimation and Classification
Abstract
We consider the problem of multivariate density estimation assuming that the density satisfies a particular form of dimensionality reduction: the noisy independent factor analysis (IFA) model. In this model, the data are generated as a linear transformation of a few latent independent components having unknown non-Gaussian distributions and are observed in Gaussian noise. We assume that neither the number of components, nor the matrix mixing the components, nor the variance of the noise is known.
We show that a density of this form can be estimated with a surprisingly fast rate: using recent results on aggregation of density estimators, we construct an estimator which achieves a nearly parametric rate $\log^{1/4}(n)/\sqrt{n}$, independent of the dimensionality of the data. One of the main applications of multivariate density estimators is in supervised learning, where they can be used to construct plug-in classifiers. Bounding the excess risk of nonparametric plug-in classifiers in terms of the MISE of the density estimators of each class, we show that our classifier can achieve, within a logarithmic factor independent of the dimensionality of the data, the best obtainable rate of the excess Bayes risk. Applications of this classifier to simulated data sets and to real data from a remote sensing experiment show promising results.
This is joint work with U. Amato, A. Antoniadis, and A. Tsybakov.
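The plug-in principle itself can be sketched with an ordinary kernel density estimator standing in for the aggregation estimator of the talk (all function names and parameters below are illustrative assumptions): estimate each class density, then classify to the class maximising estimated prior times estimated density.

```python
import math
import random

def kde(data, h):
    """One-dimensional Gaussian kernel density estimator with bandwidth h."""
    n, c = len(data), 1.0 / (h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / n

def plugin_classifier(train, h=0.4):
    """Plug-in rule: classify x to argmax_k  pi_hat_k * f_hat_k(x)."""
    total = sum(len(s) for s in train.values())
    dens = {k: kde(s, h) for k, s in train.items()}
    prior = {k: len(s) / total for k, s in train.items()}
    return lambda x: max(dens, key=lambda k: prior[k] * dens[k](x))

rng = random.Random(2)
train = {0: [rng.gauss(-2.0, 1.0) for _ in range(200)],   # class 0 sample
         1: [rng.gauss(+2.0, 1.0) for _ in range(200)]}   # class 1 sample
classify = plugin_classifier(train)
print(classify(-1.5), classify(1.5))
```

The point of the talk is that, under the IFA structure, the density estimation step (and hence the classifier) escapes the usual curse of dimensionality.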
Statistics Seminars 2009 – Autumn Semester

Prof. David Brillinger
University of California, Berkeley
September 8, 2009
Assessing Connections in Networks with Point Process Input and Output with a Biological Example
Abstract
Networks with vector-valued point process input and output are considered, although the techniques apply to ordinary time series as well. Both time-side and frequency-side methods are presented, contrasted and combined. Data from a biological experiment on the muscle spindle are investigated. The experiment involves two input and two output neurons. The analysis combines the results of a time domain approach with those of a frequency domain approach to obtain "much new information about the behaviour of the muscle". The latter work is joint with K. S. Lindsay and J. R. Rosenberg of the University of Glasgow, one of whom I am quoting.

Dr. Marcos Antezana
University of Lisbon
September 22, 2009
A New Strategy to Find Multi-Column Associations in a Data Matrix
Abstract
Values in a data matrix may co-occur nonrandomly along rows. If there is a status column (e.g., height), its association with other columns is of interest. Such associations interest geneticists, insurers, etc., and can take many forms, e.g., linear correlations. Associations of more than two columns may go undetected if only column pairs are examined, and often one may wish to evaluate all relevant column subsets. Exhaustive evaluation is impossible when matrices are large, e.g., 100 columns give ~10^30 distinct subsets. And even when there are few columns, model selection and multiple testing are complex. I have developed single-column indices that flag in a marker-value matrix every column that is significantly associated with other columns. Such "PAS" indices find everything exhaustive searches find, and they bypass model selection by flagging each significant column directly (but one can check later how flagged columns are associated). PAS calculations require no examination of column subsets, avoiding the combinatorial explosion and allowing one to obtain nonparametrically the c.d.f. of each column's PAS under the null hypothesis. Subtracting the indices of status-defined row groups erases background associations unrelated to status (e.g., chromosomal linkage). The exhaustive-evaluation analogue of PAS is the sum of all the "pure" chi-squares from the column subsets involving the given column, and it has very similar performance. PAS indices therefore have nearly correct type-I-error generation, almost ultimate power, and correct directly for multiple tests. Early-stage PAS indices for matrices with arbitrarily distributed measurements (e.g., gene expressions) show similar properties and promise to revolutionize ANOVAs, etc.

Prof. Ilya Molchanov
Universität Bern
September 24, 2009
Multivariate Risks and Depth-Trimmed Regions
Abstract
The talk starts with a reminder of basic facts about univariate coherent risk measures. Then we outline some inherent difficulties that appear when attempting to define risks for multivariate portfolios. Finally, it will be explained how to come up with a nontrivial risk measure for multivariate risks and how to relate it to the concept of depth and depth-trimmed regions in multivariate statistics.
Based on joint work with I. Cascos (Madrid).

Prof. Matt Wand
University of Wollongong
October 26, 2009
Variational Approximation in Semiparametric Regression
Abstract
Variational approximations are a body of analytic procedures for handling difficult probability calculus problems. They have been used extensively in Statistical Physics and Computer Science. Variational approximations offer an alternative to Markov chain Monte Carlo methods and have the advantage of being faster and not requiring convergence diagnostics, albeit with some loss in accuracy. Despite the growing literature on variational approximations, they currently have little presence in mainstream Statistics. We describe recent work on the transferral and adaptation of variational approximation methodology to contemporary Statistics settings such as generalised linear mixed models and semiparametric regression. This talk represents joint research with Professor Peter Hall and Dr John T. Ormerod.
Even though this talk is on the speaker's traditional research areas of semiparametric regression and generalised linear mixed models, the ideas of variational approximation are applicable to many areas of Statistics. Examples are: Bayesian inference, missing data models, longitudinal data analysis, image analysis, hidden Markov models and latent variable models.

Swiss Statistics Meeting
October 28-30, 2009

Prof. Arnoldo Frigessi
University of Oslo
November 6, 2009
Pair-Copula Constructions of Multiple Dependence
Abstract
We show how multivariate data which exhibit complex patterns of dependence can be modelled using a cascade of pair-copulae, acting on two (transformed) variables at a time. We discuss inference and its properties. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Through examples, we discuss how well a general multivariate density can be approximated by a pair-copula construction. Our approach might be a first step towards the development of an unsupervised algorithm that explores the space of possible pair-copula models. Time permitting, we will survey recent work by other authors on pair-copula constructions, including theory on tail dependence (Joe) and Bayesian versions (Czado, Min).
Based on joint work with Ingrid H. Haff, Kjersti Aas and Claudia Czado.
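A minimal sketch of a three-dimensional pair-copula (D-vine) density built entirely from Gaussian pair-copulae (an illustrative special case, not the authors' code; the correlation parameters are arbitrary): the joint copula density factorises as c12 · c23 · c13|2, where the conditional pair acts on h-function transforms.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def c_gauss(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho."""
    x, y = N01.inv_cdf(u), N01.inv_cdf(v)
    r2 = 1.0 - rho * rho
    return math.exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y)
                    / (2.0 * r2)) / math.sqrt(r2)

def h_gauss(u, v, rho):
    """h-function h(u | v) = P(U <= u | V = v) for the Gaussian pair-copula."""
    return N01.cdf((N01.inv_cdf(u) - rho * N01.inv_cdf(v)) / math.sqrt(1.0 - rho * rho))

def dvine3(u1, u2, u3, r12, r23, r13_2):
    """Three-dimensional D-vine copula density built from pair-copulae:
    c12(u1, u2) * c23(u2, u3) * c13|2( h(u1|u2), h(u3|u2) )."""
    return (c_gauss(u1, u2, r12)
            * c_gauss(u2, u3, r23)
            * c_gauss(h_gauss(u1, u2, r12), h_gauss(u3, u2, r23), r13_2))

print(round(dvine3(0.3, 0.5, 0.7, 0.6, 0.4, 0.2), 4))
```

Replacing `c_gauss`/`h_gauss` by other bivariate families at different levels is exactly the flexibility the construction is designed for.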

Prof. Noureddine El Karoui
University of California, Berkeley
November 24, 2009
High-Dimensionality Effects in the Markowitz Problem and other Quadratic Programs with Linear Equality Constraints: Risk Underestimation
Abstract
In this talk, we consider a very particular subcase of the following general question: given an optimization problem whose parameters are estimated from data, what can we say about the solution of this "empirical" optimization problem as compared to the solution we would get if we knew the real value of the parameters, i.e., if we had the "population" solution?
We will focus on the case where the optimization problem is a simple quadratic program with linear equality constraints, i.e., we are minimizing a quadratic form subject to linear equality constraints. Also, we will be working in a "large n, large p" setting, where the number of variables in the problem is of the same order of magnitude as the number of observations used to estimate the parameters.
We will discuss the fact that the empirical solution of our problem tends to produce an underestimate of the value of the optimal population solution, under a variety of distributional settings for the data. This underestimate is driven by various population quantities and, very importantly, by the ratio p/n. We will also discuss asymptotically consistent estimators of the population optimum.
Random matrix theory plays a key role in our solution. We will discuss its role, focusing particularly on the intuition it provides.
The talk will be self-contained.
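The underestimation phenomenon is easy to reproduce in a toy simulation (a sketch under the simplest illustrative assumption of i.i.d. standard Gaussian returns, with p/n = 0.5; not the speaker's setup): the minimum-variance portfolio with weights summing to 1 has true optimal variance 1/p, but the plug-in optimum computed from the estimated covariance reports a smaller "risk".

```python
import random

def covariance(X):
    """Sample covariance matrix of observations X (rows = observations)."""
    n, p = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    return [[sum((X[t][i] - mean[i]) * (X[t][j] - mean[j]) for t in range(n)) / n
             for j in range(p)] for i in range(p)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

rng = random.Random(3)
p, n = 30, 60                               # "large p, large n": p/n = 0.5
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
S = covariance(X)
w = solve(S, [1.0] * p)                     # proportional to S^{-1} 1
s = sum(w)
w = [wi / s for wi in w]                    # min-variance weights, summing to 1
empirical_risk = sum(w[i] * sum(S[i][j] * w[j] for j in range(p))
                     for i in range(p))     # = 1 / (1' S^{-1} 1)
true_optimum = 1.0 / p                      # true covariance is the identity
print(round(empirical_risk, 4), round(true_optimum, 4))
```

The gap is driven by the ratio p/n, consistent with the random-matrix analysis described in the abstract.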

Prof. Finbarr O'Sullivan
University College Cork
December 4, 2009
Statistical Aspects of Imaging Cancer and its Response to Therapy with Positron Emission Tomography
Abstract
Radiotracer imaging with PET is playing an increasingly important role in both basic science and clinically orientated studies of cancer and other diseases. But the information retrievable from a PET radiotracer study is fundamentally limited by the confounding of local delivery (vasculature) and dose-related resolution constraints. As a result there is a compelling role for Statistics, perhaps more so than for MR or CT, which typically provide exquisite anatomic but limited metabolic information. This talk will discuss some current areas of interest with static (3D) and dynamic (4D) PET imaging. The focus is on studies and data from a 20-year NCI-funded PET cancer imaging program at the University of Washington. This program has attempted to develop in-vivo markers of progression and response to treatment in a range of major human cancers.

Dr. Marie-Colette van Lieshout
Centrum Wiskunde & Informatica (CWI) and Eindhoven University of Technology
December 11, 2009
Moment Analysis of the Delaunay Tessellation Field Estimator
Abstract
Estimation of the intensity function of a spatial point process is a fundamental problem. In this talk, we shall use the Campbell-Mecke theorem to derive explicit expressions for the mean and variance of the Delaunay tessellation field estimator recently introduced by Schaap and Van de Weygaert, and compare it to classic kernel estimators. Special attention will be paid to Poisson processes.
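For context, the classic kernel competitor can be sketched as follows (an illustrative sketch only: the Gaussian kernel, bandwidth and intensity are arbitrary choices, and edge correction is omitted): for a homogeneous Poisson process on the unit square, the kernel estimate at an interior point should be close to the true intensity.

```python
import math
import random

def poisson_count(lam, rng):
    """Number of unit-time arrivals of a rate-lam Poisson process."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(lam)
        if t > 1.0:
            return n
        n += 1

def poisson_points(lam, rng):
    """Homogeneous Poisson process on the unit square with intensity lam."""
    return [(rng.random(), rng.random()) for _ in range(poisson_count(lam, rng))]

def kernel_intensity(points, x, y, h):
    """Gaussian kernel intensity estimate at (x, y); edge correction omitted."""
    c = 1.0 / (2.0 * math.pi * h * h)
    return sum(c * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h))
               for px, py in points)

rng = random.Random(4)
pts = poisson_points(500.0, rng)
print(len(pts), round(kernel_intensity(pts, 0.5, 0.5, 0.15), 1))
```

The Delaunay tessellation field estimator replaces the fixed bandwidth by a locally adaptive tessellation-based weighting, which is what the moment analysis in the talk addresses.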