
# EPFL Statistics Seminar

### Statistics Seminar 2011

• #### Prof. Jozef Teugels [Joint EPFL/UNIL Statistics Colloquium]

Katholieke Universiteit Leuven & EURANDOM
President, International Statistical Institute
February 11, 2011
14.15 - Room 126, Extranef (UNIL)

Change Point Analysis of Extreme Values

#### Abstract

In a sample from the distribution of a random variable, it is possible that the tail behavior of the distribution changes at some point in the sample. This tail behavior can be described by absolute or relative excesses of the data over a high threshold, given that the random variable exceeds the threshold. The limit distribution of the absolute excesses is given by a Generalized Pareto Distribution with an extreme value index gamma and a scale parameter sigma. When gamma is positive, the relative excesses can be described in the limit by a Pareto distribution with this index as parameter. In this lecture we concentrate on testing whether changes occur in the value of the extreme value index gamma and/or the scale parameter sigma. To this end, appropriate test statistics are introduced based on the likelihood approach of Csörgő and Horváth (1997) for independent data. Asymptotic properties of these test statistics lead to adequate critical values so that a practical test procedure can be formulated. Supported by the outcome of some simulations, we spend a major portion of the seminar on real life examples. We begin with stock index data and the classical set of Nile data. Since we are not directly successful in applying the procedure to catastrophic losses, we investigate whether a trend analysis might be more appropriate. (Joint work with G. Dierckx)
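
For readers unfamiliar with the limit laws named in this abstract, the standard textbook forms are sketched here. The notation ($G_{\gamma,\sigma}$, threshold $u$) is an assumption of this note and is not taken from the talk. The Generalized Pareto distribution function for the absolute excesses is

$$
G_{\gamma,\sigma}(x) = 1 - \left(1 + \frac{\gamma x}{\sigma}\right)^{-1/\gamma}, \qquad x \ge 0,\quad 1 + \frac{\gamma x}{\sigma} > 0,
$$

read as $1 - e^{-x/\sigma}$ when $\gamma = 0$. For $\gamma > 0$, the relative excesses over a threshold $u$ satisfy

$$
P\!\left(\frac{X}{u} > y \,\middle|\, X > u\right) \to y^{-1/\gamma}, \qquad y \ge 1, \quad u \to \infty .
$$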

• #### Dr. Johanna Ziegel

Universität Heidelberg
March 4, 2011
15.15 - MA 31

Precision Estimation for Stereological Volumes

#### Abstract

Volume estimators based on Cavalieri's principle are widely used in the biosciences. For example in neuroscience, where volumetric measurements of brain structures are of interest, systematic samples of serial sections are obtained by magnetic resonance imaging or by a physical cutting procedure. The volume v is then estimated by $\hat{v}$, which is the sum over the areas of the structure of interest in the section planes multiplied by the width of the sections, t>0. Assessing the precision of such volume estimates is a question of great practical importance, but statistically a challenging task due to the strong spatial dependence of the data and typically small sample sizes. In this talk an overview of classical and new approaches to this problem will be presented. A special focus will be given to some recent advances on distribution estimators and confidence intervals for $\hat{v}$.
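
The estimator described in this abstract, $\hat{v} = t \sum_i A_i$ with $A_i$ the section areas, can be illustrated with a minimal Python sketch. The test object (a unit sphere), the section spacing and the function names are assumptions chosen for illustration only; they do not come from the talk.

```python
import numpy as np

def cavalieri_volume(section_areas, t):
    """Cavalieri estimate: section spacing t times the sum of section areas."""
    return t * float(np.sum(section_areas))

def sphere_section_areas(radius, t, start):
    """Areas of parallel plane sections of a sphere, taken every t units.

    The first section lies at distance `start` (0 <= start < t) from the
    lower pole, which mimics a systematic sample with a random start.
    """
    z = np.arange(-radius + start, radius, t)
    return np.pi * np.maximum(radius**2 - z**2, 0.0)

rng = np.random.default_rng(0)
radius, t = 1.0, 0.1
start = rng.uniform(0.0, t)          # random start of the systematic sample

v_hat = cavalieri_volume(sphere_section_areas(radius, t, start), t)
v_true = 4.0 / 3.0 * np.pi * radius**3
print(v_hat, v_true)
```

Rerunning with different seeds gives a rough feel for how the estimate varies with the random start, which is the variability the precision estimators in the talk aim to quantify.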

• #### Prof. Catalin Starica

Université de Neuchâtel
April 12, 2011
15.15 - CM1113

The Facts Behind Sector Rotation

#### Abstract

The conventional view of sector rotation represents the belief that investing in certain sectors at different stages of the business cycle can deliver superior returns relative to a purely passive strategy. The preferred sectors through the various stages of the business cycle are illustrated by the following diagram. In this talk we take a close look at the econometric facts behind this conventional view. While we do not test whether actual sector rotation works, we test the fundamental assumptions underlying sector rotation. Do sector returns differ significantly across the business cycle?

• #### Prof. Hansjörg Albrecher

Université de Lausanne
May 11, 2011
15.15 - MA12

On Refracted Stochastic Processes and the Analysis of Insurance Risk

#### Abstract

We show a somewhat surprising identity for first passage probabilities of spectrally-negative Lévy processes that are refracted at their running maximum and discuss extensions of this identity and its applications in the study of insurance risk processes in the presence of tax payments. In addition, we discuss a statistic that is related to the sample coefficient of variation which leads to an alternative simple method for estimating the extreme value index of Pareto-type tails from corresponding iid claim data with infinite variance.

• #### Prof. Tilmann Gneiting

Universität Heidelberg and University of Washington, Seattle
May 13, 2011
15.15 - MA31

Probabilistic Weather Forecasting

#### Abstract

A major human desire is to make forecasts for an uncertain future. Consequently, forecasts ought to be probabilistic in nature, taking the form of probability distributions over future quantities or events. At this time, the meteorological community is taking massive steps in a reorientation towards probabilistic weather forecasting. This is typically done using a numerical weather prediction model, perturbing the inputs to the model (initial conditions and physics parameters) in various ways, and running the model for each perturbed set of inputs. The result is then viewed as an ensemble of forecasts, taken to be a sample from the joint probability distribution of future weather quantities of interest. However, forecast ensembles typically are biased and uncalibrated, and thus there is a pronounced need for statistical postprocessing, with Bayesian model averaging and heterogeneous regression being state of the art methods for doing this. Many challenges remain, both theoretically and practically, particularly in the postprocessing of spatio-temporal weather field forecasts, where copula methods are in critical demand.

• #### Dr. Ioanna Manolopoulou

Duke University
May 20, 2011
15.15 - MA 31

Semi-Parametric Bayesian Modeling of Inhomogeneous Tactic Fields in Single-Cell Motility

#### Abstract

We develop dynamic models of single cell motion involving nonparametric representations of nonlinear spatial fields that guide cellular motility. Assuming a discretized diffusion model for the cell motion, the tactic field is flexibly modelled using radial basis kernel regression. Our methods are motivated by the temporal dynamics of lymphocytes in the lymph nodes, critical to the immune response. The primary goal is learning the structure of the tactic fields that fundamentally characterize the immune cell motion. We develop Bayesian analysis via customized Markov chain Monte Carlo methods for single cell models, and multi-cell hierarchical extensions for aggregating models and data across multiple cells. Our implementation explores data from multi-photon vital microscopy in murine lymph node experiments, and we use a number of visualization tools to summarize and compare posterior inferences on the 3-dimensional tactic fields.
(Joint work with Melanie P. Matheu, Michael D. Cahalan, Mike West and Thomas B. Kepler)

• #### Dr. Nicholas Hengartner

Los Alamos National Laboratory (USA)
June 10, 2011
15.15 - MA31

Iterative Bias Correction Schemes, or “How Adaptive Fully Nonparametric Smoothing is Practical in Moderate Dimensions”

#### Abstract

Adaptive smoothing makes fully nonparametric multivariate regression possible in moderate dimensions. To wit, randomly distributed points in higher dimensions are well separated. This leads to a curse of dimensionality for nonparametric smoothing, as one needs to take ever larger neighborhoods over which to make local averages (smoothing). This curse of dimensionality is partially mitigated in practice if we know that the underlying multivariate regression function is very smooth. Unfortunately, we often don’t know a priori how smooth the regression function is, so smoothing methods that can adapt to that smoothness are desirable. This talk will present a simple strategy: first oversmooth the data and then estimate the bias to correct the original smoother. The surprise is that if this scheme is iterated, we can turn garden variety smoothers (say kernel based smoothers and smoothing splines) into adaptive smoothers. The method works well in practice and can easily be used with ten or more covariates, as I will show via simulations and application of the method to standard test datasets. We have developed a contributed R package (ibr) for kernel smoothers and thin plate smoothing splines that is computationally optimized for moderate sample sizes (fewer than 1000 observations).
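
The strategy in this abstract (oversmooth, estimate the bias from the residuals, correct, repeat) can be sketched in a few lines of Python. This is a one-dimensional illustration under stated assumptions, not the ibr package: the Gaussian kernel smoother, the bandwidth, the fixed iteration count and all function names are choices made for this sketch, whereas a practical method would select the number of iterations from the data.

```python
import numpy as np

def gaussian_smoother_matrix(x, bandwidth):
    """Row-normalised Gaussian kernel weights (a Nadaraya-Watson smoother)."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    return weights / weights.sum(axis=1, keepdims=True)

def iterative_bias_correction(y, smoother, n_iter):
    """Start from the oversmoothed fit, then repeatedly add back the
    smoothed residuals, which serve as an estimate of the current bias."""
    fit = smoother @ y
    for _ in range(n_iter):
        residuals = y - fit
        fit = fit + smoother @ residuals
    return fit

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
truth = np.sin(2.0 * np.pi * x)
y = truth + rng.normal(scale=0.1, size=x.size)

S = gaussian_smoother_matrix(x, bandwidth=0.15)   # deliberately oversmoothed pilot
fit_pilot = iterative_bias_correction(y, S, n_iter=0)
fit_corrected = iterative_bias_correction(y, S, n_iter=20)

mse_pilot = float(np.mean((fit_pilot - truth) ** 2))
mse_corrected = float(np.mean((fit_corrected - truth) ** 2))
print(mse_pilot, mse_corrected)
```

Iterating too long drives the fit towards interpolating the noisy data, so the iteration count acts as the smoothing parameter in this scheme.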

• #### Prof. Wenceslao González Manteiga

Universidad de Santiago de Compostela, Spain
September 1, 2011
15.15 - MA10

General Views of Goodness of Fit for Statistical Models

#### Abstract

The term goodness-of-fit was introduced by Pearson at the beginning of the last century and refers to statistical tests which check how a distribution fits to a data set in an omnibus way. Since then, many papers were devoted to the chi-square test, the Kolmogorov-Smirnov test and other related methods. The pilot function used for testing was mainly the empirical distribution function. In the last twenty years, there has been an explosion of works that extended the goodness-of-fit ideas to other types of functions: density function, regression function, hazard rate function, etc. Perhaps many of them were motivated by the seminal paper by Bickel and Rosenblatt [1] devoted to density estimation. In this talk, we will give some modern unified approaches to the general goodness-of-fit theory, illustrating their behaviour by means of applications in topics of great interest: testing for interest rate models, testing the influence of variables in general additive models (for example, variables related to pollutant concentrations), testing correlation structures in spatial statistics, or testing wind direction distributions in weather stations.

[1] Bickel, P. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Stat., 1, 1071.

• #### Prof. Emmanuel Candes

Stanford University
September 13, 2011
17.15 - MA11

Robust Principal Component Analysis? Some Theory and Some Applications

#### Abstract

This talk is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. In the second part of the talk, we present applications in computer vision. In video surveillance, for example, our methodology allows for the detection of objects in a cluttered background. We show how the methodology can be adapted to simultaneously align a batch of images and correct serious defects/corruptions in each image, opening new perspectives. Joint work with X. Li, Y. Ma and J. Wright.
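
The abstract does not spell out the convex program. The formulation usually quoted for this result, given here as a sketch in notation assumed for this note ($M$ the observed $n_1 \times n_2$ data matrix, $L$ the low-rank part, $S$ the sparse part), is

$$
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1} \quad \text{subject to} \quad L + S = M,
$$

where $\|L\|_{*}$ is the nuclear norm (the sum of the singular values of $L$), $\|S\|_{1}$ is the sum of the absolute values of the entries of $S$, and $\lambda > 0$ balances the two terms. A commonly cited choice is $\lambda = 1/\sqrt{\max(n_1, n_2)}$.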

• #### Prof. Peter Green

University of Bristol
September 23, 2011
15.15 - MA30

Identifying Influential Model Choices in Bayesian Hierarchical Models

#### Abstract

Real-world phenomena are frequently modelled by Bayesian hierarchical models. The building-blocks in such models are the distribution of each variable conditional on parent and/or neighbour variables in the graph. The specifications of centre and spread of these conditional distributions may be well-motivated, while the tail specifications are often left to convenience. However, the posterior distribution of a parameter may depend strongly on such arbitrary tail specifications. This is not easily detected in complex models. In this paper we propose a graphical diagnostic which identifies such influential statistical modelling choices at the node level in any chain graph model. Our diagnostic, the local critique plot, examines local conflict between the information coming from the parents and neighbours (local prior) and from the children and co-parents (lifted likelihood). It identifies properties of the local prior and the lifted likelihood that are influential on the posterior density. We illustrate the use of the local critique plot with applications involving models of different levels of complexity. The local critique plot can be derived for all parameters in a chain graph model, and is easy to implement using the output of posterior sampling.
This is joint work with Ida Scheel (Oslo) and Jonathan Rougier (Bristol).

• #### Dr. Carl Scarrott

University of Canterbury
November 10, 2011
15.15 - CM 1 113

Non-stationary Extreme Value Mixture Modelling with Application to Pollution Modelling

#### Abstract

This seminar will discuss a semi-parametric modeling approach to determine the "threshold" beyond which traditional asymptotically motivated extreme value models provide a reliable approximation to the tail. Our semi-parametric mixture model incorporates the usual extreme value upper tail model, with the threshold as a parameter and the bulk distribution below the threshold captured by a flexible non-parametric kernel density estimator. This representation avoids the need to specify a priori a particular parametric model for the bulk distribution, and only really requires the trivial assumption of a suitably smooth density. Bayesian inference is used to estimate the joint posterior for the threshold, extreme value tail model parameters and the kernel density bandwidth, allowing the uncertainty associated with all components to be accounted for in inferences. The focus of the talk will be on extension of this mixture model to describe the extremes of non-stationary processes. This approach includes automated estimation of (non-constant) threshold functions and formal assessment of the corresponding uncertainty, providing an important step forward compared to alternative approaches in the literature. The results from simulations, comparison to alternate approaches and application to air pollution modeling will be presented. This is joint work with Anna MacDonald and Dominic Lee.

• #### Prof. Yanyuan Ma

Texas A&M University
November 18, 2011
15.15 - MA30

A Semiparametric Approach to Dimension Reduction

#### Abstract

We provide a novel and completely different approach to dimension reduction problems from the existing literature. We cast the dimension reduction problem in a semiparametric estimation framework and derive estimating equations. Viewing this problem from the new angle allows us to derive a rich class of estimators, and obtain the classical dimension reduction techniques as special cases in this class. The semiparametric approach also reveals that the common assumption of linearity and/or constant variance on the covariates can be removed at the cost of performing additional nonparametric regression. The semiparametric estimators without these common assumptions are illustrated through simulation studies and a real data example.

• #### Prof. Marc Genton

Texas A&M University
November 25, 2011
15.15 - MA30

Functional Boxplots for Visualization of Complex Curve/Image Data: An Application to Precipitation and Climate Model Output

#### Abstract

In many statistical experiments, the observations are functions by nature, such as temporal curves or spatial surfaces/images, where the basic unit of information is the entire observed function rather than a string of numbers. For example, the temporal evolution of several cells, the intensity of medical images of the brain from MRI, the spatio-temporal records of precipitation in the U.S., or the output from climate models, are such complex data structures. Our interest lies in the visualization of such data and the detection of outliers. With this goal in mind, we have defined functional boxplots and surface boxplots. Based on the center outwards ordering induced by band depth for functional data or surface data, the descriptive statistics of such boxplots are: the envelope of the 50% central region, the median curve/image and the maximum non-outlying envelope. In addition, outliers can be detected in a functional/surface boxplot by the 1.5 times the 50% central region empirical rule, analogous to the rule for classical boxplots. We illustrate the construction of a functional boxplot on a series of sea surface temperatures related to the El Niño phenomenon, and its outlier detection performance is explored by simulations. As applications, the functional boxplot is demonstrated on spatio-temporal U.S. precipitation data for nine climatic regions and on climate general circulation model (GCM) output. Further adjustments of the functional boxplot for outlier detection in spatio-temporal data are discussed as well. The talk is based on joint work with Ying Sun.
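
The construction described in this abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions: curves are observed on a common grid, bands are formed from pairs of curves, and the function names and toy data are hypothetical rather than taken from the talk.

```python
import itertools
import numpy as np

def band_depth(curves):
    """Fraction of bands, each spanned by a pair of curves, that contain
    a given curve over the whole grid (pairs involving the curve count)."""
    n = curves.shape[0]
    counts = np.zeros(n)
    for i, j in itertools.combinations(range(n), 2):
        lower = np.minimum(curves[i], curves[j])
        upper = np.maximum(curves[i], curves[j])
        counts += np.all((curves >= lower) & (curves <= upper), axis=1)
    return counts / (n * (n - 1) / 2)

def functional_boxplot(curves, factor=1.5):
    """Median curve, 50% central envelope and outliers by the 1.5 rule."""
    n = curves.shape[0]
    order = np.argsort(-band_depth(curves), kind="stable")  # deepest first
    central = curves[order[: int(np.ceil(n / 2))]]
    lower, upper = central.min(axis=0), central.max(axis=0)
    spread = upper - lower
    outside = (curves < lower - factor * spread) | (curves > upper + factor * spread)
    is_outlier = np.any(outside, axis=1)
    kept = curves[~is_outlier]
    return {
        "median": curves[order[0]],
        "central_envelope": (lower, upper),
        "max_envelope": (kept.min(axis=0), kept.max(axis=0)),
        "outliers": np.flatnonzero(is_outlier),
    }

# Toy data: eleven vertically shifted sine curves plus one far-away curve.
grid = np.linspace(0.0, 1.0, 50)
offsets = np.concatenate([np.linspace(-1.0, 1.0, 11), [10.0]])
curves = np.sin(2.0 * np.pi * grid)[None, :] + offsets[:, None]

result = functional_boxplot(curves)
print(result["outliers"])
```

The pairwise loop costs on the order of n² band checks, which is fine for a toy example but would call for a faster depth computation on large data sets.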

• #### Prof. Filipe Marques

December 2, 2011
15.15 - MA30

How to Develop Near-Exact Distributions for the Distribution of Likelihood Ratio Test Statistics Used to Test the Structure of Covariance Matrices

#### Abstract

The exact distribution of the most common likelihood ratio test statistics in multivariate statistics, that is, the ones used to test the independence of several sets of variables, the equality of several variance-covariance matrices, sphericity and the equality of several mean vectors, may be expressed as the distribution of the product of independent Beta random variables or as the product of a given number of independent random variables whose logarithm has a Gamma distribution times a given number of independent Beta random variables. What is interesting is that the similarities exhibited by the distributions of these statistics may be used to develop near-exact distributions for statistics used to test more elaborate structures of covariance matrices. This can be done whenever we are able to split the original null hypothesis into two or more sub-hypotheses which are the basis for a set of independent tests. Examples are presented to illustrate these results. Numerical results show the quality and properties of these approximations.

### Visitor Information

Directions for visitors

### Mailing List

Please email Ms. Schaffner if you would like to be added to the seminar mailing list.