# EPFL Statistics Seminar

### Statistics Seminars - Spring 2007

• #### Prof. Angelika May

University of Siegen
March 12, 2007

Copula Functions as a Tool for Modelling Dependent Data

#### Abstract

The talk will give an introduction to the concept of copula functions, focusing on the separate statistical treatment of the marginal distribution functions and the dependence concept. We discuss several measures of the strength of dependence between data. In financial and actuarial applications, some data show asymmetric dependence in the tails. If this is the case, a transformed copula within the class of Archimedean copulas seems to be an appropriate choice. Despite nice analytical properties that can easily be deduced, we will show that this approach causes problems in higher dimensions.
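
As a rough illustration of asymmetric tail dependence within the Archimedean class, the sketch below evaluates the bivariate Clayton copula and its lower tail dependence coefficient. The choice of the Clayton family and the function names are assumptions made for illustration only; the abstract does not say which copula or which transformation the talk considers.

```python
def clayton_cdf(u: float, v: float, theta: float) -> float:
    """Bivariate Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if not (0 < u <= 1 and 0 < v <= 1):
        raise ValueError("u and v must lie in (0, 1]")
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_lower_tail_dependence(theta: float) -> float:
    """Lower tail dependence coefficient of the Clayton copula: 2^(-1/theta)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 2.0 ** (-1.0 / theta)


# Illustrative values only: larger theta means stronger lower-tail dependence.
for theta in (0.5, 1.0, 4.0):
    print(theta, clayton_lower_tail_dependence(theta))
```

For the Clayton copula the upper tail dependence coefficient is zero for every positive `theta`, so joint extremes cluster in the lower tail only. This is the kind of asymmetry the abstract refers to.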

• #### Prof. Elizabeth Smith

University of Newcastle
March 23, 2007

A Max-Stable Process Model for Spatial Extremes

#### Abstract

The extremes of environmental processes are often of interest due to the damage that can be caused by extreme levels of the processes. These processes are often spatial in nature and modelling the extremes jointly at many locations can be important. A model which enables data from a number of locations to be modelled under a more flexible framework than in previous applications will be discussed. The model is applied to annual maximum rainfall data from five sites in South-West England. A pairwise likelihood is used for estimation and a Bayesian analysis is employed, allowing the incorporation of informative prior information.

• #### Dr. Wolfgang Huber

European Bioinformatics Institute/European Molecular Biology Laboratory
April 4, 2007

Automated Image Analysis for High-Throughput Cell-Based Microscopy Assays with R and Bioconductor

#### Abstract

Advances in automated microscopy have made it possible to conduct large-scale cell-based assays with image-type phenotypic readouts. In such an assay, cells are grown in the wells of a microtitre plate or on a glass slide under a condition or stimulus of interest. Each well is treated with one of the reagents from the screening library and the response of the cells is monitored; in many cases, certain proteins of interest are antibody-stained or labeled with a GFP tag for this purpose. The resulting data can be in the form of two-dimensional (2D) still images, 3D image stacks or image-based time courses. RNA interference (RNAi) libraries can be used to screen a set of genes (in many cases the whole genome) for the effect of their loss of function in a certain biological process. I will talk about some of the statistical and data-analytic issues, and the tools in Bioconductor, in particular the cellHTS, prada and EBImage packages.

• #### Dr. Robert Gentleman

Fred Hutchinson Cancer Research Center
April 4, 2007

Assessing the Role Played by Multi-Protein Complexes in Determining Phenotype

#### Abstract

While proteins are the primary mechanism by which cells carry out the various molecular processes needed for life, it is also true that proteins seldom act alone. Rather, they often form multi-protein complexes that carry out particular functions. Using published, known multi-protein complexes and pathways in yeast, we develop a number of statistical approaches to help elucidate the involvement of different levels of organization in observed changes in phenotype that arise from single-gene manipulation experiments (deletion, mutation, up-regulation, etc.).

• #### Prof. P. R. Parthasarathy

June 1, 2007

Stochastic Models of Carcinogenesis

University College London
June 8, 2007

Addressing Uncertainty in Numerical Climate Models

#### Abstract

It is recognised that projections of future climate can differ widely between climate models, and it is therefore necessary to account for climate model uncertainty in any risk assessment exercise. Here we suggest that a hierarchical statistical model, implemented in a Bayesian framework, provides a logically coherent and interpretable way to think about climate model uncertainty in general. The ideas will be illustrated by considering the generation of future daily rainfall sequences at a single location in the UK, based on the outputs of four different climate models under the SRES A2 emissions scenario.

• #### Prof. P. R. Parthasarathy

June 8, 2007

Applied Birth and Death Models

• #### Prof. P. R. Parthasarathy

June 22, 2007

Exact Transient Solution of State-Dependent Queues

### Statistics Seminars - Autumn 2007

• #### Prof. Jon Forster

University of Southampton
October 5, 2007

Bayesian Inference for Multivariate Ordinal Data

#### Abstract

Methods for investigating the structure in contingency tables are typically based on determining appropriate log-linear models for the classifying variables. Where one or more of the variables is ordinal, such models do not take this property into account. In this talk, I describe how the multivariate probit model (Chib and Greenberg, 1998) can be adapted so that ordinal data models can be compared using Bayesian methods. By a suitable choice of parameterisation, the conditional posterior distributions are standard and are easily simulated from, and reversible jump Markov chain Monte Carlo computation can be used to estimate posterior model probabilities for undirected decomposable graphical models. The approach is illustrated with various examples.

• #### Dr. Parthanil Roy

ETH Zurich
November 2, 2007

Ergodic Theory, Abelian Groups, and Point Processes Associated with Stable Random Fields

#### Abstract

We consider the point process sequence $\big\{\sum_{\|t\|_\infty \leq n} \delta_{b_n^{-1}X_t}:\, n \geq 1 \big\}$ induced by a stationary symmetric $\alpha$-stable $(0 < \alpha < 2)$ discrete parameter random field $\{X_t\}_{t \in \mathbb{Z}^d}$ for a suitable choice of scaling sequence $b_n \uparrow \infty$. It is easy to prove, following the arguments in the one-dimensional case in Resnick and Samorodnitsky (2004), that if the random field is generated by a dissipative $\mathbb{Z}^d$-action then $b_n=n^{d/\alpha}$ is appropriate and with this choice the above point process sequence converges weakly to a cluster Poisson process. For the conservative case, no general result is known even when $d=1$. In this talk, we look at a specific class of stable random fields generated by conservative actions for which the effective dimension $p \leq d$ can be computed using the structure theorem of finitely generated abelian groups and some basic counting techniques. For this class of random fields, in order to incorporate the clustering effect of extreme observations due to longer memory, we need to normalize the point process itself in addition to using a scaling sequence $b_n =n^{p/\alpha}$. The weak limit of this normalized point process happens to be a random measure but not a point process. A number of limit theorems for various functionals of the random field can be obtained by continuous mapping arguments from these weak convergence results. (This talk is based on joint work with Gennady Samorodnitsky.)

• #### Prof. Stephen Stigler

University of Chicago
November 20, 2007

The 350th Anniversary of the Birth of Probability

#### Abstract

The first printed work in probability was published in 1657 by Christian Huygens. A part of that work is discussed and its connections with modern ideas of risk analysis brought out. One of the "early adopters" was Isaac Newton, who nonetheless made a subtle and heretofore unnoticed error in applying the work.

• #### Prof. Yanyuan Ma

University of Neuchatel
November 23, 2007

Locally Efficient Estimators for Semiparametric Models With Measurement Error

#### Abstract

We derive constructive locally efficient estimators in semiparametric measurement error models. The setting is one where the likelihood function depends on variables measured with and without error, where the variables measured without error can be modelled nonparametrically. The algorithm is based on backfitting. We show that if one adopts a parametric model for the latent variable measured with error and if this model is correct, then the estimator is semiparametric efficient; if the latent variable model is misspecified, our methods lead to a consistent and asymptotically normal estimator. Our method further produces an estimator of the nonparametric function that achieves the standard bias and variance property. We extend the methodology to allow for parameters in the measurement error model to be estimated by additional data in the form of replicates or instrumental variables. The methods are illustrated via a simulation study and a data example, where the putative latent variable distribution is a shifted lognormal, but concerns about the effects of misspecification of this assumption and the linear assumption of another covariate demand a more model-robust approach. A special case of wide interest is the partially linear measurement error model. If one assumes that the model error and the measurement error are both normally distributed, then our estimator has a closed form. When a normal model for the unobservable variable is also posited, our estimator becomes consistent and asymptotically normally distributed for the general partially linear measurement error model, even without any of the normality assumptions under which the estimator is originally derived. We show that the method in fact reduces to the same estimator as in Liang et al. (1999), thus showing a previously unknown optimality property of their method.

• #### Prof. Paul Embrechts

ETH Zurich
November 30, 2007

VaR-based Risk Management: Sense and (non-)Sensibility

#### Abstract

Quantitative Risk Management has as one of its aims the calculation/estimation of risk capital for banks and insurance companies. The standard method used is referred to as VaR, Value-at-Risk, and mathematically corresponds to a (typically high) quantile of a so-called P&L, Profit-and-Loss distribution. In recent years, we have witnessed several extreme events in financial markets (including the recent subprime crisis) for which VaR-based risk management did not really work. I will critically discuss this issue and point to directions of research in statistics which may be helpful for finding better models for so-called high-risk scenarios. The talk should be accessible to a more general audience.
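
The quantile definition of VaR can be sketched on a sample of P&L values. This is a minimal empirical estimator, assuming that losses are the negated P&L and using a simple order-statistic rule. The function name and the sample figures are hypothetical and are not taken from the talk.

```python
import math
from typing import Sequence


def empirical_var(pnl: Sequence[float], level: float = 0.99) -> float:
    """Empirical Value-at-Risk: the `level`-quantile of the losses (loss = -P&L).

    Returns the smallest observed loss l such that at least a fraction
    `level` of the observed losses are less than or equal to l.
    """
    if not pnl:
        raise ValueError("pnl must be non-empty")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    losses = sorted(-x for x in pnl)
    k = math.ceil(level * len(losses)) - 1  # zero-based index of the order statistic
    return losses[k]


# Hypothetical daily P&L figures, for illustration only.
sample_pnl = [1.2, -0.4, 0.3, -2.5, 0.8, -1.1, 0.1, -0.2]
print(empirical_var(sample_pnl, level=0.75))
```

An order-statistic estimate like this uses only the observed sample, so at high levels it rests on very few observations in the tail.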

• #### Prof. Marloes Maathuis

ETH Zurich
December 7, 2007

Computation of the MLE for Bivariate Interval Censored Data

#### Abstract

I will consider the nonparametric maximum likelihood estimator (MLE) for the bivariate distribution of (X,Y), when realizations of (X,Y) cannot be observed exactly, but are only known to lie in certain rectangular regions. Such data arise for example in HIV/AIDS studies. I will discuss the computation of the MLE for this type of data, and will illustrate the approach using the new R-package 'MLEcens'.

• #### Prof. James Carpenter

London School of Hygiene & Tropical Medicine
December 13, 2007

Multilevel Models with Multivariate Mixed Response Types

#### Abstract

We build upon the existing literature to propose a class of models for multivariate mixtures of normal, ordered or unordered categorical responses and non-normal continuous distributions, each of which can be defined at any level of a multilevel data hierarchy. We sketch an MCMC algorithm for fitting such models. We show how this unifies a number of disparate problems. The 2-level model is considered in detail, and applied to multiple imputation for missing data. We conclude with a discussion outlining possible extensions and connections in the literature. Beta software for Windows, for estimating a two-level version of the model, is freely available from www.missingdata.org.uk under 'software'. Joint work with Harvey Goldstein (Bristol University) and Mike Kenward (LSHTM).