EPFL Statistics Seminar

Statistics Seminars - Spring 2007

  • Prof. Angelika May

    University of Siegen
    March 12, 2007

    Copula Functions as a Tool for Modelling Dependent Data


    The talk will give an introduction to the concept of copula functions, focussing on the separate statistical treatment of the marginal distribution functions and the dependence concept. We discuss several measures for the strength of dependence between data. In financial and actuarial applications, some data show asymmetric dependence in the tails. If this is the case, a transformed copula within the class of Archimedean copulas seems to be an appropriate choice. Despite nice analytical properties that can easily be deduced, we will show that this approach causes problems in higher dimensions.
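
    As a small illustration of asymmetric tail dependence in an Archimedean copula, the sketch below uses the Clayton family. This family is chosen here only as a familiar example and is not necessarily the transformed copula discussed in the talk; the function names and parameter values are illustrative assumptions. For the Clayton copula with parameter theta > 0, Kendall's tau is theta / (theta + 2), the lower tail dependence coefficient is 2^(-1/theta), and the upper tail dependence coefficient is 0.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for u, v in (0, 1] and theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_kendall_tau(theta):
    """Kendall's tau implied by the Clayton parameter theta."""
    return theta / (theta + 2.0)


def clayton_lower_tail_dependence(theta):
    """Closed-form lower tail dependence coefficient of the Clayton copula."""
    return 2.0 ** (-1.0 / theta)


def lower_tail_ratio(q, theta):
    """P(U <= q, V <= q) / q, which approaches the lower tail coefficient as q -> 0."""
    return clayton_cdf(q, q, theta) / q


def upper_tail_ratio(q, theta):
    """P(U > q, V > q) / (1 - q), which approaches the upper tail coefficient as q -> 1."""
    return (1.0 - 2.0 * q + clayton_cdf(q, q, theta)) / (1.0 - q)


theta = 2.0  # illustrative value, not taken from the talk
print("Kendall's tau:", clayton_kendall_tau(theta))
print("lower tail, closed form:", clayton_lower_tail_dependence(theta))
print("lower tail, numerical:  ", lower_tail_ratio(1e-6, theta))
print("upper tail, numerical:  ", upper_tail_ratio(1.0 - 1e-6, theta))
```

    The numerical ratios show the asymmetry directly: joint extremes cluster in the lower tail, while the upper tail ratio shrinks towards zero.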

  • Prof. Elizabeth Smith

    University of Newcastle
    March 23, 2007

    A Max-Stable Process Model for Spatial Extremes


    The extremes of environmental processes are often of interest due to the damage that can be caused by extreme levels of the processes. These processes are often spatial in nature and modelling the extremes jointly at many locations can be important. A model which enables data from a number of locations to be modelled under a more flexible framework than in previous applications will be discussed. The model is applied to annual maximum rainfall data from five sites in South-West England. A pairwise likelihood is used for estimation and a Bayesian analysis is employed, allowing the incorporation of informative prior information.

  • Dr. Wolfgang Huber

    European Bioinformatics Institute/European Molecular Biology Laboratory
    April 4, 2007

    Automated Image Analysis for High-Throughput Cell-Based Microscopy Assays with R and Bioconductor


    Advances in automated microscopy have made it possible to conduct large-scale cell-based assays with image-type phenotypic readouts. In such an assay, cells are grown in the wells of a microtitre plate or on a glass slide under a condition or stimulus of interest. Each well is treated with one of the reagents from the screening library and the response of the cells is monitored; in many cases, certain proteins of interest are antibody-stained or labeled with a GFP-tag for this purpose. The resulting data can be in the form of two-dimensional (2D) still images, 3D image stacks or image-based time courses. RNA interference (RNAi) libraries can be used to screen a set of genes (in many cases the whole genome) for the effect of their loss of function in a certain biological process. I will talk about some of the statistical and data analytic issues, and the tools in Bioconductor, in particular the cellHTS, prada and EBImage packages.

  • Dr. Robert Gentleman

    Fred Hutchinson Cancer Research Center
    April 4, 2007

    Assessing the Role Played by Multi-Protein Complexes in Determining Phenotype


    While proteins are the primary mechanism by which cells carry out the various molecular processes needed for life, it is also true that proteins seldom act alone. Rather, they often form multi-protein complexes that carry out particular functions. Using published, known multi-protein complexes and pathways in yeast, we develop a number of statistical approaches to help elucidate the involvement of different levels of organization in observed changes in phenotype that arise from single gene manipulation experiments (deletion, mutation, up-regulation, etc.).

  • Prof. P. R. Parthasarathy

    IIT Madras
    June 1, 2007

    Stochastic Models of Carcinogenesis

  • Dr. Nadja Leith

    University College London
    June 8, 2007

    Addressing Uncertainty in Numerical Climate Models


    It is recognised that projections of future climate can differ widely between climate models and it is therefore necessary to account for climate model uncertainty in any risk assessment exercise. Here we suggest that a hierarchical statistical model, implemented in a Bayesian framework, provides a logically coherent and interpretable way to think about climate model uncertainty in general. The ideas will be illustrated by considering the generation of future daily rainfall sequences at a single location in the UK, based on the outputs of four different climate models under the SRES A2 emissions scenario.

  • Prof. P. R. Parthasarathy

    IIT Madras
    June 8, 2007

    Applied Birth and Death Models

  • Prof. P. R. Parthasarathy

    IIT Madras
    June 22, 2007

    Exact Transient Solution of State-Dependent Queues

Statistics Seminars - Autumn 2007

  • Prof. Jon Forster

    University of Southampton
    October 5, 2007

    Bayesian Inference for Multivariate Ordinal Data


    Methods for investigating the structure in contingency tables are typically based on determining appropriate log-linear models for the classifying variables. Where one or more of the variables is ordinal, such models do not take this property into account. In this talk, I describe how the multivariate probit model (Chib and Greenberg, 1998) can be adapted so that ordinal data models can be compared using Bayesian methods. By a suitable choice of parameterisation, the conditional posterior distributions are standard and are easily simulated from, and reversible jump Markov chain Monte Carlo computation can be used to estimate posterior model probabilities for undirected decomposable graphical models. The approach is illustrated with various examples.

  • Dr. Parthanil Roy

    ETH Zurich
    November 2, 2007

    Ergodic Theory, Abelian Groups, and Point Processes Associated with Stable Random Fields


    We consider the point process sequence $ \big\{\sum_{\|t\|_\infty \leq n} \delta_{b_n^{-1}X_t}:\, n \geq 1 \big\}$ induced by a stationary symmetric $\alpha$-stable $(0 < \alpha < 2)$ discrete parameter random field $\{X_t\}_{t \in \mathbb{Z}^d}$ for a suitable choice of scaling sequence $b_n \uparrow \infty$. It is easy to prove, following the arguments in the one-dimensional case in Resnick and Samorodnitsky (2004), that if the random field is generated by a dissipative $\mathbb{Z}^d$-action then $b_n=n^{d/\alpha}$ is appropriate and with this choice the above point process sequence converges weakly to a cluster Poisson process. For the conservative case, no general result is known even when $d=1$. In this talk, we look at a specific class of stable random fields generated by conservative actions for which the effective dimension $p \leq d$ can be computed using the structure theorem of finitely generated abelian groups and some basic counting techniques. For this class of random fields, in order to incorporate the clustering effect of extreme observations due to longer memory, we need to normalize the point process itself in addition to using a scaling sequence $b_n =n^{p/\alpha}$. The weak limit of this normalized point process happens to be a random measure but not a point process. A number of limit theorems for various functionals of the random field can be obtained by continuous mapping arguments from these weak convergence results. (This talk is based on joint work with Gennady Samorodnitsky.)

  • Prof. Stephen Stigler

    University of Chicago
    November 20, 2007

    The 350th Anniversary of the Birth of Probability


    The first printed work in probability was published in 1657 by Christian Huygens. A part of that work is discussed and its connections with modern ideas of risk analysis brought out. One of the "early adopters" was Isaac Newton, who nonetheless made a subtle and heretofore unnoticed error in applying the work.

  • Prof. Yanyuan Ma

    University of Neuchatel
    November 23, 2007

    Locally Efficient Estimators for Semiparametric Models With Measurement Error


    We derive constructive locally efficient estimators in semiparametric measurement error models. The setting is one where the likelihood function depends on variables measured with and without error, where the variables measured without error can be modelled nonparametrically. The algorithm is based on backfitting. We show that if one adopts a parametric model for the latent variable measured with error and if this model is correct, then the estimator is semiparametric efficient; if the latent variable model is misspecified, our methods lead to a consistent and asymptotically normal estimator. Our method further produces an estimator of the nonparametric function that achieves the standard bias and variance property. We extend the methodology to allow for parameters in the measurement error model to be estimated by additional data in the form of replicates or instrumental variables. The methods are illustrated via a simulation study and a data example, where the putative latent variable distribution is a shifted lognormal, but concerns about the effects of misspecification of this assumption and the linear assumption of another covariate demand a more model-robust approach. A special case of wide interest is the partially linear measurement error model. If one assumes that the model error and the measurement error are both normally distributed, then our estimator has a closed form. When a normal model for the unobservable variable is also posited, our estimator becomes consistent and asymptotically normally distributed for the general partially linear measurement error model, even without any of the normality assumptions under which the estimator is originally derived. We show that the method in fact reduces to the same estimator as in Liang et al. (1999), thus establishing a previously unknown optimality property of their method.

  • Prof. Paul Embrechts

    ETH Zurich
    November 30, 2007

    VaR-based Risk Management: Sense and (non-)Sensibility


    Quantitative Risk Management has as one of its aims the calculation/estimation of risk capital for banks and insurance companies. The standard method used is referred to as VaR, Value-at-Risk, and mathematically corresponds to a (typically high) quantile of a so-called P&L, Profit-and-Loss distribution. In recent years, we have witnessed several extreme events in financial markets (including the recent subprime crisis) for which VaR-based risk management did not really work. I will critically discuss this issue and point to directions of research in statistics which may be helpful for finding better models for so-called high-risk scenarios. The talk should be accessible to a more general audience.
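
    The definition of VaR as a high quantile can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a method from the talk: it takes losses to be the negative of P&L, uses a plain empirical quantile, and the function name and the sample data are hypothetical.

```python
import math


def empirical_var(pnl, level=0.99):
    """Empirical Value-at-Risk at `level`, reported as a loss.

    Losses are defined as -P&L. The result is the smallest observed loss l
    such that at least a fraction `level` of the observed losses are <= l.
    """
    if not pnl:
        raise ValueError("pnl must be non-empty")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    losses = sorted(-x for x in pnl)
    # Small tolerance guards against floating-point error in level * n.
    k = math.ceil(level * len(losses) - 1e-12) - 1
    return losses[max(k, 0)]


# Hypothetical P&L sample: outcomes of -1, -2, ..., -100 (all losses).
sample_pnl = [-float(i) for i in range(1, 101)]
var_99 = empirical_var(sample_pnl, level=0.99)
print("99% VaR:", var_99)
```

    An empirical quantile says nothing about how large losses beyond the quantile can be, which is one reason a quantile-based measure can understate risk in extreme events.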

  • Prof. Marloes Maathuis

    ETH Zurich
    December 7, 2007

    Computation of the MLE for Bivariate Interval Censored Data


    I will consider the nonparametric maximum likelihood estimator (MLE) for the bivariate distribution of (X,Y), when realizations of (X,Y) cannot be observed exactly, but are only known to lie in certain rectangular regions. Such data arise for example in HIV/AIDS studies. I will discuss the computation of the MLE for this type of data, and will illustrate the approach using the new R-package 'MLEcens'.

  • Prof. James Carpenter

    London School of Hygiene & Tropical Medicine
    December 13, 2007

    Multilevel Models with Multivariate Mixed Response Types


    We build upon the existing literature to propose a class of models for multivariate mixtures of normal, ordered or unordered categorical responses and non-normal continuous distributions, each of which can be defined at any level of a multilevel data hierarchy. We sketch an MCMC algorithm for fitting such models. We show how this unifies a number of disparate problems. The 2-level model is considered in detail, and applied to multiple imputation for missing data. We conclude with a discussion outlining possible extensions and connections in the literature. Beta software for Windows, for estimating a two-level version of the model, is freely available from www.missingdata.org.uk under 'software'. Joint work with Harvey Goldstein (Bristol University) and Mike Kenward (LSHTM).