EPFL Statistics Seminar

Statistics Seminars - Spring 2008

  • Prof. Alexandra Dias

    University of Warwick
    February 19, 2008

    Semi-parametric Estimation of Portfolio Tail Probabilities


    In this talk we estimate the probability of occurrence of a large portfolio loss. This amounts to estimating the probability of an event in the far joint tail of the portfolio loss distribution. These are rare extreme events, and we use a semi-parametric procedure from extreme value theory to estimate their probability. We find that the univariate loss distribution is heavy-tailed for three market indices and that there is dependence between large losses in the different indices. We estimate the probability of a large loss in a portfolio composed of these indices and analyze the impact of the portfolio component weights on the probability of a large portfolio loss. With this procedure we are able to estimate the probability of portfolio losses never incurred before without having to assume a parametric dependence model. Furthermore, increasing the number of portfolio components does not complicate the estimation.
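
    The abstract does not name the estimator used. As a rough illustration of semi-parametric tail estimation only, the sketch below assumes a univariate Hill estimator combined with a Weissman-type extrapolation; the function names, the choice of k and the simulated Pareto losses are all hypothetical, and the joint-tail procedure of the talk is not reproduced here.

```python
import math
import random


def hill_estimator(losses, k):
    """Hill estimate of the tail index gamma = 1/alpha from the k largest losses."""
    x = sorted(losses, reverse=True)
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < n")
    threshold = x[k]  # (k+1)-th largest observation, used as the tail threshold
    if threshold <= 0:
        raise ValueError("threshold must be positive for the log-ratio to exist")
    return sum(math.log(x[i] / threshold) for i in range(k)) / k


def tail_probability(losses, k, level):
    """Weissman-type estimate of P(loss > level) for a level beyond the threshold."""
    x = sorted(losses, reverse=True)
    gamma = hill_estimator(losses, k)
    threshold = x[k]
    return (k / len(x)) * (level / threshold) ** (-1.0 / gamma)


# Illustration on simulated heavy-tailed losses (assumed Pareto with alpha = 3).
rng = random.Random(0)
sample = [rng.paretovariate(3.0) for _ in range(20000)]
gamma_hat = hill_estimator(sample, k=500)
# A level of 50 is typically far beyond anything in a sample of this size.
p_hat = tail_probability(sample, k=500, level=50.0)
print(f"gamma_hat = {gamma_hat:.3f}, estimated P(loss > 50) = {p_hat:.2e}")
```

    The extrapolation step is what allows a probability to be attached to a loss level that is rarely or never observed in the sample; the estimate is sensitive to the choice of k.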

  • Prof. Alireza Nematollahi

    Shiraz University
    February 29, 2008

    Periodically Correlated Time Series and Their Spectra


    This talk is divided into two parts. Part I briefly introduces the spectral analysis problem, motivates the definition of power spectral density functions, and reviews some important and new techniques in nonparametric and parametric spectral estimation. We also consider the problem in the context of multivariate time series. In Part II, we consider periodically correlated (cyclostationary) time series, their spectra, and the estimation of these spectra using the techniques introduced in Part I. We use the well-known relation between the spectral density matrix of a periodically correlated time series and that of a stationary vector time series (Gladyshev, 1961). The results we derive here for multivariate time series are of general interest and can be used in the estimation of stationary vector time series. They can also be used for the estimation of vector AR and ARMA models. The method of estimation is illustrated with simulated and real time series.

  • Mr. Thomas Fournier

    University of Fribourg
    March 10, 2008

    A Self-Regulated Gene Network


    In the last few years, the understanding of gene expression and regulation mechanisms has attracted wide interest in the scientific community. A popular approach is to model the system as a continuous-time Markov process with values in a countable or finite space. In this talk, after briefly describing the historical background of the mathematical models of (bio-)chemical reactions and the main differences between deterministic and stochastic models, I will discuss mathematically a class of simple self-regulated genes, which are the building blocks for many regulatory gene networks, and present a closed formula for the steady-state distribution of the corresponding Markov process as well as an efficient numerical algorithm. This approach advantageously replaces the time-consuming simulation using the Gillespie algorithm. Based on these results, I will present a realistic self-regulated network that works as a potent genetic switch, and show that this approach exhibits the main features observed experimentally.
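
    For context, the simulation that the closed formula is meant to replace can be sketched as follows. This is a minimal Gillespie simulation of a hypothetical self-regulated gene with positive feedback; the reaction set and all rate constants are assumptions made for illustration and are not taken from the talk.

```python
import random


def gillespie_self_regulated_gene(t_end, production=10.0, degradation=1.0,
                                  basal_on=0.1, feedback_on=0.05,
                                  switch_off=0.5, seed=0):
    """Simulate (time, gene_on, proteins) for a hypothetical self-activating gene."""
    rng = random.Random(seed)
    t, gene_on, proteins = 0.0, False, 0
    path = [(t, gene_on, proteins)]
    while True:
        # Propensities of the four assumed reactions in the current state.
        rates = [
            production if gene_on else 0.0,                          # protein synthesis
            degradation * proteins,                                  # protein degradation
            0.0 if gene_on else basal_on + feedback_on * proteins,   # gene off -> on
            switch_off if gene_on else 0.0,                          # gene on -> off
        ]
        total = sum(rates)
        # Exponential waiting time until the next reaction.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its rate.
        u = rng.random() * total
        if u < rates[0]:
            proteins += 1
        elif u < rates[0] + rates[1]:
            proteins -= 1
        elif u < rates[0] + rates[1] + rates[2]:
            gene_on = True
        else:
            gene_on = False
        path.append((t, gene_on, proteins))
    return path


path = gillespie_self_regulated_gene(t_end=50.0)
print(f"{len(path) - 1} reactions simulated, final protein count {path[-1][2]}")
```

    Estimating a steady-state distribution this way requires long runs or many replicates, which is the cost that a closed-form expression avoids.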

  • Mr. David Kraus

    Charles University Prague
    March 14, 2008

    Data-Driven Smooth Tests in Survival Analysis


    The problem of comparing two samples of possibly right-censored survival data is considered. The aim is to develop tests capable of detecting a wide spectrum of alternatives, which is useful when there is no clear advance idea of the departure from the null hypothesis. A new class of tests based on Neyman's embedding idea is proposed. The null hypothesis is tested against a model described by several smooth functions. A data-driven approach to the selection of these functions is studied, i.e., the test is performed against an alternative which is selected based on the observed data. These tests are constructed for two situations: first, we compare survival distributions in two samples; second, we compare two samples under competing risks (comparison of cumulative incidence functions). The small-sample performance is explored via simulations, which show that the power of the proposed tests appears to be more stable than the power of some versatile tests previously proposed in the literature. Real data illustrations are given.

  • Prof. Lutz Duembgen

    University of Bern
    April 4, 2008

    P-Values for Computer-Intensive Classifiers


    In the first part I will briefly review traditional approaches to (model-based) classification and discuss some conceptual problems. I will argue that a more convincing approach is to use certain p-values for class memberships. These enable us to quantify the uncertainty when classifying a single future observation. Classical results about (Bayes-) optimal classifiers carry over to this new paradigm. We claim that any reasonable classifier can be modified to yield non-parametric p-values for classification. Some simulated and real data sets will illustrate our approach. (This is joint work with Axel Munk (Goettingen) and Bernd-Wolfgang Igl (Luebeck).)

  • Prof. Werner Stahel

    ETH Zurich
    May 23, 2008

    Linear Mixing Models: Models, Estimation, and "Target Testing"


    Monitoring stations collect data on a number $m$ of chemical compounds automatically in short intervals. We study several sets of one year of hourly data on 17 volatile organic compounds (VOC). Such data can be used to identify and quantify the contributions of several sources of emission, even if they are unknown: Suppose that the pollution is generated by a small number $p SOME TEXT MISSING...

Statistics Seminars - Autumn 2008

  • Prof. Michael Wolf

    Universität Zürich
    October 3, 2008

    Formalized Data Snooping Based on Generalized Error Rates


    It is common in econometric applications that several hypothesis tests are carried out simultaneously. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. The classical approach is to control the familywise error rate (FWE), which is the probability of one or more false rejections. But when the number of hypotheses under consideration is large, control of the FWE can become too demanding. As a result, the number of false hypotheses rejected may be small or even zero. This suggests replacing control of the FWE by a more liberal measure. To this end, we review a number of recent proposals from the statistical literature. We briefly discuss how these procedures apply to the general problem of model selection. A simulation study and two empirical applications illustrate the methods.
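
    The abstract does not list the specific procedures reviewed. As an illustrative sketch only, the code below contrasts two classical FWE-controlling rules (Bonferroni and Holm) with a generalized Bonferroni rule that controls the probability of k or more false rejections, one example of a more liberal error rate; the function names and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the FWE at level alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure; controls the FWE and rejects at least as much as Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        # Compare the (step+1)-th smallest p-value with alpha / (m - step).
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def k_fwe_bonferroni(p_values, k, alpha=0.05):
    """Generalized Bonferroni: reject when p_i <= k * alpha / m (controls P(k or more false rejections))."""
    m = len(p_values)
    return [p <= k * alpha / m for p in p_values]


p = [0.001, 0.012, 0.018, 0.04, 0.30]  # made-up p-values
print(bonferroni(p))            # [True, False, False, False, False]
print(holm(p))                  # [True, True, False, False, False]
print(k_fwe_bonferroni(p, 2))   # [True, True, True, False, False]
```

    Tolerating one false rejection (k = 2) raises the per-test threshold from alpha/m to 2*alpha/m, which is why the third hypothesis is rejected only under the generalized rule.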

  • Prof. Anestis Antoniadis

    Université Joseph Fourier (Grenoble)
    October 10, 2008

    The Dantzig Selector in the Regression Model for Right-Censored Data


    The Dantzig Selector is an approach that has been proposed recently for performing variable selection in high-dimensional linear regression models with a large number of explanatory variables and a relatively small number of observations. As in the least absolute shrinkage and selection operator (LASSO), this approach sets certain regression coefficients to exactly zero, thus performing variable selection. However, such a framework, contrary to the LASSO, has never been used in regression models for survival data with censoring. A key motivation of this work is to study the variable selection problem for Cox's proportional hazards regression models using a framework that extends the theory, the computational advantages and the optimal asymptotic rate properties of the Dantzig Selector to the much larger class of Cox's proportional hazards models under appropriate sparsity scenarios.

  • Swiss Statistics Seminar

    October 24, 2008

    Event Programme


  • Prof. Wilfrid Kendall

    University of Warwick
    October 30, 2008

    Short-Length Routes in Low-Cost Networks via Poisson Line Patterns
    (Joint work with David Aldous)


    How efficiently can one move about in a network linking a configuration of n cities? Here the notion of "efficient" has to balance (a) total network length against (b) short network distances between cities: this is a problem in "frustrated optimization", linked to the notion of geometric spanners in computational geometry. Aldous and I have shown how to use Poisson line processes and methods from mathematical stereology to produce surprising networks which are nearly of shortest total length, and yet which make the average inter-city distance almost Euclidean. I will discuss this work and further developments: (a) describing actual geodesic paths, (b) exploring the distribution of flow statistics through a typical graph line segment.


    Aldous, D.J. & Kendall, W.S. (2008). Short-length routes in low-cost networks via Poisson line patterns. Adv. in Appl. Probab., 40 (1), 1-21.

  • Prof. John Haslett

    Trinity College Dublin
    November 7, 2008

    A Simple Monotone Process with Application to Radiocarbon-Dated Depth Chronologies


    We propose a new and simple continuous Markov monotone stochastic process and use it to make inference on a partially observed monotone stochastic process. The process is piecewise linear, based on additive independent gamma increments arriving in a Poisson fashion. An independent increments variation allows very simple conditional simulation of sample paths given known values of the process. We take advantage of a reparameterization involving the Tweedie distribution to provide efficient computation. The motivating problem is the establishment of a chronology for samples taken from lake sediment cores, i.e. the attribution of a set of dates to samples of the core given their depths, knowing that the age-depth relationship is monotone. The chronological information arises from radiocarbon (14C) dating at a subset of depths. We use the process to model the stochastically varying rate of sedimentation.


    Haslett, J. & Parnell, A. (2008). A simple monotone process with application to radiocarbon-dated depth chronologies. J. Roy. Statist. Soc. C, 57 (4), 399-418.

    Haslett, J., Allen, J.R.M., Buck, C.E. & Huntley, B. (2008). A flexible approach to assessing synchroneity of past events using Bayesian reconstructions of sedimentation history. Quaternary Science Reviews, 27 (19), 1872-1885.
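
    The abstract's description of a piecewise linear path built from gamma increments arriving in a Poisson fashion can be sketched as below. This is a loose illustration, not the construction of the cited paper: the rate, shape and scale values, the handling of the final segment and the function names are all assumptions.

```python
import random
from bisect import bisect_right


def simulate_monotone_path(depth_max, rate=2.0, shape=1.5, scale=10.0, seed=0):
    """Simulate knots (depth, age) of a monotone piecewise linear age-depth curve."""
    rng = random.Random(seed)
    depths, ages = [0.0], [0.0]
    # Knot depths arrive as a Poisson process: exponential gaps with the given rate.
    d = rng.expovariate(rate)
    while d < depth_max:
        depths.append(d)
        # Each knot adds an independent, strictly positive gamma increment in age.
        ages.append(ages[-1] + rng.gammavariate(shape, scale))
        d += rng.expovariate(rate)
    # Assumed closure: one final knot at the bottom of the core.
    depths.append(depth_max)
    ages.append(ages[-1] + rng.gammavariate(shape, scale))
    return depths, ages


def age_at(depth, depths, ages):
    """Linearly interpolate the age at a given depth between neighbouring knots."""
    if not depths[0] <= depth <= depths[-1]:
        raise ValueError("depth outside the simulated range")
    i = bisect_right(depths, depth) - 1
    if i >= len(depths) - 1:
        return ages[-1]
    w = (depth - depths[i]) / (depths[i + 1] - depths[i])
    return ages[i] + w * (ages[i + 1] - ages[i])


depths, ages = simulate_monotone_path(depth_max=10.0)
grid = [0.5 * j for j in range(21)]            # depths 0.0, 0.5, ..., 10.0
curve = [age_at(g, depths, ages) for g in grid]
print(f"{len(depths)} knots, age at bottom of core = {curve[-1]:.1f}")
```

    Because every increment is positive, the interpolated age never decreases with depth, which is the monotonicity constraint the abstract emphasizes.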

  • Prof. Rainer Dahlhaus

    Universität Heidelberg
    November 11, 2008

    Statistical Inference for Locally Stationary Processes


    Locally stationary processes are models for nonstationary time series whose behaviour can locally be approximated by a stationary process. In this situation the classical characteristics of the process, such as the covariance function at some lag k, the spectral density at some frequency lambda, or, e.g., the parameters of an AR(p) process, are curves which change slowly over time. The theory of locally stationary processes allows for a rigorous asymptotic treatment of various inference problems for such processes. We present different estimation and testing results for locally stationary processes. In particular we discuss nonparametric maximum likelihood estimation under shape restrictions. Empirical process theory plays a major role in the theoretical treatment of such problems. We define an empirical spectral process indexed by a function class and use this process to derive various results on estimation and testing. As a technical tool we derive an exponential inequality and a functional central limit theorem for the empirical spectral process.


    Dahlhaus, R. (2000). A likelihood approximation for locally stationary processes. Ann. Statist. 28, 1762-1794.

    Dahlhaus, R. and Polonik, W. (2006). Nonparametric quasi maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist. 34, No. 6, 2790-2824.

    Dahlhaus, R. and Polonik, W. (2008). Empirical spectral processes for locally stationary time series. Bernoulli, to appear.

  • Prof. Nanny Wermuth

    Chalmers University of Technology & Göteborg University
    President of the Institute of Mathematical Statistics (IMS)
    November 21, 2008

    Consequences of Research Hypotheses Captured by Special Types of Independence Graph


    A joint density of several variables may satisfy a possibly large set of independence statements, called its independence structure. Often this structure is fully representable by a graph that consists of nodes representing variables and of edges that couple node pairs. We consider joint densities of this type, generated by a stepwise process in which all variables and dependences of interest are included. Otherwise, there are no constraints on the type of variables or on the form of the distribution generated. For densities that then result after marginalising and conditioning, we derive what we name the summary graph. It is seen to capture precisely the independence structure implied by the generating process, it identifies dependences which remain undistorted due to direct or indirect confounding, and it alerts to possibly severe distortions of these two types in other parametrizations. We use operators for matrix representations of graphs to derive matrix results and translate these into special types of path.

  • Prof. Thomas Scheike

    University of Copenhagen
    December 4, 2008

    Estimating Haplotype Effects for Survival Data


    We describe how simple estimating equations can be used for Cox's regression model in the context of assessing haplotype effects for survival data. The estimating equations are simple to implement and avoid the use of the EM algorithm, which may be slow in the context of the semiparametric Cox model. The estimating equations also lead to direct estimators of standard errors that are easy to compute, and thus overcome some of the difficulty of obtaining variance estimators based on the EM algorithm in this setting. We also develop useful and simple-to-implement goodness-of-fit procedures for Cox's regression model in the context of haplotype models. Finally, we apply the developed procedures to data that investigate the possible haplotype effects of the PAF-receptor on cardiovascular events in patients with coronary artery disease, and compare our results to those based on the EM algorithm.