Projects

2024

Student Thesis Title Supervisors Industry Partner Download
Denis Schaub Semester Random Durations in Insurance Pricing Mario Wüthrich
Abstract: This semester project follows the ideas of Lindskog-Lindholm-Palmquist (Scandinavian Actuarial Journal 2023). It considers non-life insurance contracts that may be terminated early, an issue which is tackled with a change of measure. Additionally, it proposes a distribution-free, locally unbiased predictor built on a previously chosen (potentially biased) predictor. It offers insight into partitioning methods for the covariate space and into how to handle the variable duration problem. This leads to an automatic, data-driven tariffing method, where the number and size of the tariffs correspond to the partitioning. The procedure is illustrated both with simulated data and with data from the freMTPL2 data set.

2023

Student Thesis Title Supervisors Industry Partner Download
Luca Aschmann MSc Meta-Labeling Architectures for Return Classification Patrick Cheridito
Jacques Joubert
Abu Dhabi Investment Authority
Abstract: Lopez de Prado (2018) introduced (Single) Meta-Labeling, an approach in which a secondary model is trained to use a primary exogenous model in order to extract predicted probabilities for dynamic position sizing; it still remains widely unexamined in the literature. We introduce two novel architectures, Relative and Dual Meta-Labeling, which in addition allow for a dynamic switching mechanism between two primary models. The goal of this thesis is to formalize the architectures and examine whether Meta-Labeling can increase the performance of an exogenous primary model. We utilize slow and fast time-series momentum strategies as primary models as well as Random Forests, XGBoost and LightGBM as secondary models and backtest our experiment on monthly S&P 500 data. We find that Relative Meta-Labeling outperforms all baselines, mainly uses volatility-based features for adjusting its switching dynamics and allows for valuable inference about market regimes. The Single Meta-Labeling architecture likewise outperforms all primary models, yet suffers from long subsequent periods of inactivity. Dual Meta-Labeling achieves the highest risk-adjusted returns and aids time-series momentum strategies in responding to momentum turning points.
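For readers unfamiliar with the mechanics, a minimal single meta-labeling sketch in Python is given below. The toy returns, feature choices and the random forest secondary model are illustrative assumptions, not the thesis setup.

# Single meta-labeling sketch (toy data; feature and model choices are assumptions).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
ret = pd.Series(rng.normal(0.005, 0.04, 600))            # toy monthly returns

primary = np.sign(ret.rolling(12).mean()).shift(1)        # slow momentum signal (primary model)
meta_y = ((primary * ret) > 0).astype(int)                # 1 if the primary call was profitable

features = pd.DataFrame({
    "vol": ret.rolling(12).std().shift(1),                # volatility-based feature
    "mom_fast": ret.rolling(3).mean().shift(1),
    "mom_slow": ret.rolling(12).mean().shift(1),
}).assign(signal=primary)

data = pd.concat([features, meta_y.rename("y")], axis=1).dropna()
split = int(len(data) * 0.7)
train, test = data.iloc[:split], data.iloc[split:]

clf = RandomForestClassifier(n_estimators=300, min_samples_leaf=5, random_state=0)
clf.fit(train.drop(columns="y"), train["y"])

p = clf.predict_proba(test.drop(columns="y"))[:, 1]        # confidence in the primary call
position = test["signal"] * p                              # probability-weighted position size
strategy_ret = position * ret.loc[test.index]
print("annualized mean return:", 12 * strategy_ret.mean())
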
Nico Ehrler MSc Individual Claims Reserving with Machine Learning Methods Mario Wüthrich
Frank Ettwein
Baloise
Abstract: For all accidents happening during the insurance period of an insurance contract, the insurance company needs to cover the corresponding losses. The cash flows for these future losses therefore have to be estimated so that an insurance company can determine the size of its liabilities. Usually, aggregated methods, such as modifications of the Chain-Ladder algorithm, are used for this purpose. This thesis determines the size of the liabilities through predictions on each individual claim instead of on an aggregated level. These predictions are made with gradient boosting machines, namely the LightGBM package in R. The aggregated Chain-Ladder algorithm is then compared with the individual predictions on a claim as well as on a portfolio basis.
Egemen Erdogdu MSc Analysis of the Distribution of Corporate Defaults with Bayesian Methods Patrick Cheridito
Kai Schnee
Gabriel Visentin
Abstract: This thesis proposes Bayesian approaches for the model validation of three different credit risk models: a Gaussian one-factor model, a reduced-form model, and a Restricted Boltzmann Machine (RBM) credit risk model. The primary objective is to evaluate the calibration of these models using Bayesian techniques, which provide an intuitive and visual way to validate models. The proposed methodologies are implemented on a real default data set of US speculative-grade borrowers.
Tatjana Mäder Semester Variable Annuity Hedging Patrick Cheridito
Abstract:
Tatjana Mäder MSc A Reinsurance Pricing Model for Nuclear Pools Mario Wüthrich
Philipp Arbenz
SCOR
Abstract: The current discussion about how and with which fuels more electricity can best be generated also concerns the use of nuclear energy. Despite past negative headlines, the nuclear power sector continues to grow, and as a result it is becoming increasingly interesting for reinsurers to write nuclear risk business. In this context, reinsurance companies participate in nuclear pools, as the loss of a nuclear accident exceeds the capacity of a single insurer. This thesis deals with the pricing of such a pool. The goal is to derive a model that can quantify the expected total loss amount in the event of a nuclear accident. For the derivation of the model, data on nuclear accidents as well as characteristics of all reactors operating worldwide have been collected. Based on this, an attritional frequency-severity model was developed. The main focus lay on estimating the frequency of a large-loss nuclear accident by means of influencing factors. In order to incorporate such key drivers, a generalized linear model was used. As the final model, a Poisson log-linear model was chosen, which depends on the exposure, i.e., the global number of operating reactors, as well as a time factor given by the number of calendar years since the first nuclear reactor was commissioned. In a further analysis, the historical loss sizes were examined and a severity distribution was fitted. Due to several past extreme losses, the focus lay on heavy-tailed distributions. It was found that a log-gamma distribution provides the best fit, resulting in an infinite-mean model. The final result is a pricing model for nuclear pools, which models losses resulting from nuclear accidents using a compound Poisson-log-gamma model for the large losses.
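To illustrate the frequency part, the sketch below fits a Poisson log-linear GLM with the number of operating reactors as an exposure offset, using statsmodels. The data are simulated and the covariates are assumptions; they are not the data collected for the thesis.

# Poisson log-linear frequency model with exposure offset (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
years = np.arange(1960, 2021)
df = pd.DataFrame({
    "years_since_first_reactor": years - 1954,             # assumed time covariate
    "reactors": np.clip((years - 1955) * 8, 1, 440),       # assumed global exposure
})
true_rate = np.exp(-7.0 - 0.01 * df["years_since_first_reactor"])
df["accidents"] = rng.poisson(true_rate * df["reactors"])

X = sm.add_constant(df[["years_since_first_reactor"]])
model = sm.GLM(df["accidents"], X,
               family=sm.families.Poisson(),
               offset=np.log(df["reactors"]))               # exposure enters as an offset
fit = model.fit()
print(fit.summary())

# expected number of accidents in a hypothetical year with 440 operating reactors
new = sm.add_constant(pd.DataFrame({"years_since_first_reactor": [70]}), has_constant="add")
print(fit.predict(new, offset=np.log([440])))
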
Mohamed Fadhel Omar MSc Optimising the Allocation of Time Supplements in Railways Timetables: A Heuristic Approach Dan Burkolter
Burkhard Franke
Francesco Corman
Patrick Cheridito
trafIT solutions GmbH
Abstract:
Christopher Panizzolo Semester A Combined Approach of Multidimensional Lee-Carter Model and Hidden Markov Model Mario Wüthrich
Hélène Schernberg
Abstract: This semester paper describes a multi-dimensional extension of the Lee-Carter mortality model in which the mortality indices are driven by a hidden Markov process. This allows capturing transitions between mortality regimes characterized by different trends and volatilities. We build on the paper “Multidimensional Lee–Carter model with switching mortality processes” by Hainaut (2012), which describes a two-dimensional Lee-Carter model driven by a two-state Markov process. We extend this paper in two ways. First, we allow for as many dimensions of the temporal indices and as many states of the Markov process as desired by the user (e.g., based on predictive power). Second, we rely on an improved method for identifying the Markov chain and calibrating the model, namely the Baum-Welch algorithm instead of a method based on the Hamilton filter.
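The classical single-regime Lee-Carter fit that this extension builds on can be sketched with a singular value decomposition; the mortality surface below is simulated purely for illustration.

# Classical Lee-Carter fit via SVD on a simulated (ages x years) log-mortality matrix.
import numpy as np

rng = np.random.default_rng(2)
ages, years = 90, 60
true_ax = np.linspace(-8.0, -1.0, ages)
true_bx = np.full(ages, 1.0 / ages)
true_kt = np.cumsum(rng.normal(-0.5, 0.3, years))
log_m = true_ax[:, None] + np.outer(true_bx, true_kt) + rng.normal(0, 0.02, (ages, years))

ax = log_m.mean(axis=1)                                    # static age profile a_x
U, s, Vt = np.linalg.svd(log_m - ax[:, None], full_matrices=False)
bx = U[:, 0] / U[:, 0].sum()                               # identification: sum(bx) = 1
kt = s[0] * Vt[0] * U[:, 0].sum()                          # time index k_t

drift = np.diff(kt).mean()                                 # random walk with drift for k_t
kt_forecast = kt[-1] + drift * np.arange(1, 11)
log_m_forecast = ax[:, None] + np.outer(bx, kt_forecast)   # 10-year mortality forecast
print(log_m_forecast.shape)
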
Panagiotis Papakonstantinou MSc Hedging Options with Deep Learning Patrick Cheridito
Stephan Eckstein
Abstract: The incredible speed improvement in computers has been a defining feature of the 21st century. This progress has allowed us to use complex math methods to solve tough problems. One area that has become really popular in the last decade is called deep learning. Simply put, deep learning is a way to teach computers to do math tasks without telling them exactly how to do it. They learn by trying things out, just like humans do through trial and error. This is made possible by neural networks, which are like our brain’s digital version. These networks have interconnected parts, like brain cells, that respond when given data. The goal of training them is to make them good at responding correctly when they see lots of examples, so they can make accurate conclusions or predictions.
This thesis focuses on the use of deep learning in the field of financial modeling, with a specific emphasis on pricing and risk management of financial securities. The particular security type under analysis is called an option, which is a financial contract linked to an underlying asset like a stock or bond. Options grant the buyer the choice to buy or sell the asset at a predetermined price and time. They are known as financial derivatives since their value is derived from the underlying asset. The task of determining a fair price for this derivative is closely connected to reducing its risk through hedging.
The primary aim of this thesis is to reduce the risk of a portfolio containing an option by employing hedging with a neural network. The objective is to develop an agent that can automate the tasks performed by a human trader using quantitative data, following the principles of machine learning. Initially, we apply this approach in a straightforward setting, such as the Black-Scholes model, and subsequently, we evaluate its performance in a more realistic scenario, opting for the Heston model.
The results of our study reveal that the deep hedging approach, when applied to simulations based on the Black-Scholes model, generates a hedging strategy that is comparable and slightly superior to the theoretical optimal hedging strategy for Black-Scholes. Furthermore, in the Heston model, the resulting hedging strategy exhibits performance similar to the traditional hedging method derived from the analytical pricing model.
Adrien Perroud Semester Prediction Intervals with Generalized Linear Models Mario Wüthrich
Abstract: Prediction intervals estimate, with a certain probability, ranges for future values based on observed values. In this paper, we present different methods to construct such intervals. The first part focuses on presenting the algorithms. In the second part, we apply the prediction intervals to simulated data with the help of generalized linear models. Finally, we analyse their performance and computational burden. The first two methods are based on a popular statistical tool, bootstrapping. We proceed by resampling the original data many times and fitting a GLM to each resample. The resulting regressors are then used to construct the prediction interval. Another way of constructing these intervals is conformal inference. This relatively new concept is based on testing whether a set of candidate values can be accepted in the prediction interval. More precisely, we check how conformal a candidate is using a specified distance function. The resulting distance, called the conformity score, is then compared to a set of distances for some data, and if the new distance is small enough, the candidate is conformal. The goal is to find such sets of distances to compare our candidate values with. We present four different methods to construct such sets. After presenting the methods, we test the prediction intervals on a set of simulated data. We consider a dataset of Swedish motorcycle claims and resample the data with replacement. We fit a GLM to the resampled data and find the average point prediction for every data point. The response variables are then generated by a gamma distribution with the average point prediction as the scale parameter and shape parameter equal to 1. The intervals are computed on this new set and analysed.
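A minimal sketch of the bootstrap variant is shown below, with a gamma GLM with log link fitted to simulated data; the covariates, sample size and the point of prediction are assumptions.

# Bootstrap prediction interval for a gamma GLM (simulated data, assumed covariates).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
X = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.uniform(size=n)})
mu = np.exp(1.0 + 0.5 * X["x1"] - 0.3 * X["x2"])
y = pd.Series(rng.gamma(shape=1.0, scale=mu))              # gamma responses, shape 1

exog = sm.add_constant(X)
x_new = sm.add_constant(pd.DataFrame({"x1": [0.2], "x2": [0.5]}), has_constant="add")

B, sims = 500, []
for _ in range(B):
    idx = rng.integers(0, n, n)                            # resample with replacement
    fit_b = sm.GLM(y.iloc[idx], exog.iloc[idx],
                   family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    mu_b = np.asarray(fit_b.predict(x_new))[0]             # point prediction at x_new
    sims.append(rng.gamma(shape=1.0, scale=mu_b))          # simulate a new response

lower, upper = np.percentile(sims, [2.5, 97.5])
print(f"95% bootstrap prediction interval at x_new: [{lower:.2f}, {upper:.2f}]")
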
Florian Rossmannek PhD The Curse of Dimensionality and Gradient-based Training of Neural Networks: Shrinking the Gap between Theory and Applications Patrick Cheridito
Arnulf Jentzen
DOI
Abstract: Neural networks have gained widespread attention due to their remarkable performance in various applications. Two aspects are particularly striking: on the one hand, neural networks seem to enjoy approximation capacities superior to those of classical methods. On the other hand, neural networks are trained successfully with gradient-based algorithms despite the training task being a highly nonconvex optimization problem. This thesis advances the theory behind these two phenomena.
On the aspect of approximation, we develop a framework for showing that neural networks can break the so-called curse of dimensionality in different high-dimensional approximation problems, meaning that the complexity of the neural networks involved scales at most polynomially in the dimension. Our approach is based on the notion of a catalog network, which is a generalization of a feed-forward neural network in which the nonlinear activation functions can vary from layer to layer as long as they are chosen from a predefined catalog of functions. As such, catalog networks constitute a rich family of continuous functions. We show that, under appropriate conditions on the catalog, these catalog networks can efficiently be approximated with rectified linear unit (ReLU)-type networks and provide precise estimates of the number of parameters needed for a given approximation accuracy. As special cases of the general results, we obtain different classes of functions that can be approximated with ReLU networks without the curse of dimensionality.
On the aspect of optimization, we investigate the interplay between neural networks and gradient-based training algorithms by studying the loss surface. On the one hand, we discover an obstruction to successful learning due to an unfortunate interplay between the architecture of the network and the initialization of the algorithm. More precisely, we demonstrate that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough. On the other hand, we establish positive results by conducting a landscape analysis and applying dynamical systems theory. These positive results deal with the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. Next, we prove a new variant of a dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements usually imposed. We verify that ReLU networks with one hidden layer fit into the new framework. Building on our classification of critical points, we deduce that gradient descent avoids most saddle points. We proceed to prove convergence to global minima if the initialization is sufficiently good, which is expressed by an explicit threshold on the limiting loss.
Matthias Schmickler MSc Industrial Property Insurance Rate Making Mario Wüthrich
Adrian Kolly
Adrian Lüssy
Swiss Re
Abstract: Nowadays, data analytics is common practice in the pricing of non-life insurance products. In the reinsurance industry, however, it is not easy to use a fully data-driven approach to pricing. Therefore, many prices are based on a combination of data and expert judgement. This is mainly due to the heterogeneity of the data and the fact that little data from direct insurers ends up in reinsurance. In this thesis, we explore what a data-driven approach might look like. The first part is devoted to the theory of pricing contracts in the reinsurance industry. In the second part, we apply the methods to a dataset provided by Swiss Re. The results show that the approach is promising but highly depends on the data quality.
Trevor Winstral MSc Network Statistics and Systemic Risk in Financial Networks Stefano Battiston
Patrick Cheridito
Abstract: The subsequent work begins with a literature review of modern research in the field of systemic risk in financial networks. This covers the statistical methods used in the evaluation of the topologies of empirical financial networks, the framework used to model financial contagion (NEVA), and the results found from various specifications of NEVA applied to theoretical and empirical financial systems. Next, novel Bayesian and frequentist statistical methodologies are proposed for improved evaluation of empirical network topologies. These methods allow for understanding the degree to which empirical networks fit idealized Core-Periphery networks. Finally, a method to introduce fire sales, illiquidity, and leverage requirements into the NEVA framework is presented, adding parameters that were previously unaccounted for in NEVA. The dependence of the size of financial crash cascades on the aforementioned parameters, as well as on the deviation of the topology from an ideal Core-Periphery network, is then studied.
Haoyun Ying Semester An Analysis of the Electricity Infrastructure in South Africa Mario Wüthrich
Suchita Srinivasan
Abstract: This project proposes a mathematical optimization approach to determine the number and locations of power plants in South Africa. The objective is to minimize the total operation and transmission costs while meeting the growing demand for electricity. We formulate the problem as an optimization model with fixed activation and variable transportation costs. We investigate two clustering methods: k-medoid clustering based on the sum of distances and the computationally easier k-means clustering based on the sum of squared distances. We determine the optimal number of power plants by comparing total costs and validate the results from k-means clustering through silhouette scores. We present the numerical results of our approach and compare them to the current power plant locations in South Africa. Our approach can provide decision makers with valuable insights for optimizing the country’s electricity grid system.
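The k-means and silhouette-score step can be sketched as follows with scikit-learn; the coordinates below are simulated stand-ins for the demand points, not the South African data used in the project.

# Cluster demand points into candidate plant locations and score the number of clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
demand_points = rng.uniform(low=[22.0, -35.0], high=[32.0, -22.0], size=(300, 2))

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(demand_points)
    score = silhouette_score(demand_points, km.labels_)
    print(f"k = {k}: silhouette = {score:.3f}")
    # km.cluster_centers_ would be the candidate plant locations for this k
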
Philipp Zimmermann PhD Inverse Problems for Variable Coefficient Nonlocal Operators Patrick Cheridito
Joaquim Serra
DOI
Abstract: The purpose of this PhD thesis is the study of the class of nonlocal inverse problems, which can be regarded as nonlocal generalizations of the famous Calderón problem. This thesis is divided into two parts. In Part 1, we consider linear nonlocal inverse problems and in Part 2 nonlinear nonlocal inverse problems.

2022

Student Thesis Title Supervisors Industry Partner Download
Davide Apolloni MSc Significance Tests for Neural Networks Mario Wüthrich
Abstract: We present a pivotal test to assess the statistical significance of feature variables in a single-layer sigmoid neural network. We provide an asymptotic estimator of the test statistic and derive its asymptotic distribution under the corresponding null hypothesis. This test allows for variable selection within single-layer sigmoid neural networks. The key tool to prove such results is sieve estimation, where the complexity of the estimator increases with the size of the data. Universality theorems say that this allows us to precisely reconstruct the true regression function, which then is the basis for the variable significance test.
Rayen Ayari BSc Mortgage Default Prediction Patrick Cheridito
Abstract: The mortgage probability of default is a key quantity in credit portfolio risk management which quantifies the likelihood of a client not paying back their debt and interest. To my knowledge, prior literature has relied either on mathematical models or on machine learning models to quantify this key concept without implementing a framework explaining the impact of every variable (ML feature impact). Credit risk assessment models have mostly been studied using machine learning techniques, while the literature on deep learning remains limited. In this paper, we implement state-of-the-art ensemble machine learning models that include transition matrices in computing the PD (probability of default). Specifically, we focus on implementing a credit scoring model composed of an ensemble of convolutional neural networks and conditional transition matrices applied to more than 7.2 million loans and 88.6 million monthly records offered by the US-based mortgage loan company Freddie Mac. We discuss the results of this model compared to Random Forests and Gradient Boosted Trees (the two ML models most used by banks and rating agencies). We also discuss the explainability of these three models using SHAP values to get a global explanation of the impact of every feature on the PD.
Berno Binkert MSc Optimal Liquidity Pool Graphs Patrick Cheridito
Roger Wattenhofer
Abstract:
Richard Breitschopf MSc Deep Reinforcement Learning for Optimal Trade Execution Patrick Cheridito
Moritz Weiss
Abstract:
Niels Cariou-Kotlarek Semester Hawkes Iterative Time-dependent Estimation of Parameters Patrick Cheridito
Abstract:
Niels Cariou-Kotlarek MSc Jump-Diffusion Models for Financial Bubbles Modelling: A Multi-scale Type-II Bubble Model with Self-Excited Crashes Patrick Cheridito
Didier Sornette
Roger Wehrli
Abstract:
Federica Casanova BSc Mortality Modeling using Frailty Methods Mario Wüthrich
Abstract: Standard methods for calculating mortality rates consider a homogeneous population, which tends to overestimate mortality. This thesis aims to improve mortality rate estimation, so that future mortality can also be better predicted. To this end, we add a frailty parameter, which leads to considering a heterogeneous population, thus not ignoring differences in longevity between individuals. We will consider two different baseline mortalities and different distributions for the frailty variable, comparing estimated and forecasted mortalities with and without frailty, seeing which one better predicts the actual data. This way, we can have a more accurate estimation and forecast, especially for the elderly.
Walid Chatt Semester Machine Learning for Fraud Detection: An Initial Approach to Tackle the Issues of Class Imbalance and the Shortage of Labels Helmut Bölcskei
Patrick Cheridito
Silvia Mongellu
UBS
Abstract:
Chantal Emmenegger MSc Elicitability and Consistency in Statistical Estimation Mario Wüthrich
Abstract: Elicitability and consistency are central concepts in semiparametric statistical estimation, specifically when considering M-estimation through loss functions; the corresponding counterpart for Z-estimation and identification functions is called identifiability. For estimation purposes one relates these terms to consistent estimation in order to avoid a bias. In the multivariate setting we make note of Osband’s principle, which associates M- with Z-estimators and vice versa under certain conditions. Because these restrictive conditions have to be met, a gap arises between loss and identification functions, as there are far more identification functions than loss functions, which through efficient estimation may eventually lead to an efficiency gap. To illustrate this gap we study the double quantile model in a theoretical and a numerical setting and complement it with the pair of the first two moments and the (mean, variance) pair, for which no gap arises.
Selim Gatti MSc Optimal Insurance Policies under Ambiguity Aversion Patrick Cheridito
Abstract:
Arianna Guadagnini BSc Evaluating the Tail Risk of Multivariate Aggregate Losses Mario Wüthrich
Abstract: This thesis examines tail risk measures for several widely used multivariate aggregate loss models where the claim counts are dependent while the claim sizes are mutually independent and independent of the claim counts. First, we derive formulas for moment transforms of the multivariate aggregate losses, showing how they relate to moment transforms of the claim counts and claim sizes. Using these formulas, we evaluate popular risk measures, such as the multivariate tail conditional expectation (MTCE) and the multivariate tail covariance (MTCov) of aggregated losses. Moreover, we determine capital allocations.
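A Monte Carlo sketch of one common MTCE convention (conditioning on every component exceeding its marginal quantile) is given below; the common-shock Poisson counts and lognormal severities are illustrative assumptions, not the models analysed in the thesis.

# Monte Carlo estimate of a multivariate tail conditional expectation (MTCE).
import numpy as np

rng = np.random.default_rng(5)
n_sim, q = 200_000, 0.95

common = rng.poisson(1.0, n_sim)                           # shared shock -> dependent counts
N1 = common + rng.poisson(2.0, n_sim)
N2 = common + rng.poisson(3.0, n_sim)

def aggregate(counts, mu, sigma):
    # sum of counts[i] iid lognormal severities for each simulation i
    sev = rng.lognormal(mu, sigma, counts.sum())
    out = np.zeros(len(counts))
    np.add.at(out, np.repeat(np.arange(len(counts)), counts), sev)
    return out

S1 = aggregate(N1, mu=0.0, sigma=1.0)
S2 = aggregate(N2, mu=0.5, sigma=0.8)

v1, v2 = np.quantile(S1, q), np.quantile(S2, q)
tail = (S1 > v1) & (S2 > v2)                               # joint tail event
mtce = np.array([S1[tail].mean(), S2[tail].mean()])
print("joint tail probability:", tail.mean())
print("MTCE estimate:", mtce)
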
André Emanuel Jacob BSc Analysis of Peer-to-peer Insurance Mario Wüthrich
Abstract: We consider a peer-to-peer (P2P) insurance scheme where the higher layer of the total risk is covered by a (re-)insurer and the global retention level grows proportionally with respect to the total number of participants. The retained losses are then distributed among the participants according to the conditional mean risk sharing rule. The individual retention levels are analyzed as the number of participants increases. The results depend on the proportional rate of increase of the global retention level, as well as on the existence of the Esscher transform of the individual losses brought into the pool.
Yan-Xing Lan MSc Multi-Dimensional Exponential Family for Claim Size Modeling Mario Wüthrich
Simon Rentzman
AXA Winterthur
Abstract: Generalized linear models (GLMs) have been and still are important tools for statistical modeling. They are a cornerstone in the pricing of insurance contracts. However, the modeling of the dispersion parameter in the framework of GLMs is often neglected and only approximated outside of the modeling framework. Therefore, we investigate double generalized linear models (DGLMs) and also the direct approach of using the maximum likelihood estimator for the parameters of the exponential family (EF), which both also model and estimate the dispersion parameter. In the first part of this thesis we describe the theory behind DGLMs and parameter estimation in the EF, where we establish these two approaches; in the case of the gamma and inverse Gaussian distributions they actually give the same results. In the second part we model claim sizes of insurance data using gamma and inverse Gaussian distributions. We conclude that, depending on the data, DGLMs can improve the modeling.
Yining Li Semester The Application of Quantum Computing in Financial Portfolio Optimization Problems Patrick Cheridito
Stephan Eckstein
Abstract:
Andrin Melliger BSc Modeling Mortality with the Lee-Carter Model Mario Wüthrich
Michael Koller
Amlin Re
Abstract: As demographic data reveals, mortality rates vary between different subgroups of the human population. Moreover, they are by no means constant over time but rather they have generally declined in the past. Such changes can have financially material consequences for the life insurance industry, which necessitates reliable models to forecast mortality trends. In a first part of this bachelor thesis, one potential model - the Lee-Carter model - is thoroughly explained and its performance is analyzed based on Swiss mortality data. The backtesting procedure conducted reveals that the Lee-Carter approach is only suitable to some limited extent for modeling this data from Switzerland. The second part of this work focuses on examining the conceptual model risk entailed in the Lee-Carter modeling approach by critically appraising its underlying assumptions. While it is put forward that the general modeling assumptions which the Lee-Carter model is based on seem to be more or less plausible, it is also stressed that assessing and, in particular, quantifying the conceptual model risk is an extremely challenging endeavor. Moreover, possible ways of extending and improving the model are discussed, one of which consists in introducing a capability to account for cohort effects.
Rebecca Morger MSc Imputation Algorithms with Principal Component Analysis for Financial Data Patrick Cheridito
Pawel Kuczera
Philipp Arbenz
SCOR
Abstract: The goal of this thesis is the imputation of missing values in the financial data set of SCOR. Among others, the data consist of equity indices and yield curves. There are several reasons why financial data contain missing values: A currency might not have existed yet, counterparties in a certain rating category were not available, and many others. However, for reinsurance modeling purposes, the data set needs to be complete.
The method used for the imputation is Principal Component Analysis (PCA), which takes into account the correlation structure of the data variables. The extension of PCA to the case of missing data yields a non-convex optimization problem. We focus on the “Iterative PCA Algorithm” as well as gradient-based “Subspace Learning Algorithms”. These algorithms are compared and used to impute the returns of the financial variables. The transformation from the level of returns to the level of the original data is constructed with Brownian bridges.
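The iterative PCA idea can be sketched in a few lines: alternate between filling the missing cells and refitting a low-rank reconstruction. The data, rank and stopping rule below are illustrative assumptions, not SCOR's setup.

# Iterative PCA imputation of missing returns (simulated low-rank data).
import numpy as np

def iterative_pca_impute(X, rank=2, n_iter=100, tol=1e-8):
    mask = np.isnan(X)                                     # True where values are missing
    filled = np.where(mask, np.nanmean(X, axis=0), X)      # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = mu + U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]
        new_filled = np.where(mask, low_rank, X)           # only overwrite missing cells
        if np.max(np.abs(new_filled - filled)) < tol:
            break
        filled = new_filled
    return filled

rng = np.random.default_rng(6)
factors = rng.normal(size=(200, 2))
returns = factors @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(200, 8))
holes = rng.random(returns.shape) < 0.1                    # knock out ~10% of the entries
observed = np.where(holes, np.nan, returns)

imputed = iterative_pca_impute(observed, rank=2)
print("RMSE on imputed cells:",
      np.sqrt(np.mean((imputed[holes] - returns[holes]) ** 2)))
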
Bianca Morrone MSc Dynamic Classification Patrick Cheridito
Gregor Heyne
UBS
Abstract: The problem of dynamic classification for temporal sequences is to construct a classifier which makes incremental decisions that are sensitive to the changes in a temporal environment, and such that the classifier achieves a reliable classification at an effective point in time. In this work, we introduce a dynamic classification framework which requires a dynamic model, that can perform incremental evaluations of a sequence, along with a time-sensitive environment to make reliable classification decisions in. We implement a few of the proposed methods to demonstrate how dynamic classification can be applied to fields which can benefit from a more time-sensitive approach, such as anomaly detection. We then compare these methods to conventional full sequence classifiers and observe the inherent tradeoff between timeliness and accuracy.
Fabian Rohner Semester Estimating the Discount Curve Patrick Cheridito
Abstract:
Matthias Schmickler BSc Mathematics of Stochastic Loss Reserving Patrick Cheridito
Abstract:
Lena Schütte MSc A Churn Model for Swiss Mandatory Health Insurance Patrick Cheridito
Azenes
Abstract: In this thesis, we investigate the use of churn models in an actuarial pricing context. We fit logistic regression, classification tree and gradient boosting machine models to a large data set of a Swiss health insurer. Here, the actuarial premiums are implicitly based on an assumed portfolio structure, which is predicted by the churn model. We therefore develop a pricing loss function that measures the impact of the churn prediction error on the predicted profits and can be seen as a proxy for the error of the actuarial premium resulting from the error in the churn model. Each model’s performance is then compared with respect to the Pricing Loss, the Binomial Deviance and the AUC. As pricing is linked to setting a market premium, we aim to incorporate into the churn model the impact of the insurer’s premium in a competitive market. To do this, we include the insurer’s premium, premium changes and the premiums of the main competitors as explanatory variables. For logistic regression and the gradient boosting machine, we then deduce an approximation of the premium sensitivity of the insured.
Lena Schütte Semester Modeling Churns with ANNs for Actuarial Pricing Patrick Cheridito
Abstract:
Trevor Winstral Semester Review of Systemic Risk in Financial Networks Stefano Battiston
Patrick Cheridito
Abstract:
Jianing Yang Semester Statistical Analysis of Telematics Car Driving Data Mario Wüthrich
Abstract: In this semester paper we analyze a synthetic dataset of 100’000 car insurance policies generated by So, Boucher and Valdez (2021), which contains observations of classical risk features as well as telematics-related features. Regression against claim frequency is carried out using various machine learning methods, including generalized linear models, generalized additive models, regression trees and Poisson tree boosting. We compare the predictive power of each regression estimator, and through our analysis we gain insight into the most important features affecting claim frequency.

2021

Student Thesis Title Supervisors Industry Partner Download
Mayeul Cayette Semester Computational General Equilibrium Model of Optimal Carbon Policies Mario Wüthrich
Alexey Minabutdinov
Abstract: We derive the optimal contribution to global climate policy for a given country with a fixed emission limit. To this end, we study the Ramsey model, which describes the transition and consumption of a given pollution capital budget, and we determine the optimal path towards this limiting budget. These considerations are carried out for a finite time horizon and for an infinite time horizon; for the infinite-horizon problem we derive the steady states of the model, giving an asymptotic balanced growth rate. Numerically, these problems are solved by value function iteration and by backward iteration. We derive these algorithms and calibrate the models, under different utility functions, to the real-world problem to derive short-run and long-run consumption of capital and pollution budgets.
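As an illustration of value function iteration, the sketch below solves a textbook deterministic growth model with log utility and Cobb-Douglas production, a deliberately simplified stand-in for the pollution-budget model of the project; in this special case the iteration should recover the known closed-form policy k' = alpha*beta*k^alpha.

# Value function iteration on a stylized Ramsey-type growth model (made-up parameters).
import numpy as np

alpha, beta = 0.3, 0.95                                    # assumed technology / discount factor
grid = np.linspace(0.05, 0.5, 400)                         # capital (budget) grid
V = np.zeros_like(grid)

c = grid[:, None] ** alpha - grid[None, :]                 # consumption for each (k, k') pair
feasible = c > 0
log_u = np.where(feasible, np.log(np.where(feasible, c, 1.0)), -np.inf)

for _ in range(2000):
    value = log_u + beta * V[None, :]                      # Bellman right-hand side
    V_new = value.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new

policy = grid[value.argmax(axis=1)]                        # numerical policy k'(k)
closed_form = alpha * beta * grid ** alpha                 # known solution of this special case
print("max abs policy error:", np.max(np.abs(policy - closed_form)))
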
Eneko Clemente Semester MisGAN: A Machine Learning Approach to Learn from Incomplete Data Mario Wüthrich
Abstract: One often has to deal with incomplete datasets. This paper studies an imputation approach that essentially consists of completing the missing values of the data. We discuss the MisGAN imputer, which uses Wasserstein Generative Adversarial Networks to perform this task. While this approach is built on a strong theoretical foundation, we show that its application in practice is difficult and does not fully live up to the theoretical promises.
Hadi Eghlidi DAS Machine Learning and Deep Learning for Business Sales Forecasting Patrick Cheridito
VAT Group
Abstract: In recent years, machine learning algorithms and methods have been increasingly employed in different aspects of the manufacturing and business-to-business (B2B) industries. One area of interest is leveraging data to understand the dynamics and drivers of such complicated industries and to forecast future business volume and performance. This helps in different aspects of the industry, such as managing manufacturing capacity and the supply chain, planning investment in technology transitions, and production planning and control. In this project, we employ data science and machine learning approaches to forecast the future sales of a B2B company and to understand their relation with a supply chain indicator. The workflow includes a literature review, communicating with the company’s business managers to determine the relevant drivers, collecting and preparing the company’s sales data and an important supply chain driver, training and validating various machine learning methods, and visualizing the results to draw business conclusions. As for the machine learning techniques, we use a multi-regressor technique developed by Facebook (Prophet) and a few popular deep learning techniques used for time-series forecasting.
Daisuke Frei MSc Insurance Claim Size Modelling with Mixture Distributions Mario Wüthrich
Simon Rentzmann
AXA Winterthur
Abstract: Insurance claim size data often cannot be modeled precisely by a single (one- or two-parameter) probability distribution, because the small and large claims can have a very different behavior. In this thesis we use mixture distributions to model insurance claim size data. We consider mixtures of light-tailed and heavy-tailed distributions in order to obtain an accurate approximation of both small and large claims. In the theoretical part of this thesis, we introduce the distributions which we will consider and the methods that we will use to fit them to the data. In particular, to fit mixture distributions we use the Expectation-Maximization algorithm. We first study homogeneous models and then we improve them by making use of the generalized linear model framework. In the application part of the thesis we compare the fit of different mixture distributions to a sample of insurance claim size data, and we conclude that the proposed framework can accurately capture the features of the claims.
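A minimal EM sketch for a two-component mixture is given below. It fits a lognormal mixture on log-claims (where the M-step is available in closed form), a simpler stand-in for the light-/heavy-tailed mixtures studied in the thesis, on simulated data.

# EM algorithm for a two-component lognormal mixture of claim sizes (simulated data).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
claims = np.concatenate([rng.lognormal(7.0, 0.7, 4000),    # "small" claims
                         rng.lognormal(10.0, 1.2, 400)])   # "large" claims
x = np.log(claims)

w, mu, sigma = np.array([0.5, 0.5]), np.array([6.0, 11.0]), np.array([1.0, 1.0])  # initial guess
for _ in range(200):
    # E-step: posterior probability that each claim belongs to each component
    dens = w * norm.pdf(x[:, None], mu, sigma)             # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted mixture weights, means and standard deviations
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", w)        # roughly (0.91, 0.09) for this simulation
print("log-means:", mu)     # roughly (7.0, 10.0)
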
Selim Gatti Semester Pareto Optimal Insurance Policies Patrick Cheridito
Abstract: Uncertainty affects every activity around the world. Methods were thus developed to pool risks among individuals, leading to insurance. Nowadays an insured person buys a contract from an insurer which guarantees coverage if a certain type of loss occurs. Many forms of contracts exist, such as policies with a deductible or policies with full insurance up to an upper limit and coinsurance above it. In this paper, we formulate an insurance problem mathematically using expected utility theory, solve it, and give the conditions under which these forms can be seen as optimal.
Nick Gebert BSc Detection of Asset Price Bubbles Patrick Cheridito
Abstract: This work gives an introduction to the bubble process of an asset in an arbitrage-free complete market case using martingale theory. Building on this, a detection criterion for an asset price bubble is presented and implemented using a neural network. Lastly, the network is employed to detect asset price bubbles in real-world assets.
Massimo Michele Jörin MSc Hedging Options with Reinforcement Learning Mario Wüthrich
Abstract: This thesis shows how different reinforcement learning algorithms can be implemented to calculate trading strategies for hedging problems within the Cox-Ross-Rubinstein and the Black-Scholes-Merton models. While the Cox-Ross-Rubinstein and the Black-Scholes-Merton models allow us to explicitly hedge and price options in finite discrete and finite continuous time, respectively, the assumptions made for these models are rather restrictive (e.g., w.r.t. transaction costs, impossibility of holding arbitrary amounts of stock assets, bid-offer spread, and negative interest rates). Reinforcement learning provides us with a way of finding optimal (or close to optimal) solutions to the hedging and pricing problem of options where no closed-form solution is available. We present the model-based algorithms Value Iteration and Policy Iteration, which require complete knowledge of the stochastic model. Furthermore, we also present model-free algorithms, which do not require complete knowledge of the model and compensate for this by interacting with the environment/model and processing these observations. In the case of a discrete state space and a discrete action space we present the algorithms SARSA, Q-Learning, and Double Q-Learning for finding such solutions. For a continuous state space and a discrete action space we use Deep Q Network (DQN) and Deep Double Q Network (DDQN).
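A toy tabular Q-learning sketch for hedging a call in a small Cox-Ross-Rubinstein tree is shown below; the hedge-ratio grid, the reward (minus the squared one-step hedging error) and all parameters are illustrative assumptions, not the thesis implementation.

# Tabular Q-learning for one-step hedging in a CRR tree; the learned action should
# approach the CRR delta (toy setup with r = 0 and made-up parameters).
import numpy as np

rng = np.random.default_rng(8)
T, S0, u, d, K = 5, 100.0, 1.1, 0.9, 100.0
p = (1.0 - d) / (u - d)                                    # risk-neutral up-probability (r = 0)

S = np.zeros((T + 1, T + 1))                               # S[t, j]: price after j up-moves
C = np.zeros((T + 1, T + 1))                               # CRR option values (used for rewards)
for t in range(T + 1):
    for j in range(t + 1):
        S[t, j] = S0 * u**j * d**(t - j)
C[T, :] = np.maximum(S[T, :] - K, 0.0)
for t in range(T - 1, -1, -1):
    for j in range(t + 1):
        C[t, j] = p * C[t + 1, j + 1] + (1 - p) * C[t + 1, j]

actions = np.linspace(0.0, 1.0, 21)                        # candidate hedge ratios
Q = np.zeros((T, T + 1, len(actions)))
lr, eps = 0.05, 0.2

for episode in range(50_000):
    j = 0
    for t in range(T):
        a = rng.integers(len(actions)) if rng.random() < eps else Q[t, j].argmax()
        up = rng.random() < p
        j_next = j + 1 if up else j
        hedge_err = actions[a] * (S[t + 1, j_next] - S[t, j]) - (C[t + 1, j_next] - C[t, j])
        reward = -hedge_err**2                             # penalize the one-step hedging error
        future = Q[t + 1, j_next].max() if t + 1 < T else 0.0
        Q[t, j, a] += lr * (reward + future - Q[t, j, a])
        j = j_next

learned_delta = actions[Q[0, 0].argmax()]
crr_delta = (C[1, 1] - C[1, 0]) / (S[1, 1] - S[1, 0])
print(f"learned initial hedge: {learned_delta:.2f}, CRR delta: {crr_delta:.2f}")
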
Fang Rui Lim MSc Entropy Martingale Optimal Transport and Utility Induced Divergences Patrick Cheridito
Abstract: The objective of this thesis is to investigate the dual of the Entropy Martingale Optimal Transport problem, introduced by Doldi et al. “Entropy Martingale Optimal Transport and Nonlinear Pricing-Hedging Duality”, for non-compact subsets of R^n. We provide conditions under which the duality representation, a type of inf-sup relation, holds and when this infimum is achieved. As an application, we concentrate on the case when the Entropy Martingale Optimal Transport problem involves the control of the marginal distribution of measures via divergence terms induced by utility functions as in Doldi et al.
Nikolaos Mourdoukoutas MSc Probabilistic Approaches to Invariance Patrick Cheridito
Gunnar Rätsch
Abstract: We propose three novel Bayesian models that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our last method, which is a Bayesian neural network, outperforms other noninvariant architectures when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed. Finally, we overview some already existing approaches for modeling invariant functions with Gaussian processes.
Simon Müller MSc On the Transformation of Actuarial Loss Models into Synthetic NatCat Loss Tables Philipp Arbenz
Patrick Cheridito
SCOR
Abstract: In (re-)insurance, when it comes to loss modelling and aggregation, the actuarial and natural catastrophe modelling approaches are rather separate.
- On the actuarial side, loss modelling through aggregate or frequency-severity distributions is often used and aggregated through dependence assumptions such as copulas or correlation matrices.
- On the natural catastrophe side, loss simulations through NatCat models (using hazard, exposure and vulnerability components) are used and aggregated by adding up loss amounts by event.

The thesis brings these two worlds closer by bridging the gap so that actuarial models can be aggregated in NatCat aggregation systems. Such systems need a consistent set of so-called “Event IDs” to aggregate losses across different contracts or portfolios, since for each modelled contract the loss amounts are linked to these event IDs. SCOR has built an algorithm which translates standard actuarial frequency-severity models into a model setup that allows such event IDs to be attached to simulated losses.

The thesis studies the different approximation steps and mathematically analyses the errors and approximations occurring in this transformation. Three error sources were identified: one in the frequency transformation (general frequency to Poisson (2)), one in the severity transformation (effectively dropping losses in some cases), and one in the event ID injection (if the number of simulations is too low). For all three cases, precise mathematical derivations and error bounds allow us to understand the performance and behaviour of the algorithm.
Luca Pedrazzini MSc Reserving Methods: A Practical Overview Patrick Cheridito
Sari De Martin
Stefan Bregy
Ernst & Young
Abstract: This Master’s thesis presents a possible way of estimating future reserves by applying the Mack Chain Ladder, Bornhuetter-Ferguson and Cape Cod methods. By implementing the three methods in the programming language R, we try to output the best estimate for each accident year, focusing on practical aspects rather than on the theoretical background.
The whole process relies on the use of claim triangles and, through model selection, it allows us to find the reserving method which optimises the estimation of the reserve. This is done by minimising two different quantities: the claim development result and the actual versus expected. Using different data sets, we are able to examine more results and to reach a general conclusion.
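For reference, the deterministic chain-ladder step underlying the Mack method can be sketched in a few lines; the cumulative triangle below is a small made-up example, not the thesis data.

# Volume-weighted chain-ladder development factors and reserves on a toy triangle.
import numpy as np

# cumulative payments: rows = accident years, columns = development years (NaN = future)
tri = np.array([
    [1000., 1800., 2100., 2200.],
    [1100., 1950., 2250., np.nan],
    [1200., 2150., np.nan, np.nan],
    [1300., np.nan, np.nan, np.nan],
])

n = tri.shape[1]
factors = []
for j in range(n - 1):
    both = ~np.isnan(tri[:, j]) & ~np.isnan(tri[:, j + 1])
    factors.append(tri[both, j + 1].sum() / tri[both, j].sum())   # volume-weighted f_j

full = tri.copy()
for j in range(n - 1):
    missing = np.isnan(full[:, j + 1]) & ~np.isnan(full[:, j])
    full[missing, j + 1] = full[missing, j] * factors[j]          # project to ultimate

reserves = full[:, -1] - np.array([row[~np.isnan(row)][-1] for row in tri])
print("development factors:", np.round(factors, 3))
print("estimated reserves by accident year:", np.round(reserves, 1))
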
Nicola Ruckstuhl MSc Multi-Population Mortality Modeling using Tensor Decomposition Mario Wüthrich
Abstract: The topic of mortality modeling is important in many different fields such as insurance, biology and medicine. Insurance companies use mortality tables, which depict the mortality rates in a specific population for different ages and calendar years, to price life insurance products such as annuities and death benefits. Mortality models can usually be divided into two categories: Single- and multi-population mortality models. In single-population modeling, the mortality rates in a single population are modeled in isolation. Single populations include for example the total population of some country or the female or male populations of that country. One of the most renowned single-population mortality models is the Lee-Carter (LC) model, which was established by Lee and Carter (1992), who used a matrix decomposition, specifically singular value decomposition, to fit and forecast U.S. mortality rates. Many mortality models are based on the LC model.
In multi-population modeling, multiple single populations are modeled simultaneously, which means that the module for each specific population within the model is affected by the other populations. Considering that it is reasonable to assume correlations between mortality trends of different countries, the aim with this type of modeling is that the single populations benefit from the increased number of observations. In order to increase this effect, it is useful for the populations to be as similar as possible when it comes to variables that might affect mortality in some way. Multi-population mortality modeling was pioneered by Li and Lee (2005). Whereas in single-population modeling, the mortality rates are considered as functions of only age and calendar year and can thus be depicted as matrices, in multi-population modeling another dimension is added by considering multiple populations simultaneously. Thus the rates are given as 3-dimensional arrays. Multi-dimensional arrays are known as tensors. In this thesis we study multi-population tensor decompositions in a similar way as in Russolillo-Giordano-Haberman (2011) and Dong-Huang-Yu-Haberman (2020).
Rui Wang MSc Discriminating Modelling Approaches for Point in Time Economic Scenario Generation Patrick Cheridito
Binghuan Lin
UBS
Abstract: We introduce the notion of Point in Time Economic Scenario Generation (PiT ESG) with a clear mathematical problem formulation to unify and compare economic scenario generation approaches conditional on forward looking market data. Such PiT ESGs should provide quicker and more flexible reactions to sudden economic changes than traditional ESGs calibrated solely to long periods of historical data. We specifically take as economic variable the S&P500 Index with the VIX Index as forward looking market data to compare the nonparametric filtered historical simulation, GARCH model with joint likelihood estimation (parametric), Restricted Boltzmann Machine and the conditional Variational Autoencoder (Generative Networks) for their suitability as PiT ESG. Our evaluation consists of statistical tests for model fit and benchmarking the out of sample forecasting quality with a strategy backtest using model output as stop loss criterion. We find that both Generative Networks outperform the nonparametric and classic parametric model in our tests, but that the CVAE seems to be particularly well suited for our purposes: yielding more robust performance and being computationally lighter.

2020

Student Thesis Title Supervisors Industry Partner Download
Carlo Casati Semester Tontines in the Light of Systematic Longevity Risk Mario Wüthrich
Irina Gemmo
Abstract: Tontines were introduced in 1653 by the Italian banker Lorenzo de Tonti as an investment vehicle. Recently, tontines have gained a lot of popularity in the life and pension community, as they offer an alternative pension tool in which tontine subscribers bear the financial and longevity risk in a self-organized way. The purpose of this semester thesis is to review tontines and to better understand how longevity risks act on tontine subscribers under the assumption of a heterogeneous tontine community.
Daria Filippova MSc Modelling Propensity to Type 2 Diabetes using Medical Data Mario Wüthrich
Francesca Volpe
Swiss Re
Abstract: From the insurance perspective, the increase in type 2 diabetes cases observed in the last decades leads to a significant surge in costs. Therefore, the goal of this thesis is to develop a model which is able to identify individuals with a high propensity to develop type 2 diabetes within a predetermined time period. A successful classification model may be used as an early warning system to notify individuals that are at risk and to prescribe them a prevention or mitigation program. This results in a reduction of claims and improved risk management for health insurance companies. The classification model is based on Logistic Regression, which is part of the framework of Generalised Linear Models. However, an insurer's interest is not only "if" but also "when" a case of type 2 diabetes will be diagnosed. To answer this, Survival Analysis is leveraged. For the survival analysis of medical or health data, non-parametric statistical estimation is preferred. For this thesis, the Cox Proportional Hazards model is used since it allows for multiple predictors. It extends the classification model by a new dimension, namely the time frame. The survival model uses machine learning techniques and survival analysis in order to estimate the expected time until the diagnosis of type 2 diabetes.
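A minimal sketch of the two modelling steps, logistic regression for the "if" and a Cox proportional hazards fit for the "when", is given below on simulated data. The feature names (age, bmi, glucose), the data-generating process and the use of scikit-learn and lifelines are assumptions for illustration only.

# Classification ("if") and survival ("when") sketch on simulated health-style data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(9)
n = 5000
df = pd.DataFrame({
    "age": rng.uniform(20, 70, n),                         # assumed covariates
    "bmi": rng.normal(27, 4, n),
    "glucose": rng.normal(5.5, 0.8, n),
})
risk = (0.05 * (df["age"] - 45) + 0.15 * (df["bmi"] - 27)
        + 0.8 * (df["glucose"] - 5.5)).to_numpy()
time_to_dx = rng.exponential(scale=1.0 / np.exp(-3.0 + 0.3 * risk))   # years to diagnosis
df["event"] = (time_to_dx < 10).astype(int)                # diagnosed within 10 years?
df["time"] = np.minimum(time_to_dx, 10.0)                  # right-censored at 10 years

# "if": propensity of a diagnosis within the observation window
clf = LogisticRegression(max_iter=1000).fit(df[["age", "bmi", "glucose"]], df["event"])
print("10-year propensity, first record:",
      clf.predict_proba(df[["age", "bmi", "glucose"]].iloc[[0]])[0, 1])

# "when": Cox proportional hazards model for the time to diagnosis
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()
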
Andrea Gabrielli PhD Claims Reserving and Neural Networks Mario Wüthrich
Patrick Cheridito
Franco Moriconi
DOI
Abstract: In non-life insurance, an insurance claim can generally not be settled immediately at occurrence. A claims development process often takes several years. Future cash flows for claims that have occurred in the past are called outstanding loss liabilities, and a prediction thereof provides the claims reserves. Typically, the claims reserves are the largest position in the balance sheet of a non-life insurance company. This underlines the importance of the claims reserving exercise. Traditional claims reserving models such as for example Mack’s chain-ladder model or the over-dispersed Poisson reserving model work on aggregated data. These models neglect claim-specific information, which implies a considerable loss of information. Moving towards more data-driven techniques has the potential to improve the accuracy of the claims reserves. The rising popularity of machine learning methods in the last couple of years has promoted the development of new claims reserving techniques. One of the most popular machine learning methods is neural networks. In simple words, neural networks can be used as high-dimensional non-linear regression functions. In this thesis we combine the art of claims reserving with the power of neural networks. This thesis consists of five research papers: In Paper A we use neural networks to develop a stochastic simulation machine that generates individual non-life insurance claims. These synthetic individual claims allow us to back-test classical claims reserving models as well as to develop new claims reserving techniques. In Paper B we use the individual claims history simulation machine of Paper A in order to back-test the chain-ladder model. This study provides a general intuition of how the chain-ladder claims reserving model behaves, particularly when the portfolio size increases. In Paper C we embed the over-dispersed Poisson reserving model into a neural network. We start the neural network calibration exactly in this over-dispersed Poisson model. Such a nested model allows us to learn model structure beyond the classical reserving model. In Paper D we extend the embedding of the over-dispersed Poisson model for claim amounts of Paper C to a joint embedding of separate over-dispersed Poisson models for both claim amounts and claim counts, exploring additional information provided by the claim counts. In Paper E we provide an individual claims reserving model for reported claims. This model uses claim-specific feature and past payment information in order to calculate claims reserves for individual reported claims. For this task we design one single neural network.
Vito Gallo MSc XVA Analysis for Bilateral Derivatives in Continuous Time Patrick Cheridito
Abstract: XVAs are add-ons that a bank dealing in bilateral derivatives charges to its clients to account for counterparty risk and its capital and funding implications. In this thesis we reformulate the continuous-time analysis of XVAs of [AC18], adding important theoretical results from the theory of invariance times of [CS17] that help us set rigorous assumptions for the well-posedness of the XVA equations and an improved definition of the capital value adjustment (KVA) problem. We also generalise two important assumptions: we separate the margin value adjustment (MVA) from the funding value adjustment (FVA), and we allow the liquidation period of a trade due to default of the client to be positive. These generalisations permit us to obtain a more realistic XVA model, in which we distinguish between the variation and the initial margin. We also obtain a generalised counterparty exposure cash flow, which is used in the formulas for the credit value adjustment (CVA) and the debt value adjustment (DVA). At the end of the thesis we present a simple case study portfolio of interest rate swaps that could be used in an implementation of the XVA problem. As in [AC18], we take a balance sheet perspective on the pricing and risk management of the bilateral derivatives portfolio of the bank, studying not only the pricing, but also the relative collateralisation, accounting, and dividend policy of the bank. Since the bank cannot hedge against default exposure cash flows (of clients and of the bank itself), the bank’s shareholders have to set aside capital at risk, and a wealth transfer from shareholders to bondholders occurs at the default of the bank. As a consequence, the bank charges to the clients, on top of the fair valuation of counterparty risk, the so-called contra-liabilities and a cost of capital at inception of each new trade. This results in an all-inclusive XVA formula given by CVA + FVA + MVA + KVA.
Yan-Xing Lan BSc Variational Autoencoders Mario Wüthrich
Abstract: The variational autoencoder is one of the most popular approaches to unsupervised learning and generative modelling. In the few years after its initial inception there has been extensive research on its extensions and applications. To get a better understanding of these powerful models the main focus of this bachelor thesis is to formulate the theory behind the variational autoencoder and to explore the theory on an explicit example. Therefore, we describe the mathematical theory behind the framework of the variational autoencoder in the first part and apply it on the MNIST data set in the second part of the thesis.
Marcello Monga MSc Deep Portfolio Optimization Sebastian Becker
Patrick Cheridito
Abstract: This work introduces a new machine-learning-based approach to portfolio optimization. Its most important feature is that, in contrast with classical models, our method makes it possible to handle market frictions such as transaction costs and market impact. This Master’s thesis is divided into two parts. The first part corresponds to the first section and consists of an introduction to portfolio optimization and asset and liability management. The second part is the most important one and coincides with the second section. Here, we formalize our approach and then apply it to three different examples. In the first one, we play the role of an investor that does not have any external cash flows and only acts on the market. After that, we assume the point of view of a pension fund and then of an insurance company. Our examples are presented together with graphs and tables obtained using Python, which show how our method works. The code is reported at the end of the work.
Marc Nübel BSc Matrix Mittag-Leffler Distributions with Applications to Insurance Mario Wüthrich
Abstract: This thesis explains the construction and properties of Matrix Mittag-Leffler distributions. Furthermore, it explores the use of this family of distribution functions on a motor third party liability (MTPL) insurance data set available from the R package CASdatasets.
Nicola Ruckstuhl Semester Unintuitive Modelling Effects in Non-Proportional Reinsurance Contracts Philipp Arbenz
Patrick Cheridito
SCOR
Abstract:
Fanny Siegwart MSc Robust Wasserstein Profile Inference And Applications To Machine Learning Mario Wüthrich
Abstract: This thesis studies robust generalized linear model fitting using the framework of distributionally robust optimization. We describe the theory, which essentially amounts to optimizing over Wasserstein balls around the empirical distribution, and we relate this to ridge and LASSO regularization. Furthermore, this approach is explored on simulated and on real data.
Robin Vogtland BSc Calibration of Stochastic Volatility Models using different Neural Network Approaches Patrick Cheridito
Abstract: In finance, parametric models are often used to price various derivative contracts. The model parameters have to be calibrated to quoted market data, i.e., they have to be chosen such that the model best fits the behaviour observed in the financial market. This leads to optimization problems that can only be solved efficiently when closed or semi-closed option pricing formulas can be derived for the model in question. For more realistic models, one has to resort to Monte Carlo sampling, making the calibration a computationally expensive optimization problem. In recent years, research [10, 2, 8] has discussed speeding up this procedure using neural networks. One possibility is splitting the calibration procedure into two steps: first, learning the mapping from the parameter space to the prices of the contracts (or, similarly, implied volatilities) using observed data; second, using this mapping to optimize for the best choice of model parameters. This has been shown to work well in Horvath et al. [10]. An alternative approach is to directly learn the mapping from prices and contract parameters to the model parameters. The main advantage of these approaches is the ability to train the neural networks prior to application, using large quantities of data. Once the network is trained, the calibration task can be performed relatively fast, which is of extreme importance for application in the financial industry. Therefore, the range of models that can be used in practice is extended to include more sophisticated and accurate models, which could previously not be used due to their calibration times. In this thesis, we compare the different calibration procedures on several examples. This shows advantages for the two-step approach under different error metrics. The direct approach only has the benefit of performing calibration extremely fast once the network is trained.
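A sketch of the two-step approach is given below. Black-Scholes stands in for an expensive pricing model so that the learned surrogate can be checked against a closed form; the strike grid, network size and parameter ranges are made-up assumptions, not the settings of the thesis.

# Two-step calibration: (1) learn parameter -> prices offline, (2) calibrate on the surrogate.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar
from sklearn.neural_network import MLPRegressor

S0, r, T = 1.0, 0.0, 1.0
strikes = np.linspace(0.8, 1.2, 9)                         # assumed strike grid

def bs_call(sigma, K):
    # Black-Scholes call price; stands in for an expensive pricing model
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d1 - sigma * np.sqrt(T))

# step 1: offline training of the parameter-to-price-surface map
rng = np.random.default_rng(10)
sigmas = rng.uniform(0.05, 0.6, 5000)
prices = np.array([bs_call(s, strikes) for s in sigmas])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, tol=1e-7, random_state=0)
net.fit(sigmas.reshape(-1, 1), prices)

# step 2: fast calibration to "market" quotes using only the cheap surrogate
true_sigma = 0.23
market = bs_call(true_sigma, strikes)
objective = lambda s: np.sum((net.predict([[s]])[0] - market) ** 2)
res = minimize_scalar(objective, bounds=(0.05, 0.6), method="bounded")
print(f"calibrated sigma: {res.x:.3f} (data generated with sigma = {true_sigma})")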