Statistical Modeling 1

Main contact(s)	Frédéric Pennerath
UE	SD9	Credits	3 Coef.
Lectures	16.5 hr	Tutorials	4.5 hr
Labworks	6 hr	Exam	2 hr

Presentation

The “Statistical Modeling” courses ModStat1 and ModStat2 deal with the modeling of systems whose outputs are uncertain enough that they must be modeled by random variables. The course begins with a review of statistics and the introduction of elementary models (e.g. naive Bayes, linear regression), then moves progressively towards more complex models. The courses present the most useful elementary models and methods in this modeling context, but they are not intended to be an exhaustive catalog. The aim is rather to present, within a consistent theory, the concepts and tools common to all these models and methods. It is also to show how these concepts are logically assembled into an operational method, starting from the modeling hypotheses specific to each concrete problem.

From a practical point of view, the aim of this course is not only to give students the means to understand and make good use of existing model implementations, but also to design their own implementations that take into account the specificities of a given problem. The course focuses on linking theory to practice. First, the hypotheses associated with a given class of problems are identified in lectures. Theoretical modeling work then leads to the definition of a model and its estimation algorithms. These results are applied to a case study in tutorial sessions, then implemented (in Python) and evaluated on data in labworks. The ModStat1 course introduces the basic tools of statistical modeling, while the ModStat2 course focuses on hidden variable models.

Learning outcomes

Be able to choose a statistical model/method adapted to the problem under consideration and implement it appropriately.
Be able to understand the theoretical concepts underlying a statistical inference method presented in a scientific article.
Be able to implement a statistical model/method in a language such as Python.
Be able to adapt a model/method to take into account the specificities of the problem being addressed.

Syllabus

Total length: 16.5h of lectures, 4.5h of tutorials, 6h of labworks, and 2h of written exam

Lectures (16.5h)
- Reminders on estimators and Bayesian inference (3h)
- Introduction to Bayesian networks (1.5h)
- Modeling, evaluation, calibration, etc.: the example of naive Bayes (1.5h)
- Causality (1.5h)
- Gaussian models for classification: QDA, LDA, etc. (1.5h)
- Regression models: ridge, LASSO, etc. (1.5h)
- Introduction to information theory (1.5h)
- Exponential family and generalized linear models (3h)
Tutorials (4.5h)
- Bayesian inference (1.5h)
- Problem modeling (1.5h)
- Linear regression and GLM (1.5h)
Labworks (6h)
- Problem modeling (3h)
- Linear regression and GLM (3h)
Written exam (2h)