This course presents algorithmic building blocks that are particularly useful for designing effective machine learning and data processing algorithms, and that data scientists should therefore know. The course first introduces the sampling techniques (rejection sampling, importance sampling, Gibbs sampling, Markov chain Monte Carlo) at work in the inference algorithms for parametric models presented in the course "Statistical Models for Machine Learning". These techniques are implemented during lab sessions, first on simple examples and then in more sophisticated methods (e.g. Latent Dirichlet Allocation). The course then presents the architecture and algorithms of search engines and recommender systems. Several matrix factorization algorithms are developed in this context before being reused for social network analysis. Data mining is then addressed through a few fundamental algorithms. The course concludes with algorithms and data structures suited to the analysis of large data streams.
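To give a flavor of the first topic, here is a minimal sketch of rejection sampling, one of the techniques listed above. The Beta(2, 2) target and uniform proposal are illustrative choices, not part of the course material; the bound `m` must satisfy `target_pdf(x) <= m * proposal_pdf(x)` for all `x`.

```python
import random

def rejection_sample(target_pdf, proposal_draw, proposal_pdf, m, n):
    """Draw n samples from target_pdf by rejection sampling.

    proposal_draw() samples from the proposal distribution, and m bounds
    the ratio target_pdf(x) / proposal_pdf(x) over the support.
    """
    samples = []
    while len(samples) < n:
        x = proposal_draw()
        # Accept x with probability target_pdf(x) / (m * proposal_pdf(x)).
        if random.random() < target_pdf(x) / (m * proposal_pdf(x)):
            samples.append(x)
    return samples

# Illustrative target: Beta(2, 2), density 6x(1-x) on [0, 1].
beta22 = lambda x: 6.0 * x * (1.0 - x)
uniform_pdf = lambda x: 1.0
# The Beta(2, 2) density peaks at 1.5, so m = 1.5 is a valid bound.
draws = rejection_sample(beta22, random.random, uniform_pdf, m=1.5, n=5000)
print(sum(draws) / len(draws))  # should be close to the Beta(2, 2) mean, 0.5
```

The same skeleton works for any target whose density ratio against the proposal can be bounded; the tighter the bound `m`, the higher the acceptance rate.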
- Understand the general context and the principles of the algorithms used in different areas of data processing, such as information retrieval, recommender systems, social network analysis, and data stream processing.
- Implement sampling techniques to build statistical inference algorithms.
- Be able to apply matrix factorization algorithms to perform dimensionality reduction, identify latent factors, etc., in various application areas.
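As an illustration of the last objective, the sketch below factorizes a small ratings matrix into user and item latent factors by stochastic gradient descent over the observed entries, in the spirit of recommender-system matrix factorization. The ratings matrix, rank `k`, and hyperparameters are illustrative assumptions, not values from the course.

```python
import random

def factorize(R, k, steps=2000, lr=0.01, reg=0.02):
    """Factor a partially observed matrix R (None marks missing entries)
    into user factors P (n x k) and item factors Q (m x k) by SGD on the
    regularized squared error over observed entries."""
    random.seed(0)
    n, m = len(R), len(R[0])
    P = [[0.1 * random.random() for _ in range(k)] for _ in range(n)]
    Q = [[0.1 * random.random() for _ in range(k)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if R[i][j] is None:
                    continue
                err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
                for f in range(k):
                    pif, qjf = P[i][f], Q[j][f]
                    # Gradient step on the squared error, with L2 shrinkage.
                    P[i][f] += lr * (err * qjf - reg * pif)
                    Q[j][f] += lr * (err * pif - reg * qjf)
    return P, Q

# Toy user-item ratings; None entries are the ones we would like to predict.
R = [[5, 3, None, 1],
     [4, None, None, 1],
     [1, 1, None, 5],
     [1, None, 5, 4]]
P, Q = factorize(R, k=2)
pred = sum(P[0][f] * Q[0][f] for f in range(2))
print(round(pred, 2))  # should be close to the observed rating R[0][0] = 5
```

Once fitted, the rows of `P` and `Q` are low-dimensional latent representations of users and items, and their dot products fill in the missing entries.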