1. Introduction

1.1. Welcome

Welcome to the Statistical Foundations of Machine Learning course, part of the Master’s program in Computer Science at ULB. This course provides the essential mathematical and algorithmic foundations behind the modern techniques used in machine learning, with a strong focus on statistics, probabilistic modeling, and regression/classification methods. It is the perfect starting point for anyone aiming to understand, build, and evaluate predictive models rigorously.


1.2. Why This Course?

Machine learning is not magic. It is grounded in solid mathematical ideas, especially probability theory, statistics, linear algebra, and optimization. Understanding how and why algorithms work is crucial for designing systems that are not only performant but also interpretable and reliable.

This course will:

  • Build your intuition for probabilistic reasoning and statistical inference.

  • Introduce core regression and classification models, from linear regression to neural networks.

  • Teach you how to implement algorithms, visualize results, and understand performance metrics.

  • Provide practical tools and code exercises using Python and popular libraries such as NumPy, Matplotlib, scikit-learn, and PyTorch.

  • Develop your ability to choose, analyze, and compare models, considering trade-offs like bias/variance, overfitting, and interpretability.

Whether your goal is to become a data scientist, ML engineer, or researcher, these foundations will underpin more advanced courses and projects.


1.3. Course Overview

Here’s what you can expect:

1.3.1. Introduction to Probabilistic Methods and Monte Carlo Simulations

We start with the building blocks of randomness and probability (a short code sketch follows the list):

  • Simulating data with NumPy

  • Visualizing distributions with Matplotlib

  • Understanding Monte Carlo simulations

  • Working with multivariate Gaussians

  • Learning about minimization in statistical modeling

  • Exploring Gaussian Mixture Models
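
As a first taste of these practicals, here is a minimal Monte Carlo sketch, assuming nothing beyond NumPy: it estimates E[X²] for a standard normal X by averaging over random draws. The sample size and the target quantity are illustrative choices, not prescribed by the course.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # reproducible random generator

# Monte Carlo estimate of E[X^2] for X ~ N(0, 1); the exact value is 1.
n_samples = 100_000
x = rng.standard_normal(n_samples)
estimate = np.mean(x**2)

# The standard error shrinks like 1/sqrt(n): the hallmark of Monte Carlo.
std_error = np.std(x**2, ddof=1) / np.sqrt(n_samples)
print(f"estimate = {estimate:.4f} +/- {std_error:.4f}")
```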


1.3.2. Linear Models

From classical regression to more flexible transformations (see the sketch after the list):

  • Ordinary Least Squares (OLS) regression

  • Analyzing residuals and model assumptions

  • Polynomial and Radial Basis Function (RBF) transformations

  • Ridge regression and the role of regularization
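
As an appetizer, here is a minimal sketch of ridge regression on a polynomial feature expansion, using the closed-form solution; the toy data, the degree, and the regularization strength are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy 1-D data: a noisy sine wave.
x = rng.uniform(-3, 3, size=50)
y = np.sin(x) + 0.3 * rng.standard_normal(50)

# Polynomial design matrix with columns 1, x, x^2, ..., x^5.
degree = 5
X = np.vander(x, degree + 1, increasing=True)

# Ridge solution: w = (X^T X + lambda * I)^{-1} X^T y.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

# Setting lam = 0 recovers ordinary least squares (OLS).
print("ridge coefficients:", np.round(w, 3))
```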


1.3.3. Preprocessing, Local Models and Tree-Based Models

Preparing your data and working with interpretable machine learning models (a pipeline sketch follows the list):

  • Data preprocessing: normalization, handling missing data

  • k-Nearest Neighbors (kNN) for regression

  • Decision trees and random forests

  • Using scikit-learn pipelines

  • Understanding the bias-variance tradeoff in practice
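
To give a flavour of how these pieces fit together, here is a sketch of a scikit-learn pipeline chaining standardization with a kNN regressor, evaluated by cross-validation; the synthetic data and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(seed=0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200)

# A pipeline guarantees the scaler is fit on the training folds only,
# avoiding data leakage during cross-validation.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
])

scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean CV R^2: {scores.mean():.3f}")
```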


1.3.4. Neural Networks for Regression

Deep learning in its most intuitive form (a training-loop sketch follows the list):

  • Normalization and standardization for stable learning

  • MLPs (Multilayer Perceptrons) for regression tasks

  • Cost functions and model training

  • Matrix-based backpropagation

  • Introduction to CNNs

  • Dealing with overfitting, cross-validation

  • Model persistence: saving and loading in PyTorch
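
Here is a minimal sketch, assuming a toy 1-D regression problem, of an MLP trained with PyTorch and then saved and reloaded; the architecture, learning rate, and number of epochs are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y = sin(x) + noise.
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

# A small MLP; the layer sizes are illustrative, not prescribed.
model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()   # gradients computed via backpropagation
    optimizer.step()

# Model persistence: save and reload the learned parameters.
torch.save(model.state_dict(), "mlp.pt")
model.load_state_dict(torch.load("mlp.pt"))
```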


1.3.5. Ensemble Models and Feature Selection

Combining models and selecting the most useful inputs (see the sketch below):

  • Error decomposition and performance metrics

  • Filter, wrapper, and embedded feature selection

  • Building ensembles of models for improved accuracy
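
As a preview, the sketch below contrasts a filter method (univariate F-scores) with an embedded one (random-forest feature importances) on synthetic data; the dataset and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 10 features, only 3 of which are informative.
X, y = make_regression(n_samples=300, n_features=10,
                       n_informative=3, noise=5.0, random_state=0)

# Filter method: rank features by a univariate F-statistic.
selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
print("filter picks features:", np.flatnonzero(selector.get_support()))

# Embedded method: a random forest scores features while fitting the ensemble.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("top forest importances:", np.argsort(forest.feature_importances_)[-3:])
```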


1.3.6. Classification

We shift the focus from regression to classification (a metrics sketch follows the list):

  • Evaluation metrics for classification (accuracy, precision, recall, F1)

  • Bayes classifier and probabilistic foundations

  • Logistic regression and ROC curves

  • Working with unequal costs

  • Alternative classification models
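
A minimal sketch, on synthetic data, of fitting a logistic regression and computing the metrics listed above; the dataset and the train/test split are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
y_pred = clf.predict(X_te)              # hard labels (0/1)
y_prob = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

print(f"accuracy : {accuracy_score(y_te, y_pred):.3f}")
print(f"precision: {precision_score(y_te, y_pred):.3f}")
print(f"recall   : {recall_score(y_te, y_pred):.3f}")
print(f"F1       : {f1_score(y_te, y_pred):.3f}")
print(f"ROC AUC  : {roc_auc_score(y_te, y_prob):.3f}")  # threshold-free metric
```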


1.4. Practical Exercises

This handbook is the practical companion to the course; the theoretical handbook remains the main reference. The focus here is on implementations, algorithms, and the existing libraries for the topics covered in the course, so students should consult the theoretical handbook before delving into the coding. Wherever possible, we include:

  • Hands-on coding exercises

  • Mini-projects and simulations

  • Graphical visualizations of models and results

  • Opportunities to experiment with real-world datasets

Doing the exercises is not optional if you want to truly master the material. They are designed to:

  • Reinforce theoretical concepts

  • Develop your implementation skills

  • Sharpen your critical thinking about model performance and assumptions


1.5. Final Notes

This is a theory-meets-practice course. You’ll build up solid statistical thinking and programming fluency, while also learning to critically assess and apply machine learning models. Every part of the course prepares you for future work in AI, Data Science, or scientific computing.

Take it seriously. Be curious. Challenge your understanding. And above all: have fun exploring the possibilities of statistical thinking in machine learning.

Let’s begin!