MA6225: Information-Theoretic Methods in Statistical Learning

Course code: MA6225

Course title: Information-Theoretic Methods in Statistical Learning

Intended Audience: NUS graduate students (though advanced undergraduates are very welcome)

Instructor: Vincent Y. F. Tan (vtan@nus.edu.sg)

Assessment: Class Participation (25%), Quiz 1 (25%), Quiz 2 (25%), Project (25%)

References

There is no single required textbook. The course will draw from the following references:

  • Y. Polyanskiy and Y. Wu, Information Theory: From Coding to Learning, Cambridge University Press, 2025.

  • A. El Gamal and M. Raginsky, Information-theoretic Limits of Learning and Estimation, arXiv:2605.06710, 2026.

  • Selected research papers from the information theory, statistics, and machine learning literature.

Course Description

This graduate-level, proof-oriented course builds a principled, information-theoretic foundation for modern statistical learning. Starting with entropy, relative entropy, and mutual information, the course develops advanced tools—f-divergences, metric entropy, and strong data-processing inequalities—and highlights structural properties such as convexity, tensorization, and data processing that enable dimension-agnostic reasoning.

Applications connect these ideas to statistical decision theory, large-sample asymptotics, mutual-information-based generalization bounds, and fundamental lower bounds via hypothesis-testing reductions, including Le Cam's, Fano's, and Assouad's methods. The course also treats entropic techniques for estimation and the role of strong data-processing inequalities in dependence, high-dimensional inference, and graph problems such as broadcasting and coloring.
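
For orientation, two of the hypothesis-testing reductions named above are commonly stated in the following forms. This is a sketch of the textbook statements only; the notation (P_0, P_1, \psi, V, M, X, \hat{V}) is an assumption chosen for illustration and may differ from what is used in lectures.

```latex
% Le Cam's two-point bound: for any test \psi taking values in \{0,1\},
\[
  P_0(\psi \neq 0) + P_1(\psi \neq 1) \;\ge\; 1 - \mathrm{TV}(P_0, P_1),
  \qquad
  \mathrm{TV}(P_0, P_1) = \sup_{A} \lvert P_0(A) - P_1(A) \rvert .
\]

% Fano's inequality (uniform-prior form): if V is uniform on \{1,\dots,M\}, M \ge 2,
% and V \to X \to \hat{V} forms a Markov chain, then
\[
  \Pr\bigl(\hat{V} \neq V\bigr) \;\ge\; 1 - \frac{I(V;X) + \log 2}{\log M} .
\]
```

In both cases a lower bound on testing error is obtained from an upper bound on an information measure (total variation or mutual information), which is what makes the tools in the first part of the course useful for minimax theory.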

The course is intended for graduate students and advanced undergraduates in mathematics, statistics, engineering, computer science, and analytics. A solid background in probability, including measure-theoretic intuition, is required. Prior exposure to convex optimization, information theory, and statistical learning is strongly recommended.

Learning Outcomes

By the end of the course, students should be able to:

  • Explain and manipulate core information measures, including entropy, KL divergence, and mutual information, and derive key identities.

  • Apply f-divergences, metric entropy, and tensorization principles to quantify statistical complexity.

  • Use strong data-processing inequalities to reason about dependence and information propagation in high-dimensional and graph-structured problems.

  • Derive and interpret minimax lower bounds using hypothesis-testing reductions such as Le Cam's method, Fano's inequality, and Assouad's lemma.

  • Employ information-theoretic techniques to obtain generalization bounds and analyze estimation procedures.

  • Formulate and prove rigorous results connecting information measures to asymptotics and statistical decision problems.

  • Read and critique contemporary research at the interface of information theory and statistical learning.
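
As a small illustration of the first outcome, the following is a minimal sketch of the core information measures for finite alphabets, together with a numerical check of two standard identities. The function names and the binary symmetric channel example are hypothetical and chosen here for illustration; they are not taken from the course materials. All quantities are in nats.

```python
import math


def entropy(p):
    """Shannon entropy (nats) of a discrete distribution given as a list of probabilities."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; infinite if p puts mass where q does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def mutual_information(joint):
    """I(X;Y) = D(P_XY || P_X x P_Y) for a joint pmf given as a list of rows."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    flat_joint = [pxy for row in joint for pxy in row]
    # Product of marginals, flattened in the same row-major order as the joint.
    flat_prod = [a * b for a in px for b in py]
    return kl_divergence(flat_joint, flat_prod)


# Hypothetical example: X ~ Bernoulli(1/2) observed through a binary
# symmetric channel with crossover probability delta.
delta = 0.1
joint = [
    [0.5 * (1 - delta), 0.5 * delta],
    [0.5 * delta, 0.5 * (1 - delta)],
]

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
h_joint = entropy([v for row in joint for v in row])
mi = mutual_information(joint)

# Identity 1: I(X;Y) = H(X) + H(Y) - H(X,Y).
identity_gap = abs(mi - (entropy(px) + entropy(py) - h_joint))

# Identity 2 (uniform input to a BSC): I(X;Y) = log 2 - h(delta),
# where h is the binary entropy function.
bsc_gap = abs(mi - (math.log(2) - entropy([delta, 1 - delta])))

print(f"I(X;Y) = {mi:.6f} nats, identity gaps: {identity_gap:.2e}, {bsc_gap:.2e}")
```

Both gaps should be zero up to floating-point error. Computing mutual information as a relative entropy between the joint distribution and the product of its marginals, rather than from entropies directly, mirrors the definition that extends beyond finite alphabets.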

Topics

  • Entropy, relative entropy, and mutual information

  • f-divergences and data-processing inequalities

  • Convexity, tensorization, and dimension-agnostic information bounds

  • Metric entropy and statistical complexity

  • Statistical decision theory and large-sample asymptotics

  • Mutual-information-based generalization bounds

  • Le Cam, Fano, and Assouad lower-bound techniques

  • Entropic techniques for estimation

  • Strong data-processing inequalities in dependence, high-dimensional inference, and graph problems

Prerequisites

  • Solid probability background, with measure-theoretic intuition

  • Prior exposure to convex optimization, information theory, and statistical learning is strongly recommended