MA6225: Information-Theoretic Methods in Statistical Learning

Course code: MA6225

Course title: Information-Theoretic Methods in Statistical Learning

Intended Audience: NUS graduate students (though advanced undergraduates are very welcome)

Instructor: Vincent Y. F. Tan (vtan@nus.edu.sg)

Assessment: Class Participation (25%), Quiz 1 (25%), Quiz 2 (25%), Project (25%)

References

There is no single required textbook. The course will draw from the following references:

Course Description

This graduate-level, proof-oriented course builds a principled, information-theoretic foundation for modern statistical learning. Starting with entropy, relative entropy, and mutual information, the course develops advanced tools (f-divergences, metric entropy, and strong data-processing inequalities) and highlights structural properties such as convexity, tensorization, and data processing that enable dimension-agnostic reasoning.

Applications connect these ideas to statistical decision theory, large-sample asymptotics, mutual-information-based generalization bounds, and fundamental lower bounds via hypothesis-testing reductions, including Le Cam's, Fano's, and Assouad's methods. The course also treats entropic techniques for estimation and the role of strong data-processing inequalities in quantifying dependence, in high-dimensional inference, and in graph problems such as broadcasting and coloring.

The course is intended for graduate students and advanced undergraduates in mathematics, statistics, engineering, computer science, and analytics. A solid background in probability, including measure-theoretic intuition, is required. Prior exposure to convex optimization, information theory, and statistical learning is strongly recommended.
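For orientation, the three basic measures named in the description can be written as below. This is only a sketch for the discrete case, and the notation (P, Q, the alphabet, the channel) is an assumption made here for illustration, not the notation of the course materials.

```latex
% Assumed notation: P, Q are pmfs on a finite alphabet \mathcal{X};
% P_{XY} is a joint pmf with marginals P_X and P_Y.
\begin{align*}
H(P) &= -\sum_{x \in \mathcal{X}} P(x) \log P(x), \\
D(P \,\|\, Q) &= \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}, \\
I(X;Y) &= D\bigl(P_{XY} \,\|\, P_X \otimes P_Y\bigr)
        = H(P_X) + H(P_Y) - H(P_{XY}).
\end{align*}
% Data processing: passing P_X and Q_X through the same channel P_{Y|X}
% (giving outputs P_Y and Q_Y) cannot increase relative entropy.
\[
D(P_Y \,\|\, Q_Y) \le D(P_X \,\|\, Q_X).
\]
```

These expressions use the convention 0 log 0 = 0, and the relative entropy is finite only when Q(x) = 0 implies P(x) = 0.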

Learning Outcomes

By the end of the course, students should be able to:

Explain and manipulate core information measures, including entropy, KL divergence, and mutual information, and derive key identities.

Apply f-divergences, metric entropy, and tensorization principles to quantify statistical complexity.

Use strong data-processing inequalities to reason about dependence and information propagation in high-dimensional and graph-structured problems.

Derive and interpret minimax lower bounds using hypothesis-testing reductions such as Le Cam's method, Fano's inequality, and Assouad's lemma.

Employ information-theoretic techniques to obtain generalization bounds and analyze estimation procedures.

Formulate and prove rigorous results connecting information measures to asymptotics and statistical decision problems.

Read and critique contemporary research at the interface of information theory and statistical learning.

Topics

Entropy, relative entropy, and mutual information

f-divergences and data-processing inequalities

Convexity, tensorization, and dimension-agnostic information bounds

Metric entropy and statistical complexity

Statistical decision theory and large-sample asymptotics

Mutual-information-based generalization bounds

Le Cam, Fano, and Assouad lower-bound techniques

Entropic techniques for estimation

Strong data-processing inequalities in dependence, high-dimensional inference, and graph problems
