A Common Misconception
On a first encounter, one is under the impression that the more features a given dataset has, the better off we are analysis-wise. This is in fact wrong: the higher the dimensionality of the data, the worse off we are. High dimensionality brings with it the failure of many statistical "routines" that work perfectly well for low-dimensional data and form the backbone of statistical learning (for instance, the k-nearest-neighbour technique).
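To see the k-nearest-neighbour failure concretely, here is a minimal sketch (my own illustration, with arbitrary sample sizes) of the distance-concentration phenomenon: as the dimension grows, the nearest and farthest neighbours of a query point become nearly indistinguishable, so "nearest" loses its meaning.

```python
import numpy as np

# Distance concentration: for points drawn uniformly from the unit cube,
# the relative gap between the nearest and farthest neighbour of a query
# point shrinks as the dimension d grows.
rng = np.random.default_rng(0)
n = 500
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(n, d))
    q = rng.uniform(size=d)                  # a query point
    dists = np.linalg.norm(X - q, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  (max - min) / min = {contrast:.3f}")
```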
In this part of the course, we introduce the curse of dimensionality: a collection of "strange" phenomena that occur in high dimensions. Most compelling is the fact that, unlike their 3D counterparts, high-dimensional "oranges" (i.e., balls) are empty inside: nearly all of their volume concentrates in a thin shell near the surface.
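A quick back-of-the-envelope check of the "empty orange" claim (my own illustration): the fraction of a d-dimensional unit ball's volume lying within distance eps of the surface is exactly 1 - (1 - eps)^d, which tends to 1 as d grows.

```python
# The d-dimensional ball of radius r has volume proportional to r**d, so the
# fraction of the unit ball's volume within eps of the surface is
# 1 - (1 - eps)**d. Already at modest eps this tends to 1 rapidly in d.
eps = 0.01
for d in (3, 100, 1000, 10000):
    shell = 1 - (1 - eps) ** d
    print(f"d={d:6d}  volume within {eps} of the surface: {shell:.4f}")
```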
Luckily for us, most datasets we encounter concentrate around low-dimensional structures. Identifying the low-dimensional structures dominating the data is, of course, a challenge.
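One standard first handle on such hidden structure is spectral: here is a hedged sketch on synthetic data (PCA via the SVD is my choice of tool here, not necessarily the course's) in which a sharp drop in the singular values reveals the data's true dimension.

```python
import numpy as np

# Data generated near a 3-dimensional subspace of R^100, plus small noise.
# The singular values of the centred data matrix fall off a cliff after the
# third, exposing the low-dimensional structure dominating the data.
rng = np.random.default_rng(1)
n, d, k = 1000, 100, 3
basis = rng.normal(size=(k, d))              # spans a random 3-dim subspace
X = rng.normal(size=(n, k)) @ basis + 0.05 * rng.normal(size=(n, d))
sv = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(np.round(sv[:6], 1))                   # three large values, then a cliff
```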
Concentration
BATTLING THE CURSE OF DIMENSIONALITY
A key tool in our "fight" against the curse of dimensionality is an array of inequalities quantifying the rate at which various distributions, ubiquitous throughout probability theory and computer & data science, concentrate around their means. Such inequalities are part of the theory of large deviations.
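Schematically, the inequalities we are after take the following form (a template rather than any single theorem; the constant c is an assumption standing in for distribution-dependent quantities):

```latex
% Template of a concentration inequality: the probability that X deviates
% from its mean by more than t decays exponentially in t; the constant
% c > 0 depends on the distribution of X.
\[
  \mathbb{P}\bigl( \lvert X - \mathbb{E}X \rvert \ge t \bigr)
  \;\le\; 2\, e^{-c\,t^{2}}, \qquad t \ge 0.
\]
```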
Our account starts with the question of whether the CLT can provide any assistance in this regard; after all, the tails of the Gaussian distribution are essentially the best one can hope for, and the message of the CLT is that many distributions are approximately Gaussian. A result of Berry and Esseen will show that this hope is too naive: the CLT guarantees convergence to the Gaussian only at a polynomial rate, and thus falls short of providing exponential rates of convergence.
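For concreteness, here is one standard formulation of the Berry-Esseen theorem in the i.i.d. case; note the rate of order n^{-1/2}, polynomial rather than exponential.

```latex
% Berry-Esseen (i.i.d. case): let X_1, ..., X_n be i.i.d. with E[X_1] = 0,
% E[X_1^2] = sigma^2 > 0 and rho = E[|X_1|^3] < infinity, and let F_n be the
% CDF of (X_1 + ... + X_n) / (sigma * sqrt(n)). Then, for an absolute
% constant C and the standard Gaussian CDF Phi,
\[
  \sup_{x \in \mathbb{R}} \bigl\lvert F_n(x) - \Phi(x) \bigr\rvert
  \;\le\; \frac{C\,\rho}{\sigma^{3}\sqrt{n}}.
\]
```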
We will then be led to the works of Chernoff, Hoeffding, and Bernstein, which provide a formidable array of concentration inequalities with exponential rates of convergence (the precise rate depending on the application). Through these explorations we shall encounter the sub-gaussian distributions as well as the sub-exponential ones.
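As a taste of what is to come, here is a minimal numerical sanity check (my own sketch, not course code, with arbitrary parameters) of Hoeffding's inequality for averages of Rademacher (plus-minus one) variables.

```python
import numpy as np

# Hoeffding's inequality for the mean of n i.i.d. Rademacher variables:
#   P(|mean| >= t) <= 2 * exp(-n * t**2 / 2),
# an exponential tail bound. We compare it against the empirical tail.
rng = np.random.default_rng(2)
n, trials, t = 100, 100_000, 0.3
means = rng.choice([-1.0, 1.0], size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means) >= t)
bound = 2 * np.exp(-n * t**2 / 2)
print(f"empirical tail: {empirical:.5f}   Hoeffding bound: {bound:.5f}")
```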
VIDEO LECTURE
Random Matrices
Following our encounter with the Johnson-Lindenstrauss low-distortion embedding result, we proceed to a deeper investigation of the properties of random matrices. A focal point of the latter is the spectrum of such matrices.
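As a reminder of the flavour of the Johnson-Lindenstrauss result, here is a hedged sketch of a Gaussian random projection (the constant 8 in the target dimension is a typical textbook choice, an assumption on my part): n points in a very high-dimensional space are mapped into roughly log(n)/eps^2 dimensions while all pairwise distances are preserved up to a factor of 1 plus-or-minus eps, with high probability.

```python
import numpy as np

# Johnson-Lindenstrauss-style embedding: a Gaussian matrix scaled by
# 1/sqrt(k) approximately preserves all pairwise distances among n points.
rng = np.random.default_rng(3)
n, d, eps = 100, 10_000, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))     # target dim, O(log n / eps^2)
X = rng.normal(size=(n, d))                  # n points in R^d
P = rng.normal(size=(d, k)) / np.sqrt(k)     # random projection matrix
Y = X @ P                                    # embedded points in R^k
i, j = np.triu_indices(n, 1)
orig = np.linalg.norm(X[i] - X[j], axis=1)
proj = np.linalg.norm(Y[i] - Y[j], axis=1)
print(f"k = {k},  worst distortion = {np.max(np.abs(proj / orig - 1)):.3f}")
```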
VIDEO LECTURE