Bare bones of Matrix Completion – I (7 mins read)

These last three weeks I was in Utah for a summer math school (or conference, whichever you prefer) organized by the Institute for Advanced Study (IAS), Princeton, New Jersey. Basically we had three weeks of lectures, problem sessions, and endless fun discussions day in and day out, with free food and free stay. The resort was amazing; we stayed at the Zermatt resort (if you’re using headphones you might want to turn the volume down a bit). Anyway, the point of this post is the lectures on random matrices and their applications given by Roman Vershynin that I sat through (basically this is a scribe). Specifically, it is about the famous problem called matrix completion. Later (hopefully soon) I’ll write about another topic that definitely got my attention, called Differential Privacy.

The problem is very simple to state: say we have a matrix M of size m\times n, which simply means we have a 2-D data array with m rows and n columns. The rows and columns are indexed by, say, users and movies respectively. The famous Netflix prize problem had a matrix like this, and of course not every user has seen every movie. So the question is: can we fill in the missing entries of this otherwise incomplete matrix? Indeed, they had a test dataset (or simply a test set) on which the submitted algorithms were evaluated; whoever gets the best accuracy wins the challenge, and with it lots of $$$. This problem is obviously not well posed (why?), but as with most machine learning problems, mathematical ill-posedness doesn’t really matter as long as we do well on the test set (this is not 100% true). In fact, the name of the game is to find mathematical theory/algorithms/models that can explain the problem/solution in practice. There are actually many cool examples of this; we will probably see some of them in later posts. Anyway, to analyze the matrix completion problem we will need some tools from, of course, random matrix theory (I was actually surprised that these tools were developed very recently, within the last 10-15 years at most; if I’m wrong, please comment). So in this post we will look at some basic concentration inequalities.
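
To make the setup concrete, here is a minimal sketch in Python/NumPy (my own illustration; the sizes, the rank and the sampling rate below are made-up values, not anything from the Netflix data) of the kind of object we are after: a low-rank matrix of which only a random subset of entries is observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: m users, n movies, and a small "true" rank r.
m, n, r = 1000, 500, 5

# A synthetic low-rank ratings matrix M = U V^T (low rank is the usual modelling assumption).
U = rng.normal(size=(m, r))
V = rng.normal(size=(n, r))
M = U @ V.T

# Each entry is observed independently with probability p; the rest are missing.
p = 0.1
mask = rng.random((m, n)) < p
M_observed = np.where(mask, M, np.nan)  # NaN marks the unobserved entries

print(f"fraction of entries observed: {mask.mean():.3f}")
# Matrix completion: recover M given only M_observed (equivalently, M restricted to the mask).
```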

Definition of subgaussian random variable: Let x be a random variable. Then, the following are equivalent (TFAE):
1. Tails: Pr(|x|\geq t)\leq 2\exp(-t^2/K_1^2)
2. Moments: |x|_p=(\mathbb{E}|x|^p)^{1/p}\leq K_2\sqrt{p}~\forall~p\geq 1
3. MGF of x^2: \mathbb{E} \exp(x^2/K_3^2)\leq 2
4. MGF of x: \mathbb{E}\exp(\lambda x)\leq \exp(\lambda^2K_4^2)~\forall~\lambda\in\mathbb{R}

We will refer to these numbered statements throughout this post. Pr denotes probability and \mathbb{E} denotes expectation.
In these statements, the K_i's may differ from one another by constant multiplicative factors, but this really doesn’t matter since we are interested in the asymptotics. Examples include Gaussian and Bernoulli random variables; non-examples include exponential, Poisson, etc. Finally, we will denote \|x\|_{\psi_2} = the smallest value of K_3 in 3 (MGF of x^2).
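
As a quick sanity check of property 4 (a routine calculation of mine, not something from the lectures): if x\sim N(0,\sigma^2), then \mathbb{E}\exp(\lambda x)=\exp(\lambda^2\sigma^2/2), so 4 holds with K_4=\sigma/\sqrt{2}. For a symmetric Bernoulli (Rademacher) variable, \mathbb{E}\exp(\lambda x)=\cosh(\lambda)\leq\exp(\lambda^2/2). On the other hand, for a (rate 1) exponential random variable \mathbb{E}\exp(\lambda x)=1/(1-\lambda) for \lambda<1 and is infinite for \lambda\geq 1, so no bound of the form \exp(\lambda^2K^2) can hold for all \lambda\in\mathbb{R}; its tails decay only like \exp(-t), which is why it appears as a non-example.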

Proposition 1: If x_i's are independent subgaussian random variables with mean 0, then \sum_i x_i is also a subgaussian random variable.

Proof: We will show 4 for \sum_ix_i.
\mathbb{E}\exp(\lambda\sum_ix_i)=\prod_i\mathbb{E}\exp(\lambda x_i)\leq \prod_i\exp(c\lambda^2\|x_i\|^2_{\psi_2})=\exp(c\lambda^2\sum_i\|x_i\|^2_{\psi_2})

The first equality holds because the x_i are independent; the inequality follows from 4 applied to each x_i (together with the equivalence of 3 and 4, which lets us write K_4 in terms of \|x_i\|_{\psi_2} up to an absolute constant c). Taking K_4^2=c\sum_i\|x_i\|_{\psi_2}^2 shows that 4 holds for \sum_ix_i, and we are done. \square

Hoeffding’s Inequality: Let x_i's be independent subgaussian random variables with mean 0. Then \forall~t>0,~Pr\{|\sum_ix_i|\geq t\}\leq 2\exp\left( -\frac{ct^2}{\sum_i\|x_i\|^2_{\psi_2}}\right)

Proof: This follows from proposition 1 and the equivalence of the definitions. \square
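
For completeness, here is the standard Chernoff-type step hiding behind "the equivalence of the definitions" (my own filling-in; c denotes an absolute constant that may change from line to line). Write \sigma^2=\sum_i\|x_i\|^2_{\psi_2}. For any \lambda>0, Markov's inequality and the bound from proposition 1 give Pr\{\sum_ix_i\geq t\}\leq e^{-\lambda t}\,\mathbb{E}\exp(\lambda\sum_ix_i)\leq \exp(-\lambda t+c\lambda^2\sigma^2). Choosing \lambda=t/(2c\sigma^2) gives \exp(-t^2/(4c\sigma^2)); applying the same argument to -\sum_ix_i and adding the two bounds produces the factor 2 and Hoeffding's inequality.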

OK, now we need to know a bit about subexponential random variables.

Definition of subexponential random variable: Let x be a random variable. Then, the following are equivalent (TFAE):
1′. Tails: Pr(|x|\geq t)\leq 2\exp(-t/K_1)
2′. Moments: |x|_p=(\mathbb{E}|x|^p)^{1/p}\leq K_2 p~\forall~p\geq 1
3′. MGF of x: if \mathbb{E}x=0, then \mathbb{E}\exp(\lambda x)\leq \exp(\lambda^2K_3^2)~\forall~|\lambda|\leq 1/K_3

In this case we denote \|x\|_{\psi_1} = the smallest such K (say, the smallest K_1 in 1′; as before, the different K_i agree up to constant factors).
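
A standard example worth keeping in mind (my addition, not spelled out above): if x is subgaussian, then x^2 is subexponential, with \|x^2\|_{\psi_1}\leq C\|x\|^2_{\psi_2} for an absolute constant C. Indeed, Pr(x^2\geq t)=Pr(|x|\geq\sqrt{t}), and the subgaussian tail bound 1 evaluated at \sqrt{t} is exactly a subexponential tail bound 1′ at t. This is how subexponential variables typically arise in practice.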

Bernstein’s Inequality: Let x_i's be independent subexponential random variables with mean 0. Then \forall~t>0, Pr\{|\sum_ix_i|\geq t\}\leq 2\exp\left\{-c\min \left(\frac{t^2}{\sum_i\|x_i\|^2_{\psi_1}},\frac{t}{\max_i\|x_i\|_{\psi_1}} \right)\right\}

Proof (sketch): This requires applying Markov's inequality to \exp(\lambda\sum_ix_i) and then using 3′. \square
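
To see where the minimum of the two terms comes from (my own filling-in of the sketch, up to absolute constants): let K=\max_i\|x_i\|_{\psi_1} and \sigma^2=\sum_i\|x_i\|^2_{\psi_1}. By independence and 3′, for |\lambda|\leq 1/K we have Pr\{\sum_ix_i\geq t\}\leq\exp(-\lambda t+\lambda^2\sigma^2). If the unconstrained minimizer \lambda=t/(2\sigma^2) satisfies the constraint, we get the Gaussian-type bound \exp(-t^2/(4\sigma^2)); otherwise (i.e., when t>2\sigma^2/K) we take \lambda=1/K and get \exp(-t/K+\sigma^2/K^2)\leq\exp(-t/(2K)). Combining the two cases gives the \min in the statement.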

Corollary: Pr\{\frac{1}{N}|\sum_{i=1}^Nx_i|>t\}\leq 2\exp\left\{-cN\min\left( \frac{t^2}{K^2},\frac{t}{K}\right)\right\} where K=\max_i\|x_i \|_{\psi_1}. (This follows by applying Bernstein's inequality with Nt in place of t and bounding \sum_i\|x_i\|^2_{\psi_1}\leq NK^2.)

Synopsis: We have Hoeffding’s inequality for subgaussians and Bernstein’s inequality for subexponentials. So what? Well, if you look at the expression inside the exp, you’ll notice that subgaussians are much better behaved than subexponentials: their sums concentrate faster, since the exponent decays quadratically in t for subgaussians versus only linearly in t for subexponentials (once t is large). In the next post we will see how this can be extended to the case where the x_i are matrices.
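
A quick numerical illustration of this difference (my own sketch, not from the lectures; the sample sizes and thresholds below are arbitrary): compare the empirical tails of a normalized sum of Rademacher variables (subgaussian) with those of a normalized sum of centered exponential variables (subexponential but not subgaussian).

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 50, 100_000

# Subgaussian summands: Rademacher, i.e. +/-1 with equal probability (variance 1).
rademacher = rng.choice([-1.0, 1.0], size=(trials, N))
# Subexponential summands: centered exponential, mean 0 and variance 1, heavier right tail.
cent_exp = rng.exponential(size=(trials, N)) - 1.0

# Normalized sums so that both are on a comparable scale.
sums_sg = rademacher.sum(axis=1) / np.sqrt(N)
sums_se = cent_exp.sum(axis=1) / np.sqrt(N)

for t in [1.0, 2.0, 3.0]:
    p_sg = np.mean(np.abs(sums_sg) >= t)
    p_se = np.mean(np.abs(sums_se) >= t)
    print(f"t={t}: empirical Pr  Rademacher={p_sg:.1e}  centered exponential={p_se:.1e}")
# The subexponential sum shows the heavier tail; try larger t (and more trials) to see the gap widen.
```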
