Bare bones of Matrix Completion – II (10 mins read)

Continuing from the last post, we resume our discussion of matrix completion. As the title suggests, this post covers some basic facts about the problem, so I’ll provide some interesting references at the end.

Matrix Bernstein Inequality (MBI): Let X_i, i=1,\ldots,N be independent, mean zero, n\times n symmetric random matrices such that \|X_i\|\leq K almost surely. Then we have that \forall~t\geq 0, Pr\{\|\sum_iX_i\|\geq t\}\leq 2n\exp\left( -\frac{t^2/2}{\sigma^2+Kt/3}\right)\lesssim 2n\exp\left( -c\min(t^2/\sigma^2,t/K)\right) where \sigma^2=\|\sum_{i=1}^N\mathbb{E}X_i^2\|. The constants are hidden in the \lesssim notation, which will be used often; it is analogous to the big-Oh \mathcal{O} notation.

Remark 1: This inequality is sharp for n=1 as we saw earlier.
Remark 2: \mathbb{E}\|\sum X_i\|\lesssim \sigma\sqrt{\log n}+K\log n

Here \|\cdot\| denotes the operator norm of the matrix, that is, \|X\|=\max_{y:\|y\|_2=1}\|Xy\|_2; for a symmetric matrix this equals its largest eigenvalue in absolute value, and the maximizing y is the corresponding eigenvector.
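
As a quick sanity check of the statement and of Remark 2 (this is my own sketch, not part of the argument; the X_i are taken to be random sign flips \epsilon_iA_i of fixed symmetric matrices A_i, so they are independent, mean zero and bounded, and all the names in the code are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 50, 200

# Fixed symmetric matrices A_i; X_i = eps_i * A_i with random signs eps_i = +/-1,
# so each X_i is symmetric, mean zero, and ||X_i|| <= K almost surely.
A = [(M + M.T) / 2 for M in rng.normal(size=(N, n, n)) / np.sqrt(n)]

def op_norm(M):
    # Operator norm of a symmetric matrix = largest eigenvalue in absolute value.
    return np.abs(np.linalg.eigvalsh(M)).max()

K = max(op_norm(Ai) for Ai in A)
# sigma^2 = ||sum_i E X_i^2||; here E X_i^2 = A_i^2 because eps_i^2 = 1.
sigma = np.sqrt(op_norm(sum(Ai @ Ai for Ai in A)))

norms = []
for _ in range(100):
    eps = rng.choice([-1.0, 1.0], size=N)
    norms.append(op_norm(sum(e * Ai for e, Ai in zip(eps, A))))

print("empirical E||sum_i X_i||    :", np.mean(norms))
print("sigma*sqrt(log n) + K*log n :", sigma * np.sqrt(np.log(n)) + K * np.log(n))
```

In such runs one would expect the empirical norm to sit below the Remark 2 bound, up to the constant hidden in \lesssim.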

Proof: Note that since X is a symmetric matrix, it can be written as X=\sum_j \lambda_ju_ju_j^T, where \lambda_j,u_j are the eigenvalues and eigenvectors respectively (the spectral decomposition), and we assume that \lambda_1\geq \lambda_2\geq\cdots\geq \lambda_n. Now if we have a real valued function of a real variable, f:\mathbb{R}\to\mathbb{R}, then f(X) for a symmetric matrix X is defined as f(X):=\sum_jf(\lambda_j)u_ju_j^T; equivalently, when f has a Taylor series, we can plug the matrix X into that series. We will need two important matrix inequalities which we state without proof (a numerical spot-check of both, along with this definition of f(X), follows the list):

  1. Golden-Thompson inequality: tr\left(e^{A+B}\right)\leq tr\left(e^Ae^B\right) for symmetric A,B. When A,B are real numbers we get equality (this is because multiplication of reals is commutative whereas matrix multiplication isn’t).
  2. Trace inequality: Roman mentioned that this is Lieb’s inequality, but I think it’s a much simpler form than that, so I’m calling it just the trace inequality. It states: let H be a fixed n\times n symmetric matrix and consider the function f(X)=tr\exp(H+\log X), where tr denotes the trace, i.e. the sum of the diagonal entries. Then f is concave on the set of n\times n symmetric PSD matrices.
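
Here is the promised spot-check, a minimal sketch of my own in numpy/scipy (helper names such as apply_fn and rand_sym are mine); it illustrates the spectral definition of f(X) and tests the two inequalities on random symmetric matrices:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
n = 5

def rand_sym(n):
    M = rng.normal(size=(n, n))
    return (M + M.T) / 2

def rand_psd(n):
    M = rng.normal(size=(n, n))
    return M @ M.T + 0.1 * np.eye(n)  # strictly positive definite, so log X is defined

def apply_fn(f, X):
    # f(X) := sum_j f(lambda_j) u_j u_j^T, the spectral definition from the text.
    lam, U = np.linalg.eigh(X)
    return (U * f(lam)) @ U.T

X = rand_sym(n)
# The spectral definition of exp agrees with the power-series matrix exponential,
# and exp commutes with lambda_max: lambda_max(e^X) = e^{lambda_max(X)}.
print(np.allclose(apply_fn(np.exp, X), expm(X)))
print(np.isclose(np.linalg.eigvalsh(expm(X)).max(),
                 np.exp(np.linalg.eigvalsh(X).max())))

# 1. Golden-Thompson: tr e^{A+B} <= tr(e^A e^B).
A, B = rand_sym(n), rand_sym(n)
print(np.trace(expm(A + B)) <= np.trace(expm(A) @ expm(B)) + 1e-10)

# 2. Trace (Lieb) inequality: f(P) = tr exp(H + log P) is concave on PSD matrices,
#    checked at the midpoint of two random positive definite P's.
H = rand_sym(n)
f = lambda P: np.trace(expm(H + logm(P))).real
P1, P2 = rand_psd(n), rand_psd(n)
print(f((P1 + P2) / 2) >= (f(P1) + f(P2)) / 2 - 1e-10)
```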

Combining the two properties above with Jensen’s inequality (for concave f, \mathbb{E}f(X)\leq f(\mathbb{E}X)), we have that \mathbb{E}tr\exp\left( H+\log X\right)\leq tr\exp\left( H+\log\mathbb{E}X\right). In particular, substituting X=e^Z (i.e. Z:=\log X), we get,

\mathbb{E} tr\exp(H+Z)\leq tr\exp\left(H+\log \mathbb{E}e^Z \right), which we will refer to as Lieb’s inequality from now on. Basically this inequality allows us to take the expectation inside the exponential, pretty neat right?
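
A quick sanity check of this inequality (again my own sketch; Z is given a toy two-point distribution, Z=\pm B with probability 1/2 each, so both sides can be computed exactly):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(2)
n = 5

def rand_sym(n):
    M = rng.normal(size=(n, n))
    return (M + M.T) / 2

H, B = rand_sym(n), rand_sym(n)

# Z = +B or -B with probability 1/2 each, so both sides are exact finite averages.
lhs = 0.5 * (np.trace(expm(H + B)) + np.trace(expm(H - B)))   # E tr exp(H + Z)
EeZ = 0.5 * (expm(B) + expm(-B))                              # E e^Z (positive definite)
rhs = np.trace(expm(H + logm(EeZ))).real                      # tr exp(H + log E e^Z)
print(lhs <= rhs + 1e-10)
```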

Now we are ready to prove MBI. The proof essentially consists of two steps: 1. the exponential Markov (Chernoff) bound applied to the largest eigenvalue of the sum of random matrices; 2. Lieb’s inequality.

  1. Let S=\sum_{i=1}^N X_i and fix \mu>0. Then Pr\left(\lambda_{\text{max}}(S)\geq t\right)=Pr\left( e^{\mu\lambda_{\text{max}}(S)}\geq e^{\mu t}\right)\leq e^{-\mu t}\mathbb{E}e^{\mu\lambda_{\text{max}}(S)}. The inequality is Markov’s inequality; \lambda_{\text{max}}(\cdot) denotes the largest eigenvalue. Now, writing S=\sum_j\lambda_ju_ju_j^T, we have the following chain of equalities: e^{\lambda_{\text{max}}(S)}=e^{\lambda_{\text{max}}\left(\sum_j\lambda_ju_ju_j^T\right)}=e^{\lambda_1}=\lambda_{\text{max}}\left(\sum_je^{\lambda_j}u_ju_j^T\right)=\lambda_{\text{max}}\left(e^{\sum_j\lambda_ju_ju_j^T}\right)=\lambda_{\text{max}}\left(e^S\right).

    Writing out the chain this way, you can see that applying e and taking \lambda_{\text{max}} are commuting operations.

    So, denoting E:=\mathbb{E}e^{\mu\lambda_{\text{max}}(S)}, we have E=\mathbb{E}\lambda_{\text{max}}\left( e^{\mu S}\right)\leq \mathbb{E} tr\left( e^{\mu S}\right). The last inequality holds because e^{\mu S} is positive semidefinite (its eigenvalues e^{\mu\lambda_j} are positive), so its largest eigenvalue is at most the sum of all its eigenvalues, i.e. the trace.

  2. Lieb’s: The trick is to apply Lieb’s inequality to one random matrix at a time, conditioning on the others, and iterate (this is where independence of the X_i is used). Conditioning on X_1,\ldots,X_{N-1} and applying Lieb’s inequality with H=\sum_{i=1}^{N-1}\mu X_i to the last matrix, then repeating for X_{N-1},\ldots,X_1, we get E\leq\mathbb{E}tr\exp\left(\sum_{i=1}^{N-1} \mu X_i+\mu X_N\right)\leq\mathbb{E}tr\exp\left( \sum_{i=1}^{N-1}\mu X_i+\log\mathbb{E} e^{\mu X_N}\right)\leq\cdots\leq tr\exp\left( \sum_{i=1}^N\log\mathbb{E}e^{\mu X_i}\right). (A small numerical check of this chain follows the list.)
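
Here is that small check, a sketch of my own: each X_i=\epsilon_iB_i is a random sign flip of a fixed symmetric B_i, so for a small N both sides can be computed exactly by enumerating all sign patterns.

```python
import numpy as np
from itertools import product
from scipy.linalg import expm, logm

rng = np.random.default_rng(3)
n, N, mu = 5, 3, 0.7

def rand_sym(n):
    M = rng.normal(size=(n, n))
    return (M + M.T) / 2

B = [rand_sym(n) for _ in range(N)]   # X_i = eps_i * B_i with eps_i = +/-1

# Left side E tr exp(mu * sum_i X_i): exact average over all 2^N sign patterns.
lhs = np.mean([np.trace(expm(mu * sum(s * Bi for s, Bi in zip(signs, B))))
               for signs in product([-1, 1], repeat=N)])

# Right side tr exp(sum_i log E e^{mu X_i}), with each E e^{mu X_i} computed exactly.
rhs = np.trace(expm(sum(logm(0.5 * (expm(mu * Bi) + expm(-mu * Bi))) for Bi in B))).real

print(lhs, "<=", rhs, ":", lhs <= rhs + 1e-8)
```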

It is easy to check that (or I’ll probably show this in the next post): if \|X\|\leq K and \mathbb{E}X=0, then \mathbb{E}\exp(\mu X)\preceq \exp\left(g(\mu)\mathbb{E}X^2\right), where g(\mu)=\frac{\mu^2/2}{1-\mu K/3},~\forall~0\leq\mu< \frac{3}{K}. Applying this to the result from step 2 (taking \log, which preserves the PSD order, and then tr\exp) and bounding the trace by n times the largest eigenvalue, we have that, for every such \mu,

Pr\left(\lambda_{\text{max}}(S)\geq t \right)\leq n\cdot\exp\left( -\mu t+g(\mu)\sigma^2\right). Choosing \mu=\frac{t}{\sigma^2+Kt/3} (which indeed satisfies \mu<3/K) makes the exponent equal to -\frac{t^2/2}{\sigma^2+Kt/3}. Applying the same argument to -S and taking a union bound over the two events controls \|S\|=\max\left(\lambda_{\text{max}}(S),\lambda_{\text{max}}(-S)\right) and gives the stated tail bound with the factor 2n. \square
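
Going back to the moment bound \mathbb{E}\exp(\mu X)\preceq \exp(g(\mu)\mathbb{E}X^2) used above, here is a quick PSD-order check for the same two-point distribution (my own sketch; the ordering holds iff the difference of the two sides has only nonnegative eigenvalues):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
n = 5
M = rng.normal(size=(n, n))
B = (M + M.T) / 2                      # X = +/- B with prob 1/2: mean zero, ||X|| = ||B||

K = np.abs(np.linalg.eigvalsh(B)).max()
mu = 1.0 / K                           # any 0 <= mu < 3/K works
g = (mu ** 2 / 2) / (1 - mu * K / 3)

lhs = 0.5 * (expm(mu * B) + expm(-mu * B))   # E exp(mu X), exact for the two-point law
rhs = expm(g * (B @ B))                      # exp(g(mu) E X^2), since E X^2 = B^2
# lhs <= rhs in the PSD order iff (rhs - lhs) has only nonnegative eigenvalues.
print(np.linalg.eigvalsh(rhs - lhs).min() >= -1e-10)
```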

In the next post we will apply MBI to the matrix completion problem.
