Continuing last post, we will resume talking about matrix completion in this. As the title suggests, these are some basic facts about the problem so I’ll provide some interesting references at the end.
Matrix Bernstein Inequality (MBI): Let be mean 0, symmetric matrices such that . Then we have that where , the constants are hidden in notation and this will be often used, basically this is analogues to the big-Oh notation.
Remark 1: This inequality is sharp for as we saw earlier.
Here denotes the operator norm of the matrix, that is, for symmetric matrices, or it’s largest eigenvalue and corresponds to the “largest” eigenvector.
Proof: Note that since is a symmteric matrix, it can be written as where corresponds to the eigenvalues and eigenvectors respectively by the spectral decomposition and we assume that . Now if we have a real valued function from reals, , then for a symmetric matrix is defined as . Alternatively, we can use matrix Taylor’s series and apply on the coefficients. We will need two important matrix inequalities which we state without proof:
- Golden-Thompson inequality: . When are reals, we get equality (this is because reals are commutative in multiplication whereas matrices aren’t).
- Trace inequality: Roman mentioned that this is Lieb’s inequality, but I think it’s much a simpler form than that so I’m calling it as just trace inequality, it states that: Let be a fixed symmetric matrix and consider the function , denotes the trace or the sum of the diagonal entries. Then we have that is concave on symmetric PSD matrices.
Combining both the above properties and Jensen’s inequality (), we have that, . In particular, when , we get,
which we will refer to as the Lieb’s inequality from now on. Basically this inequality allows to take the expectation inside the exponential operator, pretty neat right?
Now we ready to prove MBI. It consists essentially of two steps: 1. Markov’s inequality or union bound on the sum of random matrices; 2. Lieb’s inequality.
- Let . Then . The inequality is due to the union bound, denotes the largest eigenvalue.Now we have the following equalities,
I don’t know why there’s a line break, my apologies but I like this since you can see that doing and are commutative operations.
So denoting . The last inequality is due to the fact the is nonnegative and nondecreasing.
- Lieb’s: The trick is to deal or apply Lieb’s with one random matrix at a time and continue doing it.
It is easy to check that (or I’ll probably show this in the next post): If , then we have that, . Applying this to the result from 2 (Lieb’s) and optimizing for , we have that
In the next post we will apply MBI to the matrix completion problem.