Saturday, February 06, 2010

CS: Compressive Sensing Linear Algebra 101

[Image: NASA ISS webcam stream]

Do you think about Lapack when you perform your average matrix inversion? Of course you don't: you expect it to work, and you expect that your main problem is filling in the numbers of the matrix you are about to invert. The rest is known. For those of you using Matlab, it's a matter of using inv or \ and that's it. It turns out that we need to build a framework so that eventually the folks at Matlab overload operators like inv or \ when the system is underdetermined, so that Compressive Sensing becomes as routine as any operation using Lapack.

Today, we are going to look at an algorithm that is used in the calibration of the Random Lens Imager. Why? Because it is the sort of capability that any engineer, physicist or chemist should be able to use without having to KNOW compressive sensing, read this blog for weeks, or immerse themselves in all the papers at the Rice Repository. Let us state the problem:

Suppose you have a specific physical setup that transforms elements x into elements y. You know the physical transform is linear. The problem can be stated as:

y = A x

where x is a vector of size n, y is a vector of size m, and A is a matrix of size m x n. The goal of the maneuver is to find A, i.e. one is looking to fill up a matrix of unknown coefficients while having access to both x and y. Sounds simple enough. You form the n vectors x equal to the canonical basis vectors. You physically perform the product A x n times, once with each canonical vector, and obtain the y's as your physical responses. Because you are using canonical vectors, the y's are just the columns of A. You may also notice that in this case m could be different from n, but this is not important. This is basic Algebra 101 taught in high school.
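In code, this naive identification looks like the following minimal numpy sketch (the matrix A is generated here only so we can verify the result; in the real experiment it is, of course, the unknown):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
A = rng.standard_normal((m, n))       # the "black box" (known here only for checking)

# Probing with the j-th canonical vector e_j returns A e_j, the j-th column of A.
E = np.eye(n)
Y = np.column_stack([A @ E[:, j] for j in range(n)])   # n physical measurements
assert np.allclose(Y, A)              # n probes fully identify A
```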

What happens if you know that A is sparse but is also very large? [Update: you do not know the sparsity pattern of A.] If you have followed the thread but don't know about compressive sensing, a naive answer would be to say: well, Igor, it doesn't matter, I'll just go through the full set of canonical vectors; it'll take longer but that's it. In other words, knowing in advance that A is sparse is irrelevant.

Compressive Sensing is a technique that allows one to take this knowledge into account and, by the same token, reduce the number of times you will perform these matrix-vector multiplications. How much reduction are we talking about? Let us say you know the sparsity of A is roughly k; then instead of performing n matrix-vector operations (or n measurements in your physical system), compressive sensing guarantees under certain conditions that the number of operations will be in the realm of k log(n).

Let us appreciate some orders of magnitude here: if n = 10^7 (10 million), then for a sparsity k = 1000, the number of operations is reduced by a factor of 10^7/(10^3 log(10^7)) ≈ 620 (natural log), or between two and three orders of magnitude. If making one measurement takes a second, then instead of taking 115 days to perform the full characterization of A, you are looking at finishing the whole thing within 5 hours. Did I get your attention? Good.
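A quick back-of-the-envelope check of these numbers:

```python
import numpy as np

n, k = 1e7, 1e3
print(round(n / (k * np.log(n))))   # ~620, the reduction factor
print(n / 86400)                    # ~115.7 days at one measurement per second
print(k * np.log(n) / 3600)         # ~4.5 hours for the k log(n) measurements
```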

How do we solve this then ?

Let us first define some additional matrices:

Y = A X
where
  • X is a matrix of size n x n1; in the naive approach stated above, its columns are the first n1 canonical vectors.
  • A is a matrix of size m x n of unknown coefficients
  • Y is a matrix of size m x n1; in the naive approach stated above, its columns are the first n1 columns of A

In the naive approach, when n1 reaches n, one fully knows A. The statement can be put in the following form:

Yt = Xt At

where the t index denotes the transpose and where
  • Xt is a matrix of size n1 x n; in the naive approach stated above, its rows are the first n1 canonical vectors.
  • At is a matrix of size n x m of unknown coefficients
  • Yt is a matrix of size n1 x m; in the naive approach stated above, its rows are the (transposed) first n1 columns of A
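This restatement is nothing more than the transpose identity (A X)^t = X^t A^t, as a two-line check confirms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))           # m x n
X = rng.standard_normal((5, 2))           # n x n1
assert np.allclose((A @ X).T, X.T @ A.T)  # Yt = Xt At
```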
The statement can also be put in the following form:

yt_i = Xt at_i

where
  • at_i is the i-th column vector, of size n x 1, of unknown coefficients of At. This statement stands for i = 1 to m.
  • yt_i is the i-th column vector, of size n1 x 1, of Yt, i.e. the known measurements. This statement stands for i = 1 to m and, as before,
  • Xt is a matrix of size n1 x n; in the naive approach stated above, its rows are the first n1 canonical vectors.
Each of these at_i vectors is sparse or very sparse and we are solving for all of them. In order for compressive sensing to work, we are not going to use canonical vectors as the rows of Xt; we are going to use random vectors instead. Also, since we don't know the sparsity k of A, we are going to go through an iterative process:

  1. Start with an empty matrix X (the size of X will be n x n1 for the remainder of the algorithm)
  2. Produce a random vector of size n x 1 that will be added as one of the columns of X,
  3. Increase n1 by 1
  4. Physically compute the response y = A x (x is an n x 1 vector, y is an m x 1 vector)
  5. Add the new found y to the columns of Y ( an m x n1 matrix)
  6. Solve for all vectors at_i (i goes from 1 to m) in yt_i = Xt at_i, where yt_i is an n1 x 1 vector, Xt is an n1 x n matrix and at_i is an n x 1 vector. You may have noticed that this system is underdetermined, i.e. Xt has too few rows compared to its number of columns, but there are many different solvers (L1-minimization or greedy methods, for instance) built for exactly this kind of system of equations.
  7. If convergence is observed for At (i.e. for all the columns at_i) then stop; if not, go to step 2. A runnable sketch of this loop follows below.
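Here is a minimal, self-contained Python sketch of that loop. To be clear about what is mine and what is the algorithm's: the sparse solver in step 6 is a plain Orthogonal Matching Pursuit written out for the occasion (any sparse solver, GPSR or the like, would do just as well), the measurement y = A x is simulated with a synthetic sparse A, and the convergence test is a simple "the estimate stopped changing between rounds" heuristic.

```python
import numpy as np

def omp(Phi, y, max_nonzeros, tol=1e-8):
    # Orthogonal Matching Pursuit: greedily build a sparse a with Phi a ~= y.
    residual = y.astype(float).copy()
    support, coef = [], np.zeros(0)
    a = np.zeros(Phi.shape[1])
    for _ in range(max_nonzeros):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # most correlated column
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    a[support] = coef
    return a

rng = np.random.default_rng(0)
m, n, k = 20, 500, 3                      # synthetic sparse "black box" A
A = np.zeros((m, n))
for i in range(m):
    cols = rng.choice(n, size=k, replace=False)
    A[i, cols] = rng.standard_normal(k)

X = np.empty((n, 0))                      # step 1: start with an empty X
Y = np.empty((m, 0))
A_prev = np.zeros((m, n))
for n1 in range(1, n + 1):
    x = rng.standard_normal((n, 1))       # step 2: one new random probe
    X = np.hstack([X, x])                 # step 3: n1 grows by one
    Y = np.hstack([Y, A @ x])             # steps 4-5: "measure" y = A x, append to Y
    # Step 6: row i of A solves yt_i = Xt at_i, an underdetermined system.
    A_hat = np.vstack([omp(X.T, Y[i, :], max_nonzeros=max(1, n1 // 2))
                       for i in range(m)])
    # Step 7: stop once the estimate no longer changes between rounds.
    if n1 > 1 and np.linalg.norm(A_hat - A_prev) < 1e-6:
        break
    A_prev = A_hat

print(f"stopped after n1 = {n1} of n = {n} possible measurements")
print("relative error:", np.linalg.norm(A_hat - A) / np.linalg.norm(A))
```

If all goes well, the loop stops well short of the n = 500 canonical probes the naive approach would need, which is the k log(n) behaviour advertised above.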
Obviously, you probably don't want to start with an empty matrix. Like in the ISD or DUMBEST algorithms, you could finesse things and, instead of adding new columns to X, pick only the ones of interest. This could really look good as a Monday Morning Algorithm, and whoever wants to implement it and make it available to the world will get this blog's full attention :-)


Credit: Phil Plait, Bad Astronomy Blog; you can watch this webcam live from the Space Station.

6 comments:

Stephen said...

As an outsider to compressed sensing who is very familiar with numerical linear algebra, I was quite a bit confused by the setup of this description. Rather than "Suppose you have a specific physical set up ... unknown coefficients while having access to both x and y," I would have said: You are trying to infer an m x n matrix A with unknown elements. The only method of learning about this matrix is through applying matrix-vector products Ax = y. That is, you construct a vector x, perform a multiplication Ax, and finally analyze the resulting vector y of observations.

In reality, the elements of matrix A consist of individual features of a physical system we are trying to learn about. Because this physical system A is linear, we know the act of physically measuring the system is equivalent to the matrix-vector multiplication Ax = y.

The rest of the post was extremely informative and useful. Thanks!

Igor said...

Stephen,

Thanks for the comment. I could definitely polish up the statement of the problem. Do you, by any chance, know the name of this trivial algorithm in linear algebra (i.e. the act of finding A through the application of several canonical vectors)?

Igor.

Igor said...

Stephen,

I just used your wording in a question sent to NA digest. Thanks!

Cheers,

Igor.

Stephen said...

Igor,

Would "the act of finding A through the appliication of several canonical vectors" simply be multiplying A by (or projecting each of the columns of a onto) a set of basis vectors? That's the simplest thing I can come up with.

-Stephen

Stephen said...

(this is Stephen Becker, a different Stephen than the one above)

If you change your prior and instead of assuming that A is sparse you now assume that it has rank r (presumably with rank r < size(A)), then you can find A (with high probability) using only 2*r linear measurements.

You need to be able to measure Y = A*X (with X having r columns), as well as Z = A'*W (with W having r columns). You only need to perform a single QR decomposition on W or Z (very cheap) and a matrix multiply. There's no need for a solver like GPSR. In practice, you can do this for a matrix of size m = n = 2^30, and if the rank is small (say, 10) it's very fast (less than 1 second for rank = 10).
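For concreteness, here is a minimal numpy sketch of the two-sided sketching idea Stephen describes (my own reconstruction, not his code; his actual procedure, e.g. which factor gets the QR, may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 200, 300, 10
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r black box

X = rng.standard_normal((n, r))    # r probes through A ...
W = rng.standard_normal((m, r))    # ... and r probes through A': 2r measurements
Y = A @ X                          # measured; spans the range of A (w.h.p.)
Z = A.T @ W                        # measured; note Z' = W' A

Q, _ = np.linalg.qr(Y)             # orthonormal basis Q for the range of A
# A = Q (Q'A) and W'A = Z', hence (W'Q)(Q'A) = Z': solve the small r x r system.
core = np.linalg.lstsq(W.T @ Q, Z.T, rcond=None)[0]
A_hat = Q @ core

print(np.linalg.norm(A_hat - A) / np.linalg.norm(A))   # tiny: exact recovery
```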

Igor said...

Thanks Stephen (Becker),

Indeed, if one can make an assumption on the rank then we go back to the wonderful work that you and others are doing. The problem really is that I don't know how to translate the rank of the black box into physical terms. I think it is too hard a constraint for most physics/engineering experiments (but I'd love to be wrong).

Cheers,

Igor.
