Tuesday, April 22, 2008

Compressed Sensing: Learning Dictionaries with MoTIF


In the big picture on Compressed Sensing, we noted that in order to perform this new kind of sampling, we first need to know that the signal is compressible with respect to a family of functions. How do we find these families of functions? One way is to build a dictionary of atoms from training datasets of the signal of interest. I had missed this series of papers for some reason, but here they are (I could not find a readily available software implementation):

MoTIF: an Efficient Algorithm for Learning Translation Invariant Dictionaries by Philippe Jost, Pierre Vandergheynst, Sylvain Lesage, and Rémi Gribonval. The abstract reads:
The performance of approximation using redundant expansions relies on having dictionaries adapted to the signals. In natural high-dimensional data, the statistical dependencies are, most of the time, not obvious. Learning fundamental patterns is an alternative to the analytical design of bases and is nowadays a popular problem in the field of approximation theory. In many situations, the basis elements are shift invariant, thus the learning should try to find the best matching filters. We present a new algorithm for iteratively learning generating functions that can be translated to all positions in the signal to generate a highly redundant dictionary.
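To make the idea of translation invariance concrete, here is a minimal sketch (in Python/NumPy, not from the paper; the function name shift_dictionary is mine) of how a single learned generating function, once translated to every position in the signal, yields a highly redundant dictionary:

```python
import numpy as np

def shift_dictionary(g, n):
    """Build a translation-invariant dictionary by placing copies of the
    generating function g at every position of a length-n signal.
    Returns an n x (n - len(g) + 1) matrix whose columns are the
    shifted, unit-norm atoms."""
    k = len(g)
    g = g / np.linalg.norm(g)
    atoms = []
    for t in range(n - k + 1):
        atom = np.zeros(n)
        atom[t:t + k] = g
        atoms.append(atom)
    return np.column_stack(atoms)

# Example: one short generating function already yields 14 atoms for a
# signal of length 16; a handful of learned functions gives a highly
# redundant dictionary.
g = np.array([1.0, 2.0, 1.0])
Phi = shift_dictionary(g, 16)
print(Phi.shape)  # (16, 14)
```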
A more recent paper deals with the more difficult problem of learning dictionaries for multimodal signals: Learning Multi-Modal Dictionaries by Holger Rauhut, Karin Schnass, Gianluca Monaci, Philippe Jost, Pierre Vandergheynst, Boris Mailhé, Sylvain Lesage, and Rémi Gribonval. The abstract reads:

Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrating, at each instant, perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when considering the signals independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. An algorithm for iteratively learning multimodal generating functions that can be shifted to all positions in the signal is proposed as well. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences and is able to discover underlying structures in the data. The detection of such audio-video patterns in audiovisual clips makes it possible to localize the sound source in the video in the presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms.
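The abstract's remark that learning reduces to a generalized eigenvector problem is the computational heart of this family of algorithms. As a rough sketch of that single step (my own illustration, assuming a matrix A that accumulates correlations with the training signals and a matrix B that encodes the decorrelation constraint against previously learned atoms), the update amounts to one call to a generalized symmetric eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

def best_atom(A, B):
    """One learning update, sketched: maximize the Rayleigh quotient
    g'Ag / g'Bg, i.e. solve the generalized eigenvector problem
    A g = lambda B g and keep the eigenvector with the largest
    eigenvalue (eigh returns eigenvalues in ascending order)."""
    eigvals, eigvecs = eigh(A, B)
    g = eigvecs[:, -1]
    return g / np.linalg.norm(g)

# Toy example with a random symmetric A and a positive-definite B
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T                              # symmetric positive semi-definite
B = np.eye(8) + 0.1 * np.ones((8, 8))    # symmetric positive definite
print(best_atom(A, B))
```

Because this step has a closed-form solution, no step sizes or learning rates are involved, which is presumably what the abstract means by "free of user-defined parameters."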


Some additional explanation is given in Sylvain Lesage's Ph.D. thesis and defense (100 MB ppt), both in French. I noted the following in the previous paper:

In this work, we analyze a broad class of multimodal signals exhibiting correlations along time. In many different research fields, the temporal correlation between multimodal data is studied: in neuroscience, electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) data are jointly analyzed to study brain activation patterns [5].

This passage got me thinking about another set of data that is currently unfolding and links to another subject area of this blog, but that is the subject of another entry.

Finally, the issue of computing an approximation of a signal with these redundant dictionaries is treated in Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms by Rémi Gribonval, Holger Rauhut, Karin Schnass, and Pierre Vandergheynst. The abstract reads:

This paper provides new results on computing simultaneous sparse approximations of multichannel signals over redundant dictionaries using two greedy algorithms. The first one, p-thresholding, selects the S atoms that have the largest p-correlation while the second one, p-simultaneous matching pursuit (p-SOMP), is a generalisation of an algorithm studied by Tropp in [28]. We first provide exact recovery conditions as well as worst case analyses of all algorithms. The results, expressed using the standard cumulative coherence, are very reminiscent of the single channel case and, in particular, impose stringent restrictions on the dictionary. We unlock the situation by performing an average case analysis of both algorithms. First, we set up a general probabilistic signal model in which the coefficients of the atoms are drawn at random from the standard Gaussian distribution. Second, we show that under this model, and with mild conditions on the coherence, the probability that p-thresholding and p-SOMP fail to recover the correct components is overwhelmingly small and gets smaller as the number of channels increases. Furthermore, we analyse the influence of selecting the set of correct atoms at random. We show that, if the dictionary satisfies a uniform uncertainty principle [5], the probability that simultaneous OMP fails to recover any sufficiently sparse set of atoms gets increasingly smaller as the number of channels increases. To conclude, we study the robustness of these algorithms to an imperfect knowledge of the dictionary, a situation met in sparsity-based blind source separation where the dictionary, which corresponds to a mixing matrix, is only approximately known. In this framework, we estimate the probability of failure of the considered algorithms as a function of the similarity between the reference dictionary and the approximate one, which we measure with the smallest correlation between corresponding pairs of atoms.
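For readers who want to see what simultaneous greedy recovery looks like, here is a minimal sketch of SOMP in the spirit of the paper (my own illustration of the standard p = 1 case, not the authors' code; the function name somp and the toy dimensions are mine): atoms are scored by summing their correlations across channels, and all channels are re-projected onto the selected atoms at each step.

```python
import numpy as np

def somp(Phi, Y, sparsity):
    """Simultaneous Orthogonal Matching Pursuit, sketched.
    Phi: d x N dictionary with unit-norm columns.
    Y:   d x C matrix of C channel signals sharing a sparse support.
    Returns the selected atom indices and the coefficient matrix."""
    residual = Y.copy()
    support = []
    for _ in range(sparsity):
        # Score each atom by the sum over channels of |<atom, residual>|
        scores = np.sum(np.abs(Phi.T @ residual), axis=1)
        scores[support] = -np.inf          # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Project every channel onto the span of the selected atoms
        sub = Phi[:, support]
        coeffs, *_ = np.linalg.lstsq(sub, Y, rcond=None)
        residual = Y - sub @ coeffs
    return support, coeffs

# Toy example: 3 channels sharing the same 2-atom support
rng = np.random.default_rng(1)
Phi = rng.standard_normal((32, 64))
Phi /= np.linalg.norm(Phi, axis=0)
X = np.zeros((64, 3))
X[[5, 17], :] = rng.standard_normal((2, 3))
support, _ = somp(Phi, Phi @ X, sparsity=2)
print(sorted(support))                     # [5, 17] when recovery succeeds
```

Even in this toy setting, the paper's average case message is visible: pooling correlations across channels makes the right atoms stand out more reliably than in any single channel alone.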
Of particular interest is this tidbit:

To illustrate the role of sparsity, let us introduce the coherence of the dictionary $\Phi$, i.e. the strongest correlation between any two distinct vectors in $\Phi$: $$\mu = \max_{i \neq j} |\langle \phi_i, \phi_j \rangle|.$$ Schematically, if a signal is a superposition of fewer than $1/\mu$ elements of $\Phi$, this representation is unique and can be recovered by standard algorithms.
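Coherence is cheap to compute: it is just the largest off-diagonal entry (in absolute value) of the Gram matrix of the normalized dictionary. A quick sketch (my own, continuing the NumPy conventions above):

```python
import numpy as np

def coherence(Phi):
    """Coherence of a dictionary whose columns are the atoms: the largest
    absolute inner product between two distinct unit-norm atoms."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)   # normalize the atoms
    G = np.abs(Phi.T @ Phi)                   # all pairwise correlations
    np.fill_diagonal(G, 0.0)                  # ignore self-correlations
    return G.max()

rng = np.random.default_rng(2)
Phi = rng.standard_normal((32, 64))
mu = coherence(Phi)
print(mu, 1.0 / mu)   # schematically, superpositions of fewer than
                      # ~1/mu atoms are uniquely recoverable
```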


Credit: NASA/JPL/University of Arizona, Disappearing Dunes on Mars
