## Monday, January 23, 2017

### Book: A Course in Machine Learning, Hal Daumé III

Hal mentioned it on his Twitter feed:

A Course in Machine Learning by Hal Daumé III is here.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Sunday, January 22, 2017

### Sunday Morning Insight: Of Planes and Artificial Intelligence in France [originally in French]

This is a story from the Second World War. Abraham Wald wants to help the war effort but, because he is from Central Europe, he cannot work on the top-secret radar programs or on the Manhattan Project. He ends up at the Statistical Research Group in New York. When the Allied high command asks Abe whether he can help them, he immediately seizes on the project.

The problem is simple. During the campaigns over France and Germany aimed at bombing the Nazi war effort, a number of RAF and US Air Force planes do not come back. Those that do return are riddled with bullet and shell holes everywhere, or rather almost everywhere. The high command wonders where and how to armor the planes so that more of them survive these campaigns. Its first instinct is to patch the holes on the planes that come back riddled.

Abe makes the following remark: if the planes are being hit over their entire surface during the campaigns, then one should look for the spots that were not hit. Indeed, the planes that did not come back are precisely the ones that were hit in those spots. To answer the initial question, the planes should therefore be armored in the places that were left untouched on the planes that survived.
This is what we call selection bias.

This bias appears whenever you look at a group and ask why a certain type of person is missing from it. Another, closer example is mapping artificial intelligence in France from listings of investment programs, startups or research teams that already exist. These listing efforts are important and give real visibility to policymakers. But the question one should also ask is: where is there strong societal demand with, facing it, investment programs, startups or research teams that do not yet exist in France?

PS:

Abe would go on to run the calculation on more than 400 planes and find that the planes' engines needed to be protected against 20mm cannon fire and the fuselage against 7.9mm machine-gun fire. A copy of Abe Wald's report can be found here: "A method of estimating plane vulnerability based on damage of survivors"

I first read this very effective story on John's blog. Jordan told it at greater length here. We first talked about it at the Paris Machine Learning meetup, when Léon Bottou spoke at meetup 11 of season 1.

Photo credits: Cameron Moll, The Counterintuitive World, Kevin Drum


## Saturday, January 21, 2017

### Saturday Morning Video: Stan Conference 2017 video streaming

The Stan Conference 2017 is streaming live on YouTube. Andrew starts with the recent US elections.

Here is the program:

• 9:00 AM - 10:00 AM
Dev talk Andrew Gelman:
"10 Things I Hate About Stan"
• 10:00 AM - 10:30 AM
Coffee
• 10:30 AM - 12:00 PM
Contributed talks
1. Jonathan Auerbach, Rob Trangucci:
"Twelve Cities: Does lowering speed limits save pedestrian lives?"
2. "Hierarchical Bayesian Modeling of the English Premier League"
3. Victor Lei, Nathan Sanders, Abigail Dawson:
"Advertising Attribution Modeling in the Movie Industry"
4. Woo-Young Ahn, Nate Haines, Lei Zhang:
"hBayesDM: Hierarchical Bayesian modeling of decision-making tasks"
5. Charles Margossian, Bill Gillespie:
"Differential Equation Based Models in Stan"
• 12:00 PM - 1:15 PM
Lunch
• 1:15 PM - 2:15 PM
Dev talk Michael Betancourt:
"Everything You Should Have Learned About Markov Chain Monte Carlo"
• 2:15 PM - 2:30 PM
Stretch break
• 2:30 PM - 3:45 PM
Contributed talks
1. Teddy Groves:
"How to Test IRT Models Using Simulated Data"
2. Bruno Nicenboim, Shravan Vasishth:
"Models of Retrieval in Sentence Comprehension"
3. Rob Trangucci:
"Hierarchical Gaussian Processes in Stan"
4. Nathan Sanders, Victor Lei:
"Modeling the Rate of Public Mass Shootings with Gaussian Processes"
• 3:45 PM - 4:45 PM
Mingling and coffee
• 4:45 PM - 5:40 PM
Q&A Panel
• 5:40 PM - 6:00 PM
Closing remarks Bob Carpenter:
"Where is Stan Going Next?"


## Friday, January 20, 2017

### Learning the structure of learning

There has been a veritable flurry of recent work on learning the structure of new learning architectures. Here is an ICLR2017 paper on the subject of meta-learning, along with the talks of the recent NIPS symposium on the topic.

Neural Architecture Search with Reinforcement Learning, Barret Zoph, Quoc Le (Open Review is here)

Abstract: Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.
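The search loop described in the abstract can be sketched in miniature. The paper trains an RNN controller with policy gradients; as a much-simplified stand-in, plain random search over a toy architecture space shows the outer loop of proposing an architecture, training it, and treating validation accuracy as the reward. All names, sizes, and the dataset below are illustrative, not the paper's setup:

```python
# Simplified architecture-search loop: the paper trains an RNN controller
# with REINFORCE; here plain random search stands in for the controller,
# keeping only the reward signal (validation accuracy).
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

rng = random.Random(0)
search_space = [16, 32, 64, 128]           # candidate hidden-layer widths

best_arch, best_reward = None, -1.0
for _ in range(8):                         # 8 sampled "architectures"
    arch = (rng.choice(search_space), rng.choice(search_space))
    model = MLPClassifier(hidden_layer_sizes=arch, max_iter=300, random_state=0)
    model.fit(X_tr, y_tr)
    reward = model.score(X_val, y_val)     # reward = validation accuracy
    if reward > best_reward:
        best_arch, best_reward = arch, reward

print(best_arch, round(best_reward, 3))
```

The RNN controller in the actual paper is what lets the search generalize across variable-length architecture descriptions; random search only illustrates the reward loop.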

• Jürgen Schmidhuber, Introduction to Recurrent Neural Networks and Other Machines that Learn Algorithms
• Paul Werbos, Deep Learning in Recurrent Networks: From Basics To New Data on the Brain
• Li Deng, Three Cool Topics on RNN
• Risto Miikkulainen, Scaling Up Deep Learning through Neuroevolution
• Jason Weston, New Tasks and Architectures for Language Understanding and Dialogue with Memory
• Oriol Vinyals, Recurrent Nets Frontiers
• Mike Mozer, Neural Hawkes Process Memories
• Ilya Sutskever, Using a slow RL algorithm to learn a fast RL algorithm using recurrent neural networks (Arxiv)
• Marcus Hutter, Asymptotically fastest solver of all well-defined problems
• Nando de Freitas, Learning to Learn, to Program, to Explore and to Seek Knowledge (Video)
• Alex Graves, Differentiable Neural Computer
• Nal Kalchbrenner, Generative Modeling as Sequence Learning
• Panel Discussion Topic: The future of machines that learn algorithms, Panelists: Ilya Sutskever, Jürgen Schmidhuber, Li Deng, Paul Werbos, Risto Miikkulainen, Sepp Hochreiter, Moderator: Alex Graves

Posters of the recent NIPS2016 workshop


## Thursday, January 19, 2017

### Understanding deep learning requires rethinking generalization

Here is an interesting paper that pinpoints the influence of regularization on learning with neural networks. From the paper:

Our central finding can be summarized as:
Deep neural networks easily fit random labels.

and later:

While simple to state, this observation has profound implications from a statistical learning perspective:
1. The effective capacity of neural networks is large enough for a brute-force memorization of the entire data set.
2. Even optimization on random labels remains easy. In fact, training time increases only by a small constant factor compared with training on the true labels.
3. Randomizing labels is solely a data transformation, leaving all other properties of the learning problem unchanged.

One can also read the interesting comments on OpenReview and on Reddit.

Understanding deep learning requires rethinking generalization by Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training.
Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice.
We interpret our experimental findings by comparison with traditional models.


## Wednesday, January 18, 2017

### An NlogN Parallel Fast Direct Solver for Kernel Matrices

When Matrix Factorization meets Machine Learning:

Kernel matrices appear in machine learning and non-parametric statistics. Given N points in d dimensions and a kernel function that requires O(d) work to evaluate, we present an O(dNlogN)-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, solving a linear system with a kernel matrix can be done with O(NlogN) work. Our algorithm only requires kernel evaluations and does not require that the kernel matrix admits an efficient global low rank approximation. Instead our factorization only assumes low-rank properties for the off-diagonal blocks under an appropriate row and column ordering. We also present a hybrid method that, when the factorization is prohibitively expensive, combines a partial factorization with iterative methods. As a highlight, we are able to approximately factorize a dense 11M×11M kernel matrix in 2 minutes on 3,072 x86 "Haswell" cores and a 4.5M×4.5M matrix in 1 minute using 4,352 "Knights Landing" cores.
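The key structural assumption, that off-diagonal blocks are numerically low rank under a suitable row and column ordering, is easy to check numerically. A minimal sketch with a 1-D Gaussian kernel (the paper handles general d-dimensional point sets; this is only an illustration):

```python
# Check that an off-diagonal block of a Gaussian kernel matrix is
# numerically low rank once the points are ordered along the axis.
import numpy as np

rng = np.random.default_rng(0)
pts = np.sort(rng.uniform(0, 10, size=1000))          # 1-D points, ordered
K = np.exp(-(pts[:, None] - pts[None, :]) ** 2)       # Gaussian kernel matrix

block = K[:500, 500:]                                  # off-diagonal block
sv = np.linalg.svd(block, compute_uv=False)
numerical_rank = int((sv > 1e-8 * sv[0]).sum())
print(numerical_rank)          # far smaller than the block dimension of 500
```

It is this rapid decay of the off-diagonal singular values, rather than global low rank of K itself, that the hierarchical factorization exploits to reach O(N log N) work.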


## Monday, January 16, 2017

### Edward: Deep Probabilistic Programming - implementation -

Dustin mentioned it on his Twitter feed:

Deep Probabilistic Programming by Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei

We propose Edward, a Turing-complete probabilistic programming language. Edward builds on two compositional representations---random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation, to variational inference, to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, on a benchmark logistic regression task, Edward is at least 35x faster than Stan and PyMC3.
From the Edward page:

## A library for probabilistic modeling, inference, and criticism.

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilistic models, ranging from classical hierarchical models on small data sets to complex deep probabilistic models on large data sets. Edward fuses three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming.
It supports modeling with
• Directed graphical models
• Neural networks (via libraries such as Keras and TensorFlow Slim)
• Conditionally specified undirected models
• Bayesian nonparametrics and probabilistic programs
It supports inference with
• Variational inference
• Black box variational inference
• Stochastic variational inference
• Inclusive KL divergence: $\text{KL}(p\|q)$
• Maximum a posteriori estimation
• Monte Carlo
• Hamiltonian Monte Carlo
• Stochastic gradient Langevin dynamics
• Metropolis-Hastings
• Compositions of inference
• Expectation-Maximization
• Pseudo-marginal and ABC methods
• Message passing algorithms
It supports criticism of the model and inference with
• Point-based evaluations
• Posterior predictive checks
Edward is built on top of TensorFlow. It enables features such as computational graphs, distributed training, CPU/GPU integration, automatic differentiation, and visualization with TensorBoard.
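Among the inference methods listed above is Metropolis-Hastings. A minimal generic random-walk MH sampler, written in plain NumPy rather than Edward's API, targeting a standard normal:

```python
# Minimal random-walk Metropolis-Hastings sampler (generic sketch,
# not Edward's API). Accepts a proposal with probability
# min(1, p(proposal)/p(current)), computed in log space.
import numpy as np

def metropolis_hastings(log_p, x0, n_steps, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        prop = x + step * rng.normal()               # symmetric proposal
        if np.log(rng.uniform()) < log_p(prop) - log_p(x):
            x = prop                                  # accept
        samples.append(x)
    return np.array(samples)

log_p = lambda x: -0.5 * x ** 2                       # standard normal, unnormalized
draws = metropolis_hastings(log_p, x0=0.0, n_steps=20000)
print(draws.mean(), draws.std())                      # near 0 and near 1
```

Edward's contribution is composing such inference procedures with TensorFlow model graphs; the sampler above only shows the underlying algorithm.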

### Authors

Edward is led by Dustin Tran with guidance from David Blei, together with a number of other developers.
We are open to collaboration, and welcome researchers and developers to contribute. Check out the contributing page for how to improve Edward’s software. For broader research challenges, shoot one of us an e-mail.
Edward has benefited enormously from the helpful feedback and advice of many individuals: Jaan Altosaar, Eugene Brevdo, Allison Chaney, Joshua Dillon, Matthew Hoffman, Kevin Murphy, Rajesh Ranganath, Rif Saurous, and other members of the Blei Lab, Google Brain, and Google Research.

### Citation

We appreciate citations for Edward.
Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M. Blei. 2016. Edward: A library for probabilistic modeling, inference, and criticism. arXiv preprint arXiv:1610.09787.


### Thesis: Privacy-aware and Scalable Recommender Systems using Sketching Techniques by Raghavendran Balu

Congratulations Dr. Balu !

In this thesis, we aim to study and evaluate the privacy and scalability properties of recommender systems using sketching techniques, and to propose scalable privacy-preserving personalization mechanisms. The thesis thus sits at the intersection of three different topics: recommender systems, differential privacy and sketching techniques. On the privacy side, we are interested both in new privacy-preserving mechanisms and in the evaluation of such mechanisms. We observe that the primary parameter in differential privacy is a control parameter, and we are motivated to find techniques that can assess the privacy guarantees. We are also interested in proposing new mechanisms that preserve privacy while remaining compatible with the evaluation metrics. On the scalability side, we aim to solve the challenges arising in user modeling and item retrieval. User modeling with evolving data poses difficulties in storage and in adapting to new data, and the retrieval aspects find applications in various domains beyond recommender systems. We evaluate the impact of our contributions through extensive experiments on benchmark real datasets, and from the results we conclude that our contributions address the privacy and scalability challenges well.
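As an illustration of the family of sketching techniques the thesis builds on, here is a count-min sketch, one of the classic compact frequency-estimation structures. This is a generic example, not necessarily the thesis's exact construction:

```python
# Count-min sketch: estimate item frequencies in sublinear space.
# Estimates never undercount; they may overcount due to hash collisions.
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # one independent-ish hash per row, derived from SHA-256
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        # the minimum over rows is the least-collided counter
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for _ in range(42):
    cms.add("item-7")
cms.add("item-9")
print(cms.estimate("item-7"))   # at least 42
```

The fixed-size table is what makes such sketches attractive both for scalability (bounded memory under evolving data) and for privacy analyses, since the raw event stream is never stored.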


## Thursday, January 12, 2017

### NIPS 2016 Tutorial: Generative Adversarial Networks / Learning in Implicit Generative Models

Last night at the Paris Machine Learning meetup, we had a presentation on GANs designed to produce images of cracks (yes, "GANs on cracks" has a good ring to it, Julien!). Here is a short insight for readers of Nuit Blanche, as written by Eric Jang in a recent blog entry (that you should read in its entirety by the way, it's all good!):

For example, if we wanted to minimize some error for image compression/reconstruction, often what we find is that a naive choice of error metric (e.g. euclidean distance to the ground truth label) results in qualitatively bad results. The design flaw is that we don’t have good perceptual similarity metrics for images that are universally applicable for the space of all images. GANs use a second “adversarial” network to learn an optimal implicit distance function (in theory).
Here is a tutorial by Ian Goodfellow and a paper on the subject.

This report summarizes the tutorial presented by the author at NIPS 2016 on generative adversarial networks (GANs). The tutorial describes: (1) Why generative modeling is a topic worth studying, (2) how generative models work, and how GANs compare to other generative models, (3) the details of how GANs work, (4) research frontiers in GANs, and (5) state-of-the-art image models that combine GANs with other methods. Finally, the tutorial contains three exercises for readers to complete, and the solutions to these exercises.
Ian's slides from NIPS are here.

Learning in Implicit Generative Models by Shakir Mohamed, Balaji Lakshminarayanan
Generative adversarial networks (GANs) provide an algorithmic framework for constructing generative models with several appealing properties: they do not require a likelihood function to be specified, only a generating procedure; they provide samples that are sharp and compelling; and they allow us to harness our knowledge of building highly accurate neural network classifiers. Here, we develop our understanding of GANs with the aim of forming a rich view of this growing area of machine learning---to build connections to the diverse set of statistical thinking on this topic, of which much can be gained by a mutual exchange of ideas. We frame GANs within the wider landscape of algorithms for learning in implicit generative models--models that only specify a stochastic procedure with which to generate data--and relate these ideas to modelling problems in related fields, such as econometrics and approximate Bayesian computation. We develop likelihood-free inference methods and highlight hypothesis testing as a principle for learning in implicit generative models, using which we are able to derive the objective function used by GANs, and many other related objectives. The testing viewpoint directs our focus to the general problem of density ratio estimation. There are four approaches for density ratio estimation, one of which is a solution using classifiers to distinguish real from generated data. Other approaches such as divergence minimisation and moment matching have also been explored in the GAN literature, and we synthesise these views to form an understanding in terms of the relationships between them and the wider literature, highlighting avenues for future exploration and cross-pollination.
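The classifier route to density ratio estimation mentioned in the abstract can be sketched: train a probabilistic classifier D to separate real from generated samples, then use r(x) = D(x)/(1 - D(x)) as the estimated ratio p(x)/q(x). A toy example with two Gaussians, chosen only for illustration:

```python
# Density ratio estimation via classification: the discriminator's odds
# D(x)/(1-D(x)) estimate p(x)/q(x), the quantity GAN objectives build on.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 1))     # "data" distribution p
fake = rng.normal(1.0, 1.0, size=(2000, 1))     # "model" distribution q

X = np.vstack([real, fake])
y = np.concatenate([np.ones(2000), np.zeros(2000)])   # 1 = real
clf = LogisticRegression().fit(X, y)

x = np.array([[-1.0], [2.0]])
d = clf.predict_proba(x)[:, 1]                  # D(x), probability of "real"
ratio = d / (1 - d)                             # estimated p(x)/q(x)
print(ratio)             # > 1 where p dominates, < 1 where q dominates
```

For these two Gaussians the true log ratio is linear in x, so logistic regression is exactly the right discriminator family; in a GAN the same role is played by a neural network.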


## Wednesday, January 11, 2017

### Paris Machine Learning Meetup #5 Season 4: LIME 'Why should I trust you', Apache SAMOA, GAN for Cracks, Opps, NIPS2016

Video streaming of the event is here:

Mobiskill is hosting us in its new offices. Here is the program for the meetup; if you have announcements, or even an additional presentation, feel free to fill out this form.

The room has a capacity of 120 people and will open its doors before 7:00 PM.

We will talk about making sense of data that is too large, the use of GANs (which were all the rage at NIPS), crowdsourcing of opportunities and, if time allows, what happened at NIPS in Barcelona.
Marco and Albert are likely to speak English, while Julien, Daniel and Igor should be speaking French.

Marco Tulio Ribeiro, "'Why Should I Trust You?': Explaining the Predictions of Any Classifier"
[code] arxiv link (short video presentation and longer KDD presentation)
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
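The core of LIME can be sketched in a few lines (a sketch of the idea, not the actual `lime` package API): perturb around the instance, weight the samples by proximity, and fit a weighted linear surrogate whose coefficients serve as the local explanation. The black-box function below is a made-up stand-in for any classifier score:

```python
# LIME in miniature: explain one prediction of a black-box model with a
# locally-weighted linear surrogate fit on perturbed samples.
import numpy as np
from sklearn.linear_model import Ridge

def black_box(X):
    # stand-in model: feature 0 drives the score, feature 1 is irrelevant
    return 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))

rng = np.random.default_rng(0)
x0 = np.array([0.5, -0.2])                       # instance to explain

Z = x0 + 0.5 * rng.normal(size=(500, 2))         # local perturbations
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1)) # proximity kernel
surrogate = Ridge(alpha=1e-3).fit(Z, black_box(Z), sample_weight=weights)
print(surrogate.coef_)    # large coefficient on feature 0, near zero on feature 1
```

The surrogate is only faithful near x0, which is exactly the "interpretable model locally around the prediction" described in the abstract; LIME additionally encourages sparsity and uses interpretable input representations.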

Albert Bifet, Telecom-Paristech, "Apache SAMOA"  Github repo

In this talk, we present Apache SAMOA, an open-source platform for mining big data streams with Apache Flink, Storm and Samza. Real time analytics is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends helping to improve their performance.  Apache SAMOA includes algorithms for the most common machine learning tasks such as classification and clustering. It provides a pluggable architecture that allows it to run on Apache Flink, but also with other several distributed stream processing engines such as Storm and Samza.
Julien Launay, "Cracking Crack Mechanics: Using GANs to replicate and learn more about fracture patterns" (the version without animation is here)

When modeling transfers through a medium in civil engineering, knowing the precise influence of cracks is often complicated, doubly so since the transfer and fracture problems are often heavily linked. I will present a new way to generate “fake” cracking patterns using GANs, and will then expand on how such novel techniques can be used to learn more about fracture mechanics.
Daniel Benoilid, foulefactory.com, 5 min talk: "Man + Machine: Crowdsourcing opportunities"

How you can leverage crowdsourcing to save time during the learning phases and to provide a real-time fallback when the confidence interval isn't good enough.

Igor Carron, "So what happened at NIPS2016 ?"
