## Tuesday, September 27, 2016

### LightOn, Forward We Go.

I officially mentioned LightOn back in June (LightOn. And so it begins.). Since then, we've been busy: talking to potential investors, getting some hardware up and running, and then some. Here are some of the elements we put out in our press section:

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

In previous times, people talked about upsampling or super-resolution (see this eight-year-old blog entry entitled CS: Very High Speed Incoherent Projections for Superresolution, and a conference on how it is done for HDTV). In these times of the great convergence, CNNs come to the rescue. The second preprint below explains one of the steps of the first paper, which was presented at CVPR:

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network by Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang

Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
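The "efficient sub-pixel convolution" amounts to a periodic shuffling of channels into space. Here is a minimal NumPy sketch of that rearrangement (an illustration of the operation only, not the authors' code; the C·r² channel ordering is the common convention and is an assumption here):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) array into (C, H*r, W*r) by periodic shuffling."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0
    c = c_r2 // (r * r)
    # group channels as (C, r, r, H, W), then interleave the r-factors into space
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

# four 1x1 feature maps become one 2x2 high-resolution map (r = 2)
x = np.arange(4.0).reshape(4, 1, 1)
y = pixel_shuffle(x, 2)
print(y[0])  # -> [[0. 1.] [2. 3.]]
```

For upscaling factor r, the final convolution produces C·r² low-resolution feature maps, and this shuffle turns them into the C-channel high-resolution output.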

Is the deconvolution layer the same as a convolutional layer? by Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszar, Andrew Aitken, Christian Ledig, Zehan Wang

In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, What is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much depth and clarity as we would have liked in the space allowance. To better answer these questions in this note, we first discuss the relationships between the deconvolution layer in the forms of the transposed convolution layer, the sub-pixel convolutional layer and our efficient sub-pixel convolutional layer. We will refer to our efficient sub-pixel convolutional layer as a convolutional layer in LR space to distinguish it from the common sub-pixel convolutional layer. We will then show that for a fixed computational budget and complexity, a network with convolutions exclusively in LR space has more representation power at the same speed than a network that first upsamples the input in high resolution space.
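The computational argument can be checked with back-of-the-envelope arithmetic. The sketch below counts multiply-accumulates for a single 3×3 layer; the layer sizes are made-up illustrative numbers, not figures from the note:

```python
def conv_flops(h, w, c_in, c_out, k):
    """Multiply-accumulates for a k x k 'same' convolution on an h x w feature map."""
    return h * w * c_in * c_out * k * k

r = 2                 # upscaling factor
H, W = 1080, 1920     # target HR resolution
# HR-space pipeline: upsample first, then convolve at full resolution
hr_cost = conv_flops(H, W, 64, 64, 3)
# LR-space pipeline: convolve at (H/r) x (W/r), shuffle to HR at the end
lr_cost = conv_flops(H // r, W // r, 64, 64, 3)
print(hr_cost / lr_cost)  # -> 4.0, i.e. r^2 times cheaper per layer
```

The saved budget can be spent on more filters in LR space, which is the "more representation power at the same speed" claim.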


## Monday, September 26, 2016

### Book: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto is here.

The 455-page draft of the second edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto is here. Here is the table of contents:
Preface to the First Edition ix
Preface to the Second Edition xiii
Summary of Notation xvii
1 The Reinforcement Learning Problem ...1
1.1 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . .1
1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4
1.3 Elements of Reinforcement Learning . . . . . . . . . . . . . . . . . . .6
1.4 Limitations and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . .7
1.5 An Extended Example: Tic-Tac-Toe . . . . . . . . . . . . . . . . . . . 10
1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.7 History of Reinforcement Learning . . . . . . . . . . . . . . . . . . . . 15
1.8 Bibliographical Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 23
I Tabular Solution Methods....25
2 Multi-arm Bandits...27
2.1 A k-Armed Bandit Problem . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2 Action-Value Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Incremental Implementation . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Tracking a Nonstationary Problem . . . . . . . . . . . . . . . . . . . . 34
2.5 Optimistic Initial Values . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 Upper-Confidence-Bound Action Selection . . . . . . . . . . . . . . . . 37
2.7 Gradient Bandit Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 38
2.8 Associative Search (Contextual Bandits) . . . . . . . . . . . . . . . . . 42
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 Finite Markov Decision Processes...47
3.1 The Agent–Environment Interface . . . . . . . . . . . . . . . . . . . . 47
3.2 Goals and Rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.4 Unified Notation for Episodic and Continuing Tasks . . . . . . . . . . 54
3.5 The Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.6 Markov Decision Processes . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.7 Value Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.8 Optimal Value Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.9 Optimality and Approximation . . . . . . . . . . . . . . . . . . . . . . 72
3.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4 Dynamic Programming....79
4.1 Policy Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2 Policy Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.4 Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5 Asynchronous Dynamic Programming . . . . . . . . . . . . . . . . . . 91
4.6 Generalized Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . 93
4.7 Efficiency of Dynamic Programming . . . . . . . . . . . . . . . . . . . 94
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5 Monte Carlo Methods....99
5.1 Monte Carlo Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2 Monte Carlo Estimation of Action Values . . . . . . . . . . . . . . . . 104
5.3 Monte Carlo Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4 Monte Carlo Control without Exploring Starts . . . . . . . . . . . . . 108
5.5 Off-policy Prediction via Importance Sampling . . . . . . . . . . . . . 111
5.6 Incremental Implementation . . . . . . . . . . . . . . . . . . . . . . . . 116
5.7 Off-Policy Monte Carlo Control . . . . . . . . . . . . . . . . . . . . . 118
5.8 Return-Specific Importance Sampling . . . . . . . . . . . . . . . . . . 120
5.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6 Temporal-Difference Learning...127
6.1 TD Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2 Advantages of TD Prediction Methods . . . . . . . . . . . . . . . . . . 131
6.3 Optimality of TD(0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.4 Sarsa: On-Policy TD Control . . . . . . . . . . . . . . . . . . . . . . . 137
6.5 Q-learning: Off-Policy TD Control . . . . . . . . . . . . . . . . . . . 140
6.6 Expected Sarsa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.7 Maximization Bias and Double Learning . . . . . . . . . . . . . . . . . 143
6.8 Games, Afterstates, and Other Special Cases . . . . . . . . . . . . . . 145
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7 Multi-step Bootstrapping.....151
7.1 n-step TD Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.2 n-step Sarsa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.3 n-step Off-policy Learning by Importance Sampling . . . . . . . . . . 158
7.4 Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm . . . . . . . . . . . . . . . . . . . . 160
7.5 A Unifying Algorithm: n-step Q(σ) . . . . . . . . . . . . . . . . . . . 162
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8 Planning and Learning with Tabular Methods.....167
8.1 Models and Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.2 Dyna: Integrating Planning, Acting, and Learning . . . . . . . . . . . 169
8.3 When the Model Is Wrong . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.4 Prioritized Sweeping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.5 Planning as Part of Action Selection . . . . . . . . . . . . . . . . . . . 180
8.6 Heuristic Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
8.7 Monte Carlo Tree Search . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
II Approximate Solution Methods......189
9 On-policy Prediction with Approximation.....191
9.1 Value-function Approximation . . . . . . . . . . . . . . . . . . . . . . . 191
9.2 The Prediction Objective (MSVE) . . . . . . . . . . . . . . . . . . . . 192
9.3 Stochastic-gradient and Semi-gradient Methods . . . . . . . . . . . . . 194
9.4 Linear Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.5 Feature Construction for Linear Methods . . . . . . . . . . . . . . . . 203
9.5.1 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
9.5.2 Fourier Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.5.3 Coarse Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
9.5.4 Tile Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
9.5.5 Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . 215
9.6 Nonlinear Function Approximation: Artificial Neural Networks . . . . 216
9.7 Least-Squares TD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
9.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
10 On-policy Control with Approximation....229
10.1 Episodic Semi-gradient Control . . . . . . . . . . . . . . . . . . . . . . 229
10.2 n-step Semi-gradient Sarsa . . . . . . . . . . . . . . . . . . . . . . . . 232
10.3 Average Reward: A New Problem Setting for Continuing Tasks . . . . 234
10.4 Deprecating the Discounted Setting . . . . . . . . . . . . . . . . . . . . 238
10.5 n-step Differential Semi-gradient Sarsa . . . . . . . . . . . . . . . . . . 239
10.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
11 Off-policy Methods with Approximation....243
11.1 Semi-gradient Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
11.2 Baird's Counterexample . . . . . . . . . . . . . . . . . . . . . . . . . . 245
11.3 The Deadly Triad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
12 Eligibility Traces.....251
12.1 The λ-return . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
12.2 TD(λ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.3 An On-line Forward View . . . . . . . . . . . . . . . . . . . . . . . . . 259
12.4 True Online TD(λ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
12.5 Dutch Traces in Monte Carlo Learning . . . . . . . . . . . . . . . . . . 263
13 Policy Gradient Methods
13.1 Policy Approximation and its Advantages . . . . . . . . . . . . . . . . 266
13.2 The Policy Gradient Theorem . . . . . . . . . . . . . . . . . . . . . . . 268
13.3 REINFORCE: Monte Carlo Policy Gradient . . . . . . . . . . . . . . . 270
13.4 REINFORCE with Baseline . . . . . . . . . . . . . . . . . . . . . . . . 272
13.5 Actor-Critic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
13.6 Policy Gradient for Continuing Problems (Average Reward Rate) . . . 275
13.7 Policy Parameterization for Continuous Actions . . . . . . . . . . . . . 278
III Looking Deeper...280
14 Psychology....281
14.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
14.2 Prediction and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
14.3 Classical Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
14.3.1 The Rescorla-Wagner Model . . . . . . . . . . . . . . . . . . . 289
14.3.2 The TD Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
14.3.3 TD Model Simulations . . . . . . . . . . . . . . . . . . . . . . . 292
14.4 Instrumental Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . 301
14.5 Delayed Reinforcement . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
14.6 Cognitive Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
14.7 Habitual and Goal-Directed Behavior . . . . . . . . . . . . . . . . . . . 309
14.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
14.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
14.10 Bibliographical and Historical Remarks . . . . . . . . . . . . . . . . 315
15 Neuroscience....319
15.1 Neuroscience Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
15.2 Reward Signals, Reinforcement Signals, Values, and Prediction Errors 322
15.3 The Reward Prediction Error Hypothesis . . . . . . . . . . . . . . . . 324
15.4 Dopamine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
15.5 Experimental Support for the Reward Prediction Error Hypothesis . . 329
15.6 TD Error/Dopamine Correspondence . . . . . . . . . . . . . . . . . . . 332
15.7 Neural Actor-Critic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
15.8 Actor and Critic Learning Rules . . . . . . . . . . . . . . . . . . . . . 342
15.9 Hedonistic Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
15.10 Collective Reinforcement Learning . . . . . . . . . . . . . . . . . . . 348
15.11 Model-Based Methods in the Brain . . . . . . . . . . . . . . . . . . . 351
15.12 Addiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
15.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
15.14 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
15.15 Bibliographical and Historical Remarks . . . . . . . . . . . . . . . . . 357
16 Applications and Case Studies....365
16.1 TD-Gammon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
16.2 Samuel's Checkers Player . . . . . . . . . . . . . . . . . . . . . . . . . 370
16.3 The Acrobot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
16.4 Watson's Daily-Double Wagering . . . . . . . . . . . . . . . . . . . . 376
16.5 Optimizing Memory Control . . . . . . . . . . . . . . . . . . . . . . . . 379
16.6 Human-Level Video Game Play . . . . . . . . . . . . . . . . . . . . . . 384
16.7 Mastering the Game of Go . . . . . . . . . . . . . . . . . . . . . . . . . 389
16.8 Personalized Web Services . . . . . . . . . . . . . . . . . . . . . . . . . 396
16.9 Thermal Soaring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
17 Frontiers....403
17.1 The Unified View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
References....407
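As a taste of the tabular methods in Part I, here is a minimal sketch of an ε-greedy bandit using the incremental sample-average update of §2.3, Q(a) ← Q(a) + (1/N(a))(R − Q(a)). This is illustrative code, not from the book; the Gaussian reward model and the arm values are made up:

```python
import random

def run_bandit(q_true, steps=1000, eps=0.1, seed=0):
    """Epsilon-greedy action selection with the incremental sample-average update."""
    rng = random.Random(seed)
    k = len(q_true)
    q_est = [0.0] * k
    counts = [0] * k
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                        # explore
        else:
            a = max(range(k), key=lambda i: q_est[i])   # exploit
        r = q_true[a] + rng.gauss(0, 1)                 # noisy reward around the true value
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]          # incremental mean, no stored history
    return q_est, counts

est, n = run_bandit([0.2, 0.8, -0.5])
```

The incremental form needs O(1) memory per arm, which is the point of §2.3; replacing 1/N(a) with a constant step size gives the nonstationary variant of §2.4.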


## Sunday, September 25, 2016

### Sunday Morning Video: Bay Area Deep Learning School Day 2 Live streaming

The Bay Area Deep Learning School Day 2 starts streaming in 30 minutes at 9:00AM PST / 12PM EST / 5:00PM London time / 6:00PM Paris time and it's all here. The whole schedule is here. Yesterday's video of day 1 has already garnered 16,000 views.


### Sunday Morning Videos: HORSE2016, On “Horses” and “Potemkin Villages” in Applied Machine Learning

While waiting for day 2 of the Bay Area Deep Learning School in three hours, here are the ten videos of presentations made at the HORSE2016 workshop (On “Horses” and “Potemkin Villages” in Applied Machine Learning) organized by Bob Sturm. Bob came to present something around that theme at the 9th meetup of Season 3 of the Paris Machine Learning meetup. With this workshop, the field and its attendant issues are becoming more visible, which is outstanding, as this has bearing on algorithm bias and explainability. To make the video-watching more targeted, Bob even included commentary with the embedded videos in his blog post. Here is the beginning of the whole blog entry; you should go there:

September 19, 2016 saw the successful premiere edition of HORSE2016: On “Horses” and “Potemkin Villages” in Applied Machine Learning. I have now uploaded videos to the HORSE2016 YouTube channel, and posted slides to the HORSE2016 webpage. I embed the videos below with some commentary.
HORSE2016 had 10 speakers expound on a variety of interesting topics, and about 60 people in the audience. I am extremely pleased that the audience included several people from outside academia, including industry, government employees and artists. This shows how many have recognised the extent to which machine learning and artificial intelligence are impacting our daily lives. The issues explored at HORSE2016 are essential to ensuring this impact remains beneficial and not detrimental.
Here is my introductory presentation, “On Horse Taxonomy and Taxidermy”. This talk is all about “horses” in applied machine learning: what are they? Why is this important and relevant today? Why the metaphor, and why is it appropriate? I present an example “horse,” uncovered using an intervention experiment and a generation experiment. Finally, I discuss what a researcher should do if someone demonstrates their system is a “horse”.


## Saturday, September 24, 2016

### Saturday Morning Video: Bay Area Deep Learning School Day 1 Live streaming

So the Bay Area Deep Learning School has already started streaming and it's all here. The whole schedule is here.

Saturday

9:00-10:00

Hugo Larochelle

10:15-11:45

Deep Learning for Computer Vision

I will cover the design of convolutional neural network (ConvNet) architectures for image understanding, the history of state-of-the-art models on the ImageNet Large Scale Visual Recognition Challenge, and some of the most recent patterns of development in this area. I will also talk about ConvNet architectures in the context of related visual recognition tasks such as object detection, segmentation, and video processing.

Andrej Karpathy

12:45-2:15

Deep Learning for NLP

I will describe the foundations of deep learning for natural language processing: word vectors, recurrent neural networks, tasks and models influenced by linguistics. I will end with some recent models that put together all these basic lego blocks into a very powerful deep architecture called dynamic memory network.

Richard Socher

2:45-3:45​
Tensorflow Tutorial

Sherry Moore

4:00-5:30
Foundations of Deep Unsupervised Learning

Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding. In this tutorial I will discuss the mathematical basics of many popular unsupervised models, including Sparse Coding, Autoencoders, Restricted Boltzmann Machines (RBMs), Deep Boltzmann Machines (DBMs), and Variational Autoencoders (VAE). I will further demonstrate that these models are capable of extracting useful hierarchical representations from high-dimensional data with applications in visual object recognition, information retrieval, and natural language processing. Finally, time permitting, I will briefly discuss models that can generate natural language descriptions (captions) of images, as well as generate images from captions using an attention mechanism.

Ruslan Salakhutdinov

6:00-7:00 Nuts and bolts of applying deep learning

Andrew Ng

Sunday

9:00-10:30

Policy Gradients and Q-Learning: Rise to Power, Rivalry, and Reunification

I'll start by providing an overview of the state of the art in deep reinforcement learning, including recent applications to video games (e.g., Atari), board games (AlphaGo) and simulated robotics. Then I'll give a tutorial introduction to the two methods that lie at the core of these results: policy gradients and Q-learning. Finally, I'll present a new analysis that shows the close similarity between these two methods. A theme of the talk will be to not only ask "what works?", but also "when does it work?" and "why does it work?"; and to find the kind of answers that are actionable for tuning one's implementation and designing better algorithms.

John Schulman
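The second of the two methods in the talk title fits in a few lines. Here is a sketch of the tabular Q-learning update rule (a generic illustration with toy numbers, not code from the talk):

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.99):
    """Off-policy TD control:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s_next, a') - Q(s,a))."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# two states, two actions, zero-initialized table
Q = [[0.0, 0.0], [0.0, 0.0]]
q_learning_update(Q, 0, 1, 1.0, 1)  # reward 1.0 on the transition 0 -> 1
print(Q[0][1])  # -> 0.5
```

Policy-gradient methods instead adjust policy parameters directly along the gradient of expected return; the analysis presented in the talk relates the two families.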

10:45-11:45
Theano Tutorial

Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently, on CPU or GPU. Since its introduction, Theano has been one of the most popular frameworks in the machine learning community, and multiple frameworks for deep learning have been built on top of it (Lasagne, Keras, Blocks, ...). This tutorial will focus first on the concepts behind Theano and how to build and evaluate simple expressions, and then we will see how more complex models can be defined and trained.

Patrice Lamblin

12:45-2:15
Deep Learning for Speech

Traditional speech recognition systems are built from numerous modules, each requiring its own challenging engineering. With deep learning it is now possible to create neural networks that perform most of the tasks of a traditional engine "end to end", dramatically simplifying the development of new speech systems and opening a path to human-level performance. In this tutorial, we will walk through the steps for constructing one type of end-to-end system similar to Baidu's "Deep Speech" model. We will put all of the pieces together to form a "scale model" of a state of the art speech system; small-scale versions of the neural networks now powering production speech engines.

2:45-3:45
Torch Tutorial

Torch is an open platform for scientific computing in the Lua language, with a focus on machine learning, in particular deep learning. Torch is distinguished from other array libraries by having first-class support for GPU computation, and a clear, interactive and imperative style. Further, through the "NN" library, Torch has broad support for building and training neural networks by composing primitive blocks or layers together in compute graphs. Torch, although benefiting from extensive industry support, is a community owned and community developed ecosystem. All neural net libraries, including Torch NN, TensorFlow and Theano, rely on automatic differentiation (AD) to manage the computation of gradients of complex compositions of functions. I will present some general background on automatic differentiation (AD), which is the fundamental abstraction of gradient-based optimization, and demonstrate

Alex Wiltschko
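The AD abstraction mentioned above fits in a page. Below is a minimal reverse-mode sketch in Python (illustrative only; real systems such as Torch's autograd record a tape rather than using this naive recursion, which can revisit shared nodes):

```python
class Var:
    """Minimal reverse-mode AD node: tracks a value and how to push gradients back."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_var, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        # accumulate the incoming gradient, then propagate via the chain rule
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x, y = Var(2.0), Var(3.0)
z = x * y + x        # z = x*y + x, so dz/dx = y + 1 = 4 and dz/dy = x = 2
z.backward()
print(x.grad, y.grad)  # -> 4.0 2.0
```

Each operation records only its local derivatives; one backward sweep from the output then yields gradients with respect to every input, which is why all the major frameworks are built on this abstraction.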

4:00-5:30
Sequence to Sequence Learning for NLP and Speech

I will first present the foundations of sequence to sequence (seq2seq) learning and attention models, and their applications in machine translation and speech recognition. Then I will discuss attention with pointers and functions. Finally I will describe how reinforcement learning can play a role in seq2seq and attention models.

Quoc Le

6:00-7:00
Foundations and Challenges of Deep Learning

Why is deep learning working as well as it does? What are some big challenges that remain ahead? This talk will first survey some key factors in the success of deep learning. First, in the context of the no-free-lunch theorem, we will discuss the expressive power of deep networks to capture abstract distributed representations. Second, we will discuss our surprising ability to actually optimize the parameters of neural networks in spite of their non-convexity. We will then consider a few challenges ahead, including the core representation question of disentangling the underlying explanatory factors of variation, especially with unsupervised learning, why this is important for bringing reinforcement learning to the next level, and optimization questions that remain challenging, such as learning of long-term dependencies, understanding the optimization landscape of deep networks, and how learning in brains remains a mystery worth attacking from the deep learning perspective.


## Friday, September 23, 2016

### It's Friday, it's Hamming's time: Call for deep learning research problems from your "problem surplus" stack

François Chollet, the creator of Keras, tweeted the following interesting proposition. Here it is:

Are you a researcher? Then you probably have a "problem surplus": a list of interesting and important research problems that you don't have time to work on yourself. What if you could outsource some of these problems to distributed teams of motivated students and independent researchers looking to build experience in deep learning? You just have to submit the description of your problem, some pointers as to how to get started, and provide lightweight supervision along the way (occasionally answer questions, provide feedback, suggest experiments to try...).
What you get out of this:
• Innovative solutions to research problems that matter to you.
• Full credits for the value you provide along the research process.
• New contacts among bright people outside of your usual circles.
• A fun experience.
We are looking for both deep learning research problems, and problems from other fields that could be solved using deep learning.
Note that the information you submit here may be made public (except for your contact information). We will create a website listing the problems submitted, where people will be able to self-organize into teams dedicated to specific problems. You will be in contact with the people working on your problem via a mailing list. The research process will take place in the open, with communications being publicly available and code being released on GitHub.

Here are some problems:

Enhanced NAVCAM image of Comet 67P/C-G taken on 18 September 2016, 12.1 km from the nucleus centre. The scale is 1.0 m/pixel and the image measures about 1.1 km across. Credits: ESA/Rosetta/NAVCAM – CC BY-SA IGO 3.0


### CSjobs: Lecturers (Assistant Professors) in: Machine Learning & Computer Vision; Robot Vision & Autonomous Systems

Mark just sent me the following:

Dear Igor, I thought that the Lecturer (Assistant Professor) jobs below may be of interest to Nuit Blanche readers. Best wishes, Mark
Here is the announcement:
----
Lecturers (Assistant Professors) in: Machine Learning & Computer Vision; Robot Vision & Autonomous Systems

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, UK

Salary:  GBP 39,324 to 46,924 per annum

Closing Date:  Monday 31 October 2016

https://jobs.surrey.ac.uk/070216

The University offers a unique opportunity for two individuals with outstanding research and leadership to join the Centre for Vision, Speech and Signal Processing (CVSSP).

The successful candidate is expected to build a research project portfolio to complement existing CVSSP strengths. The centre seeks to appoint two individuals with an excellent research track-record and international profile to lead future growth of research activities in one or more of the following areas:

* Machine Learning & Pattern Recognition
* Computer Vision
* Robot Vision & Autonomous Systems
* Intelligent Sensing and Sensor Networks
* Audio-Visual Signal and Media Processing
* Big Visual Data Understanding
* Machine Intelligence

We now seek individuals with strong research track-records and leadership potential who can develop the existing activities of CVSSP and exploit the synergetic possibilities that exist within the centre, across the University and regionally with UK industry. You will possess proven management and leadership qualities, demonstrating achievements in scholarship and research at a national and international level, and will have experience of teaching within HE.

CVSSP is one of the primary centres for computer vision & audio-visual signal processing in Europe with over 120 researchers, a grant portfolio of £18M and a track-record of pioneering research leading to technology transfer in collaboration with UK industry.  CVSSP forms part of the Department of Electronic Engineering, recognised as a top department for both Teaching and Research: surrey.ac.uk/ee.

Interviews are expected to take place in the week commencing 21st November 2016.

Further details: https://jobs.surrey.ac.uk/070216

We can offer a generous remuneration package, which includes relocation assistance where appropriate, an attractive research environment, the latest teaching facilities, and access to a variety of staff development opportunities.

We acknowledge, understand and embrace diversity.

--
Prof Mark D Plumbley
Professor of Signal Processing
Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey
Guildford, Surrey, GU2 7XH, UK
Email: m.plumbley@surrey.ac.uk


## Thursday, September 22, 2016

### Hamiltonian Monte Carlo Acceleration Using Surrogate Functions with Random Bases

We just had a meetup on Stan at which Eric mentioned Michael Betancourt's video presentation on Hamiltonian Monte Carlo (see below), and I note that random feature expansions can speed up some of these computations:

Hamiltonian Monte Carlo Acceleration Using Surrogate Functions with Random Bases by Cheng Zhang, Babak Shahbaba, Hongkai Zhao

For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov Chain Monte Carlo (MCMC) method, namely, Hamiltonian Monte Carlo (HMC). The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
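To make the idea concrete, here is a minimal pure-Python sketch of standard HMC on a 1-D standard normal target. This is not the authors' code: the function and variable names are mine, and the point is only to show where the gradient of the negative log-density enters the leapfrog integrator. That is the call the paper proposes replacing with a cheap surrogate fitted from random bases, while the Metropolis correction keeps the chain targeting the exact distribution:

```python
# Minimal HMC sketch (illustrative, not the paper's implementation).
import math
import random

random.seed(0)

def neg_log_density(q):
    # U(q) = -log pi(q), up to a constant, for a standard normal target
    return 0.5 * q * q

def grad_U(q):
    # Exact gradient of U; this is the expensive call that a surrogate
    # built from random basis functions would replace.
    return q

def hmc_step(q, eps=0.1, n_leapfrog=20, grad=grad_U):
    p = random.gauss(0.0, 1.0)  # resample momentum
    q_new, p_new = q, p
    # leapfrog integration of the Hamiltonian dynamics
    p_new -= 0.5 * eps * grad(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new
        p_new -= eps * grad(q_new)
    q_new += eps * p_new
    p_new -= 0.5 * eps * grad(q_new)
    # Metropolis accept/reject: this correction is what keeps the
    # sampler exact even if `grad` is only an approximation.
    h_old = neg_log_density(q) + 0.5 * p * p
    h_new = neg_log_density(q_new) + 0.5 * p_new * p_new
    if random.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q

samples = []
q = 0.0
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For the standard normal target the chain's sample mean and variance should settle near 0 and 1; in the paper's setting, `grad` would be the gradient of the random-basis surrogate rather than of the true model.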

The Geometric Foundations of Hamiltonian Monte Carlo
M. J. Betancourt, Simon Byrne, Samuel Livingstone, Mark Girolami

Although Hamiltonian Monte Carlo has proven an empirical success, the lack of a rigorous theoretical understanding of the algorithm has in many ways impeded both principled developments of the method and use of the algorithm in practice. In this paper we develop the formal foundations of the algorithm through the construction of measures on smooth manifolds, and demonstrate how the theory naturally identifies efficient implementations and motivates promising generalizations.
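As background for readers new to the method (standard HMC notation, not specific to this paper): the algorithm augments the target variables $q$ with momenta $p$ and simulates the dynamics generated by a Hamiltonian,

```latex
\begin{align*}
H(q, p) &= U(q) + \tfrac{1}{2}\, p^{\top} M^{-1} p,
  \qquad U(q) = -\log \pi(q),\\
\frac{dq}{dt} &= \frac{\partial H}{\partial p} = M^{-1} p,
  \qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial q} = -\nabla U(q),
\end{align*}
```

where $\pi$ is the target distribution and $M$ is the mass matrix. The paper's contribution is a measure-theoretic construction of these dynamics on smooth manifolds.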


## Wednesday, September 21, 2016

### Paris Machine Learning Meetup, Hors série #2: Scalable Machine Learning with H2O

Tonight, we will have the Paris Machine Learning Meetup, Hors Série #2, on Scalable Machine Learning with H2O. Jiqiong Qiu is a co-organizer of this meetup with Franck Bardol and Igor Carron. This "Hors Série #2" will focus on a few presentations of H2O (H2O.ai) by Jo-fai Chow and Jakub Háva. The meetup will be hosted and sponsored by Murex.

If you want to try H2O, feel free to follow this installation guide. We provide not only the standalone installation guide but also a Dockerfile with H2O + TensorFlow + Spark.

All the presentations are at: http://bit.ly/h2o_paris_1

1. Introduction to Machine Learning with H2O (Joe - 30 mins)
In this talk, I will walk you through our company (H2O.ai), our open-source machine learning platform (H2O) and use cases from some of our users. This will be useful for attendees who are not familiar with H2O.

2. Project “Deep Water” (H2O integration with TensorFlow and other deep learning libraries) (Joe - 30 mins)
The “Deep Water” project is about integrating our H2O platform with other open-source deep learning libraries such as TensorFlow from Google. We are also working on a GPU implementation for H2O. In this talk, I will discuss the motivation and potential benefits of this project.

3. Sparkling Water 2.0 (Jakub - 45 mins)
Sparkling Water integrates the H2O open source distributed machine learning platform with the capabilities of Apache Spark. It allows users to leverage H2O’s machine learning algorithms with Apache Spark applications via Scala, Python, R or H2O’s Flow GUI which makes Sparkling Water a great enterprise solution. Sparkling Water 2.0 was built to coincide with the release of Apache Spark 2.0 and introduces several new features. These include the ability to use H2O frames as Apache Spark’s SQL datasource, transparent integration into Apache Spark machine learning pipelines, the power to use Apache Spark algorithms via the Flow GUI and easier deployment of Sparkling Water in a Python environment. In this talk we will introduce the basic architecture of Sparkling Water and provide an overview of the new features available in Sparkling Water 2.0. The talk will also include a live demo showing how to integrate H2O algorithms into Apache Spark pipelines – no terminal needed!

Jo-fai (or Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media, where he developed data products to enable quick and smart business decisions. He also worked (part-time) for Domino Data Lab as a data science evangelist, promoting products via blogging and giving talks at meetups. Joe has a background in water engineering. Before his data science journey, he was an EngD researcher at the STREAM Industrial Doctorate Centre, working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specialized in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.

Jakub (or “Kuba”) finished his bachelor's degree in computer science at Charles University in Prague, and is currently finishing his master's in software engineering as well. For his bachelor's thesis, Kuba wrote a small platform for distributed computing of tasks of any type. During his master's studies, he has been developing a cluster monitoring tool for JVM-based languages, which should make debugging and reasoning about the performance of distributed systems easier using a concept called distributed stack traces. At H2O, Kuba mostly works on the Sparkling Water project.
