Nuit Blanche: Paris Machine Learning Meetup #3 Season 4: OPECST, Correlations, Transfer Learning, DL @Amazon, Car Sales

Wednesday, November 09, 2016


Video of the streaming is here:

The meetup will be hosted by AAA-data / Comité des Constructeurs Français d'Automobiles, and the networking event is sponsored by Zen.ly. A big thank you to them.

The program for this third regular meetup of the season (and the fifth total for season 4) is a little extraordinary this time and will feature the following:

Dominique Gillot, Senator, former minister and co-rapporteur, with Deputy Claude de Ganay, of a report on Artificial Intelligence for Parliament. Mehdi Benhabri, Administrator of the Office parlementaire d'évaluation des choix scientifiques et technologiques (OPECST). Important: for those who will not be able to speak with the senator, an online questionnaire is available, and the answers will be sent to the two rapporteurs and to the administrator following the AI file. Slide

Pierre Leveau, Presentation of zen.ly

Franck Bardol, Deviens un Data Ninja (Become a Data Ninja)

Julien Simon (Amazon), Machine Learning and Deep Learning with Amazon Web Services

Gautier Marti (Hellebore Capital), A closer look at correlations

You may have read many times that the job of a Data Scientist is to skim through a huge amount of data searching for correlations between some variables of interest, and that one of their worst enemies (besides the fact that correlation does not imply causation) is spurious correlation. But what really is correlation? Are there several types of correlation? Some "good", some "bad"? What about their estimation? This talk will be a very visual presentation around the notions of correlation and dependence. I will first illustrate how the standard linear correlation (the Pearson coefficient) is estimated, then a more robust alternative: the Spearman coefficient. Building on the geometric understanding of their nature, I will present a generalization that can help Data Scientists explore, interpret, and measure the dependence (not necessarily linear or comonotonic) between the variables of a given dataset. Financial time series (stocks, credit default swaps, FX rates) and features from the UCI datasets are considered as use cases.
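To make the contrast between the two coefficients concrete, here is a minimal sketch in pure Python. It is not taken from the talk: the helper names (`pearson`, `ranks`, `spearman`) and the toy data are assumptions chosen for illustration, and the generalization mentioned in the abstract is not covered. Pearson measures linear association on the raw values, while Spearman is simply Pearson computed on the ranks, so it reaches 1 for any strictly increasing relationship.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (hypothetical): y grows monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

On this toy data the Pearson coefficient stays clearly below 1 because the relationship is not linear, while the Spearman coefficient equals 1 because the ordering of the two variables is identical.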

Yannis Ghazouani (Dataiku), Labelling images using transfer learning: an application to recommender systems

Dataiku recently worked on a recommender system for an online vacation retailer, based on the products users had previously visited. We created a meta model on top of classical recommender systems that generated a 7% increase in revenue during the A/B test. For this type of business, the content of the product image is paramount, so the obvious next step was to add image information to the recommender. The key takeaway is this: you don't need a deep learning expert to solve the tagging problem. Because labeled datasets and corresponding pre-trained neural networks are available on the Internet, you can use "transfer learning" and map your problem to an existing one. The post-processing step consists of grouping labels to get features associated with more global visual themes. For instance, "theme beach" = coast + ocean + sandbar. We use them to recommend personalized products to customers or to address marketing questions such as: what kind of image should I propose for this product?
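The label-grouping step can be sketched as follows. This is only an illustration under stated assumptions, not Dataiku's implementation: the abstract gives the "beach" grouping, but the "mountain" theme, the `theme_scores` helper, the choice of summing probabilities, and the example scores are all hypothetical. The input is assumed to be a dictionary of label probabilities produced by some pre-trained image classifier.

```python
# Mapping from a visual theme to the fine-grained labels it groups.
THEMES = {
    "beach": ["coast", "ocean", "sandbar"],   # grouping given in the abstract
    "mountain": ["alp", "valley", "cliff"],   # hypothetical, for illustration
}

def theme_scores(label_probs, themes=THEMES):
    """Sum the classifier probabilities of the labels belonging to each theme.

    Summation is an assumption; the abstract only says labels are grouped.
    Labels missing from the classifier output count as 0.
    """
    return {
        theme: sum(label_probs.get(label, 0.0) for label in labels)
        for theme, labels in themes.items()
    }

# Hypothetical classifier output for one product image.
label_probs = {"coast": 0.40, "ocean": 0.25, "sandbar": 0.10, "alp": 0.05}

scores = theme_scores(label_probs)
best_theme = max(scores, key=scores.get)
print(scores)
print(best_theme)
```

The resulting theme scores are low-dimensional features that could be fed to a recommender alongside its existing inputs.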

Sébastien Claude (AAA Data), Who will sell their vehicle in the next 3 months?

The AAA DATA database records more than 300 million verified events, with a unique history spanning more than 15 years, which makes it possible to validate the predictive solution offered to its clients. The challenge was to apply machine learning to these 300 million events.
