As a Gold Sponsor of the NIPS Conference, Criteo was more than excited to be part of the 30th Annual Conference on Neural Information Processing Systems in Barcelona from the 5th to the 9th of December 2016. This conference is THE conference for Criteo researchers, data scientists and engineers working on machine learning on a daily basis. Among more than five thousand peers, we attended talks and demonstrations and presented two posters on our current research topics.
At Criteo, our research challenges currently center on Reinforcement Learning, Recommender Systems, and Game Theory, all in large-scale environments. Hence, attending NIPS every year is like going to Disneyland! Here is what we will keep in mind from the 2016 edition.
Deep Reinforcement Learning
Reinforcement Learning is an important subject for Criteo. Taking into account how the decision maker's actions affect the state distribution is crucial for our tasks. For example, when we participate in an auction, our decision (how much to bid, or whether to bid at all) can affect the response of other bidders and hence change the distribution of their actions.
One of the tutorials that we enjoyed covered deep reinforcement learning through policy optimization, given by Pieter Abbeel (OpenAI / UC Berkeley / Gradescope) and John Schulman (OpenAI / UC Berkeley). It was a great introduction to several recent policy-gradient and actor-critic methods. Another interesting part of this tutorial was about scaling. Pieter showed how differential dynamic programming could be used to generate suitable samples, which are then used for policy search, leading to a regularized, importance-sampled policy optimization; see the paper from NIPS 2015.
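To make the policy-gradient idea concrete, here is a minimal sketch of the simplest score-function (REINFORCE) estimator on a toy multi-armed bandit. It is not one of the specific algorithms from the tutorial: the function name `reinforce_bandit`, the softmax policy, the Gaussian reward noise, and the running-average baseline are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def reinforce_bandit(arm_means, n_steps=2000, lr=0.1, noise_std=0.1, seed=0):
    """Train a softmax policy on a bandit with the REINFORCE estimator.

    arm_means: expected reward of each arm (hypothetical toy environment).
    Returns the final action probabilities of the policy.
    """
    rng = np.random.default_rng(seed)
    arm_means = np.asarray(arm_means, dtype=float)
    theta = np.zeros(len(arm_means))  # policy parameters (one logit per arm)
    baseline = 0.0                    # running average of observed rewards

    for t in range(n_steps):
        probs = softmax(theta)
        action = rng.choice(len(theta), p=probs)
        reward = rng.normal(arm_means[action], noise_std)

        # Gradient of log pi(action) w.r.t. the logits: one_hot(action) - probs
        grad_log_prob = -probs
        grad_log_prob[action] += 1.0

        # Score-function update, with a baseline to reduce variance
        theta += lr * (reward - baseline) * grad_log_prob
        baseline += (reward - baseline) / (t + 1)

    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit([0.2, 0.8]))
```

The policy should end up putting most of its probability mass on the arm with the higher expected reward. Actor-critic methods replace the running-average baseline used here with a learned value function.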
Deep RL was in the spotlight during the main conference, notably with the best paper award talk presenting Value Iteration Networks (VIN). This new approach uses RL to train a convolutional neural network that represents a differentiable approximation of the value-iteration algorithm. The authors showed that, by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
During the conference, two platforms dedicated to RL were released: DeepMind Lab and OpenAI’s Universe. You can now easily train your RL agents in computer-based environments. Here is a demo of what you can do.
Deep learning continues to be a very active research area. Not surprisingly, at Criteo we also apply deep learning, namely for product recommendation. Here is a selection of deep learning topics that captured our attention.
Deep neural networks are increasingly trained end-to-end, turning previously human-engineered pieces into differentiable modules of the larger neural network.
As mentioned above, the award-winning Value Iteration Networks paper features a differentiable value iteration module, replacing an explicit formula for the value function update with a ConvNet. In more detail, the key insight is that the recurrent formula for the value function update involves a linear operation and a max operation. On a grid, these correspond to convolution and max pooling.
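The correspondence between value iteration and convolution plus max pooling can be sketched on a small deterministic grid world. This is plain tabular value iteration written in that form, not the learned VIN module from the paper: the names `shift` and `value_iteration`, the four-move action set, and the convention that bumping into a wall leaves the agent in place are assumptions made for this example. Each shift is equivalent to convolving the value map with a one-hot 3x3 kernel, and the max is taken across the resulting per-action channels.

```python
import numpy as np

# One (dy, dx) offset per action: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def shift(v, dy, dx):
    """Return q with q[i, j] = v[i + dy, j + dx].

    Equivalent to convolving v with a one-hot 3x3 kernel. The border is
    replicated, so moving into a wall keeps the agent in its current cell.
    """
    h, w = v.shape
    padded = np.pad(v, 1, mode="edge")
    return padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]


def value_iteration(reward, gamma=0.9, n_iters=50):
    """Value iteration on a grid, written as 'convolution' + max over channels."""
    v = np.zeros_like(reward, dtype=float)
    for _ in range(n_iters):
        # Linear step: one Q-value map (channel) per action.
        q = np.stack([reward + gamma * shift(v, dy, dx) for dy, dx in ACTIONS])
        # Max step: pool over the action channel.
        v = q.max(axis=0)
    return v


if __name__ == "__main__":
    reward = np.zeros((5, 5))
    reward[4, 4] = 1.0  # hypothetical goal cell in the bottom-right corner
    print(np.round(value_iteration(reward, n_iters=200), 2))
```

In a VIN, the fixed one-hot kernels and the reward map are replaced by learned convolution weights, which is what makes the planning computation trainable end-to-end.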
Several papers propose that the network architecture itself be learned, for example Learning the Number of Neurons or Dynamic Filter Networks.
Causality is of growing importance to the community, and at Criteo we pay a lot of attention to discovering causal effects.
The paper Learning Representations for Counterfactual Inference applies the representation learning power of deep networks to counterfactual inference from observational data. The general idea is to learn representations for the treated and control distributions jointly. The learning objective consists of the factual prediction error plus two regularization terms. The first term encourages the factual and counterfactual sets to have similar distributions; the second pulls each counterfactual prediction toward the nearest observed outcome from the respective treated or control set. In the experiments, optimizing this objective with a deep neural network performed best on most metrics.
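As a rough sketch, the objective described above can be written as follows. The notation is our own assumption for illustration and the exact losses and weighting used in the paper may differ: $\Phi$ is the learned representation, $h$ the outcome predictor, $t_i \in \{0, 1\}$ the treatment received by unit $i$, $y_i^{F}$ its observed (factual) outcome, $\mathrm{disc}$ a discrepancy measure between the factual and counterfactual distributions in representation space, $j(i)$ the nearest neighbor of unit $i$ in the opposite treatment group, and $\alpha, \gamma \ge 0$ the regularization weights.

```latex
\mathcal{L}(\Phi, h) =
  \frac{1}{n} \sum_{i=1}^{n} \bigl| h(\Phi(x_i), t_i) - y_i^{F} \bigr|
  + \alpha \, \mathrm{disc}\bigl(\hat{P}^{F}_{\Phi}, \hat{P}^{CF}_{\Phi}\bigr)
  + \frac{\gamma}{n} \sum_{i=1}^{n} \bigl| h(\Phi(x_i), 1 - t_i) - y_{j(i)}^{F} \bigr|
```

The first sum is the factual prediction error, the $\alpha$ term balances the two distributions, and the $\gamma$ term anchors each counterfactual prediction to the closest observed outcome in the other group.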
There is a lot of theoretical and empirical evidence in favor of deep representation learning, and the learning procedure itself is being actively researched.
R. Sutton presented an alternative to the backpropagation algorithm, called CrossProp, that takes into account the influence of all the past values of the input weights on the current error.
The authors of the Professor Forcing paper apply GANs to generative RNNs and train a discriminator to distinguish between training sequences and self-generated sequences. The generator is trained to fool the discriminator, forcing the observed distribution and the free-running distribution to become similar. This procedure acts as a regularizer and results in better sample quality and generalization, particularly for long sequences.
The authors of Learning to Optimize take a meta-learning approach and propose to parametrize the gradient descent update with an LSTM learned across multiple tasks. Their approach outperformed hand-crafted optimizers such as Adam and RMSProp.
Generative Adversarial Networks (GANs) are a recent method for generating new samples from a training distribution, for example creating new images or music resembling those in the training set. Since their introduction only two years ago, they have received a lot of attention and were certainly one of the trendiest subjects this year.
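For readers new to the method, the original GAN formulation (Goodfellow et al., 2014) can be summarized as a two-player minimax game. In the notation assumed here, a generator $G$ maps noise $z \sim p_z$ to samples, and a discriminator $D$ outputs the probability that its input comes from the training data rather than from $G$. Many of the variants presented at the conference modify this basic objective.

```latex
\min_{G} \, \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```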
Ian Goodfellow gave a great tutorial to a full room. The authors of Learning What and Where to Draw explained how to use side information with GANs to produce samples with desired features. The paper Adversarially Learned Inference described a generalization of GANs that may make it possible to interpolate between images.
While GANs distinguish themselves from previous methods by their relative simplicity and the quality of their results, they also prove difficult to train, which led to several talks on tricks to improve their convergence, such as the GAN hacks talk.
Another challenge in applying GANs is that there is still no good way of measuring their performance: researchers have to assess the quality of the generated samples by looking at them.
Criteo counterfactual learning dataset
Detecting causality is key to understanding how a system will react under a new intervention. The “What if” workshop was dedicated to causality and to methods that help answer questions such as: What happens if a robot applies a new behavior policy? Can we predict the consequences of removing a certain gene in a biological cell on its phenotype? How will a user react if Criteo implements a new recommendation engine?
Read full article on our Research blog.
Post written by:
Criteo Research Team