Dataset for evaluation of Counterfactual Algorithms 

By: CriteoLabs / 01 Dec 2016

Criteo is pleased to announce the release of a new dataset that serves as a large-scale, standardized test-bed for evaluating counterfactual learning methods. We at Criteo have access to several large-scale, real-world datasets, and we would like to share some of this data with the external research community, both to advance research and to make it easier to exchange ideas. The dataset we are releasing has been prepared in partnership with Cornell University (Thorsten Joachims’ group) and the University of Amsterdam (Maarten de Rijke’s group).

Building interactive systems such as bidding agents and recommendation systems for computational advertising is central to Criteo’s business. The industry standard for building such systems is to collect expert-annotated data, or user feedback such as clicks, which then serves as a training set for offline supervised learning algorithms. However, the objective these learning algorithms optimize often differs from the online metrics we actually care about.

Recent approaches for off-policy evaluation and learning in interactive settings appear promising [1,2]. They need only the data logged during the normal operation of such systems to evaluate and optimize new policies offline, with the objective of directly maximizing the online metrics of interest. The ability to do effective offline, off-policy learning would revolutionize the process of building better interactive systems. We want to provide a standardized dataset to systematically investigate these algorithms.
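
To make the idea concrete: the simplest off-policy estimator is inverse propensity scoring (IPS), which estimates the value of a new policy π from logs collected under a logging policy π0 as V̂(π) = (1/n) Σ r_i · π(a_i|x_i) / π0(a_i|x_i). Each logged reward is reweighted by how much more (or less) likely the new policy is to take the logged action than the logging policy was. The sketch below is a minimal illustration, assuming each log record provides the reward, the logging policy's propensity for the chosen action, and the new policy's probability for that same action; the function and argument names are hypothetical and not part of any tooling shipped with the dataset.

```python
def ips_estimate(rewards, logging_propensities, target_probs):
    """Inverse propensity scoring (IPS) estimate of a new policy's expected reward.

    Hypothetical helper for illustration only.
    rewards[i]: reward observed for the action the logging policy actually took.
    logging_propensities[i]: probability the logging policy gave to that action.
    target_probs[i]: probability the new (target) policy gives to that same action.
    """
    n = len(rewards)
    if n == 0 or not (n == len(logging_propensities) == len(target_probs)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for r, p0, p in zip(rewards, logging_propensities, target_probs):
        if p0 <= 0:
            # IPS is undefined for actions the logging policy could never take.
            raise ValueError("logging propensities must be positive")
        # Importance weight p / p0 corrects for the mismatch between policies.
        total += r * p / p0
    return total / n


# Four logged impressions; the new policy favors the actions that got clicks.
print(ips_estimate([1, 0, 1, 0], [0.5, 0.5, 0.25, 0.25], [1.0, 0.0, 0.5, 0.5]))
```

When the new policy equals the logging policy, every weight is 1 and the estimate reduces to the average logged reward; the further the two policies diverge, the larger the weights and the higher the variance of the estimate.
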

The goal of the dataset is to help identify:
– Good policy classes for the specified task.
– Good regularization mechanisms and training objectives for off-policy learning.
– Good model selection procedures (analogous to cross-validation for supervised learning).
– Algorithms that can scale to massive amounts of data.

We also wrote a paper that gives further insight into this dataset (available [here]). In it, we describe our standardized dataset for evaluating off-policy learning methods and report results comparing POEM [2], a state-of-the-art off-policy learning method, against supervised learning baselines (regression methods and offset trees [3]). We provide experimental evidence that recent off-policy learning methods can improve upon state-of-the-art supervised learning techniques on a large-scale, real-world dataset.
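
POEM [2] is based on counterfactual risk minimization: rather than minimizing a plain importance-weighted loss estimate, it minimizes a clipped importance-weighted estimate plus a penalty proportional to that estimate's empirical standard deviation, R̂(h) = mean(u) + λ·sqrt(Var(u)/n) with u_i = δ_i · min(M, h(y_i|x_i)/p_i). The sketch below only evaluates this objective for fixed policy probabilities; POEM itself optimizes it over a parametric stochastic policy class, which is not shown. The names, the clipping constant `clip` (M) and the penalty weight `lam` (λ) are illustrative assumptions, not values from the paper or the dataset.

```python
import math
from statistics import mean, variance


def crm_objective(losses, logging_propensities, target_probs, clip=10.0, lam=0.5):
    """Variance-penalized, clipped IPS objective in the style of counterfactual
    risk minimization. Hypothetical helper for illustration only.

    losses[i]: loss (delta) observed for the logged action; lower is better.
    logging_propensities[i]: logging policy's probability of that action (p_i).
    target_probs[i]: candidate policy's probability of that action, h(y_i|x_i).
    """
    n = len(losses)
    if n < 2 or not (n == len(logging_propensities) == len(target_probs)):
        raise ValueError("need at least two records of equal length")
    # Clipping the importance weights at `clip` bounds their variance at the
    # cost of some bias.
    u = [d * min(clip, p / p0)
         for d, p0, p in zip(losses, logging_propensities, target_probs)]
    # Empirical risk plus a penalty on the estimate's standard error
    # (statistics.variance is the sample variance with an n - 1 denominator).
    return mean(u) + lam * math.sqrt(variance(u) / n)


print(crm_objective([1, 0], [0.5, 0.5], [0.5, 0.5], lam=1.0))
```

The variance term makes the learner prefer policies whose estimated risk is both low and reliable, which is what distinguishes this objective from simply minimizing the IPS estimate.
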

This dataset is hosted on Amazon AWS and is publicly available. Details on how to access and/or download it can be found [here]. The dataset has over 100 million examples and is 35 GB gzipped (250 GB uncompressed). If you use the dataset for your research, please cite the source and drop us a note about your research at
Damien Lefortier

[1] L. Bottou et al. Counterfactual reasoning and learning systems: the example of computational advertising. JMLR, 2013.
[2] A. Swaminathan and T. Joachims. Batch learning from logged bandit feedback through counterfactual risk minimization. JMLR, 2015.
[3] A. Beygelzimer and J. Langford. The offset tree for learning with partial labels. KDD, 2009.

Original post from Criteo Research’s blog.