Criteo Releases its New Dataset

By: CriteoLabs / 31 Mar 2015

New Dataset

Criteo is pleased to announce the release of a new dataset, an extended version of our Kaggle click prediction dataset. At over 4 billion lines and more than 1TB in size, this is the largest public machine learning dataset ever released.

As large-scale problems become more prevalent, we believe it is important to make such a dataset available to the academic community and we hope this will serve as a useful benchmark for distributed learning algorithms.

This dataset is hosted on Microsoft Azure, making it possible for researchers to run map-reduce jobs directly on that platform. Details on how to access the dataset and/or download it can be found here.

Drop us a line at r& if you are curious.

