October 10th, 2014

Criteo Announces the Mo PB Mo Problems World Tour!

That’s right, Criteo R&D is on its way to NYC for Hadoop World 2014 and is bringing its Mo PB Mo Problems World Tour with it! No, this isn’t a PR / marketing campaign to tell the world how much “big data” we have and why we’re so cutting edge that you should become a client immediately (which is all true, btw). This is our Paris – Bay Area recruitment tour with a (sole) stop at the 2014 NYC Hadoop World conference!

If you hadn’t heard, Criteo has a massive infrastructure and massively interesting engineering problems to tackle, so we’re hiring massively :).

September 25th, 2014

Kaggle contest dataset is now available for academic use!

We launched a Kaggle challenge on CTR prediction three months ago.
Large participation, a close race…
…and the winner will be officially announced next week!

Some updates on the contest were presented at the Paris Machine Learning Meetup. Please visit the site for the video of the meetup and the slides.
We have updated the curves representing the evolution of the contest over time:

[Figure: evolution of the Kaggle contest over time]

Meanwhile, the dataset is now available for academic use.
It is pretty big, so have a lot of fun with it:
http://labs.criteo.com/downloads/2014-kaggle-display-advertising-challenge-dataset/

JB Tien.

September 10th, 2014

PoH – Part 3 – Distributed optimization

In the context of web advertising, it is crucial to make predictions extremely quickly, as roughly 50 milliseconds at most are given to send a bid back to the ad exchange. On average, Criteo predicts the click probability of a user in less than 100 microseconds, and does so up to 500,000 times per second. This is the main reason why simple generalized linear models such as logistic regression are still widely used in our industry. Since such models are also fast to train, moving to distributed learning was not as high a priority for us as it might have been for other companies.
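To give an idea of why such latencies are within reach, here is an illustrative sketch (not Criteo’s actual serving code; the weights, feature ids and model size are made up) of what scoring a logistic regression over sparse, hashed binary features amounts to: a handful of array lookups, a dot product and a sigmoid.

    using System;
    using System.Collections.Generic;

    // Illustrative sketch: predicting a click probability with a logistic
    // regression over sparse, hashed binary features. The weights are left
    // at zero here; in practice they would be loaded from a trained model.
    class ClickPredictor
    {
        const int ModelSize = 1 << 20;                  // hypothetical hashing space
        readonly float[] weights = new float[ModelSize];
        readonly float bias = -3.0f;                    // hypothetical intercept

        // P(click) for one bid request, given the ids of its active features.
        public double PredictClickProbability(IEnumerable<int> activeFeatureIds)
        {
            double logit = bias;
            foreach (int id in activeFeatureIds)
                logit += weights[id & (ModelSize - 1)]; // binary feature => weight * 1
            return 1.0 / (1.0 + Math.Exp(-logit));      // sigmoid
        }

        static void Main()
        {
            var model = new ClickPredictor();
            var features = new[] { 42, 1337, 271828, 999983 }; // hypothetical feature ids
            Console.WriteLine(model.PredictClickProbability(features));
        }
    }

Each prediction is only a few memory accesses and one exponential, which is what makes a sub-100-microsecond budget realistic.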

September 1st, 2014

PoH – Part 2 – Running C# on a Linux Hadoop cluster

Assume you have a code base in C# and you want to run it in a distributed way on Hadoop. What do you do: rewrite your historic code base in Java? Or try to forget the lesson you learnt when you were three years old, that a square piece cannot fit into a triangle-shaped hole?

Taking advantage of lessons others learnt the hard way, we chose to give the second approach a try. Here is how we managed to give our square piece a triangle shape, and to run C# code on Hadoop in production.
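As a taste of what this can look like, here is a minimal sketch of a Hadoop Streaming mapper written in C# (for illustration only, assuming Mono is installed on the worker nodes; this is not necessarily the setup we ended up with in production). Hadoop Streaming feeds input records to the mapper on stdin, one per line, and expects tab-separated key/value pairs on stdout.

    using System;

    // Minimal Hadoop Streaming mapper in C# (illustrative word count).
    // Reads one input record per line from stdin and emits "key<TAB>value"
    // pairs on stdout, as the streaming protocol expects.
    class WordCountMapper
    {
        static void Main()
        {
            string line;
            while ((line = Console.ReadLine()) != null)
            {
                foreach (var word in line.Split(new[] { ' ', '\t' },
                                                StringSplitOptions.RemoveEmptyEntries))
                {
                    Console.WriteLine("{0}\t1", word); // emit (word, 1)
                }
            }
        }
    }

Compiled with the Mono compiler, such an executable can then be submitted through the standard Hadoop Streaming jar, with options along the lines of -mapper "mono WordCountMapper.exe" and -file WordCountMapper.exe so that the binary gets shipped to the nodes.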