Criteo R&D at DevoxxFR

Check out our general architecture and our Hadoop cluster in these two talks given at Devoxx France 2014.

September 1st, 2014

Running C# on a Linux Hadoop cluster

Suppose you have a C# code base and you want to run it in a distributed way on Hadoop. What do you do? Rewrite your legacy code base in Java? Or try to forget the lesson you learnt when you were three years old: a square piece cannot fit into a triangular hole?

Drawing on lessons others learnt the hard way, we chose to give the second approach a try. Here is how we managed to give our square piece a triangular shape, and to run C# code on Hadoop in production.

August 25th, 2014

PoH – Introduction

Last month we released PoH (Prediction on Hadoop), our new fully scalable and distributed learning infrastructure. It enables us to train our prediction models on very large amounts of data and to produce more accurate predictions.

Look back with us on this amazing story, which began four years ago…

August 19th, 2014

Criteo releases its first public dataset: Conversion logs

At Criteo, we are committed to scientific excellence, and one of the cornerstones of scientific progress is the reproducibility of experimental results.
We have therefore decided to publicly release the datasets used in our forthcoming papers.

The first release is here! Olivier Chapelle will present his paper on conversion modeling at KDD next week, and the associated dataset can be downloaded here.

Enjoy these gigabytes of conversion logs!

July 18th, 2014

How to win a free trip to San Francisco

One possible way is to qualify for the TopCoder Open onsite final in San Francisco, November 2014. Let me explain how I managed to take one of the last four places in the final Marathon Match qualification round. TopCoder Marathon Matches are complex algorithmic challenges to be solved in one or two weeks (fortunately, the onsite final is only 12 hours long).

July 9th, 2014

Display Advertising Challenge by Criteo

Two weeks ago we launched an exciting challenge on the Kaggle platform.
We provided one week of anonymized data, and the goal is very, very simple:
prove that you rock at CTR (click-through rate) prediction!

I am pleased to say that it got off to a good start: almost 100 teams have entered the competition
and made more than 580 submissions :)

There is still time to join the competition and prove you’re the best.
The contest ends on September 23rd (2.5 months remaining).

Did I mention that the prize for the winner is $10,000?