The product catalog is one of Criteo's most fundamental components. Processing and exploring the information inside products is an interesting topic, and making catalog processing easy and scalable is a challenging one.
Some facts: status and numbers
In universal catalog, we are now pursuing two major missions: catalog export and catalog enrichment. The first processes the various catalogs and exports them to publishers such as Facebook or Google. The second enriches the information inside all catalogs to improve prospecting and prediction performance.
At Criteo, we have 15,000+ clients, and most of them update their catalogs daily; some upload ~10 times per day. Catalog sizes also vary widely, from ~10 products to hundreds of millions of products. Taking the current catalog export for Facebook DPA alone as an example: every day we process 10,000+ catalogs, which amounts to 5-6 billion products, or terabytes of data on HDFS, and we upload ~700-900 GB of data to Facebook. Adding the processing for other use cases, we easily reach yet another scale.
Therefore, we want to make the catalog processing easy and scalable.
How to make the processing easy?
Currently, all catalogs are processed using Hadoop M/R jobs. We provide a general framework that eases the development of catalog processing, even for developers without any Hadoop experience.
In this framework, we decoupled the data-flow handling from the processing logic. The data-flow handling partitions the catalogs to be processed in a generic way: it automatically groups the catalogs and distributes the processing across the Hadoop cluster. It does not need to know how the catalogs are processed; it only manages the data flow and guarantees that the input and output catalogs follow the Google Shopping products spec, are stored at the advertiser level, and are persisted properly. The processing logic, on the other hand, only cares about how the products should be manipulated. The processing is dynamically injected into the framework through arguments. We also map the processing counters to Hadoop counters, which are automatically logged and monitored.
A processing job is implemented in a three-level architecture: workflow, strategy and factory. The workflow is the first level; it manages the input catalogs and chooses the strategy. The strategy examines the input catalogs to determine which ones should be processed, then partitions them into multiple small groups using customizable partitioning strategies. After that, it starts a thread pool and submits each group to the factory so the tasks execute simultaneously, monitoring them until they finish or time out. The strategy also performs the post-processing that generates events for downstream jobs. The processing logic is defined by the third level, the factory. By default, the factory can instantiate different M/R jobs and submit them to the Hadoop cluster. The factory is only responsible for constructing the M/R jobs; it has no knowledge of the processing they perform.
Inside the factory, we expose an interface called ProductBusinessLogicInterface, which is the only part developers have to implement. The processing class is injected through the arguments and instantiated by a processor manager, so it is simply "you implement it, we run it". The job execution plan is illustrated in Figure 1 below.
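To make the "you implement it, we run it" contract concrete, here is a minimal sketch of what such an injected processor might look like. The interface name comes from the framework described above, but the single-method signature, the UppercaseTitleLogic example and the reflective loading are illustrative assumptions, not Criteo's actual API.

```java
import java.util.Optional;

// Illustrative sketch only: the method signature, the example logic and
// the reflective "processor manager" are assumptions, not Criteo's API.
public class ProcessorDemo {
    // A product passes through the business logic; returning
    // Optional.empty() would drop it from the output catalog.
    public interface ProductBusinessLogicInterface {
        Optional<String> process(String productJson);
    }

    // Example logic a developer might implement and inject by class name.
    public static class UppercaseTitleLogic implements ProductBusinessLogicInterface {
        public Optional<String> process(String productJson) {
            return Optional.of(productJson.toUpperCase());
        }
    }

    // The framework instantiates the logic reflectively from a job
    // argument, mimicking the processor-manager injection described above.
    public static ProductBusinessLogicInterface load(String className) throws Exception {
        return (ProductBusinessLogicInterface)
            Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        ProductBusinessLogicInterface logic =
            load("ProcessorDemo$UppercaseTitleLogic");
        System.out.println(logic.process("{\"title\":\"red shoe\"}").get());
    }
}
```

The framework side never references UppercaseTitleLogic directly; it only sees the interface, which is what keeps the data-flow handling and the processing logic decoupled.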
How to make the processing scalable?
Scaling the processing for all Criteo catalogs is challenging. We use an internal workflow manager in which a catalog-processing unit is defined as a job, and jobs are concatenated to form pipelines. Jobs can switch order arbitrarily if they have no data dependency, and pipelines can branch out to share the same upstream jobs.
We can start multiple instances of the same pipeline, all isolated from one another. Within an instance, it would be wasteful to start one M/R job per catalog: some catalogs are small, while others are huge. Instead, we group a bunch of catalogs into one M/R job based on their size, and combine several files into one mapper to save resources on the Hadoop cluster. For example, we can group ~50 small partners, or even more, into one M/R job and start only a few mappers to process all of these catalogs, roughly a tenfold resource saving on the cluster.
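As a sketch of this size-based grouping, a greedy first-fit-decreasing packing of catalogs into per-job "bins" gets the idea across. The cap and the catalog sizes below are invented for the example; they are not Criteo's actual settings.

```java
import java.util.*;

// Hypothetical sketch of size-based catalog grouping: first-fit-decreasing
// packing of catalog sizes (bytes) into M/R job bins capped at maxGroupBytes.
public class CatalogGrouping {
    public static List<List<Long>> group(List<Long> catalogSizes, long maxGroupBytes) {
        List<Long> sorted = new ArrayList<>(catalogSizes);
        sorted.sort(Collections.reverseOrder());          // biggest first
        List<List<Long>> groups = new ArrayList<>();
        List<Long> totals = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < groups.size(); i++) {     // first bin it fits in
                if (totals.get(i) + size <= maxGroupBytes) {
                    groups.get(i).add(size);
                    totals.set(i, totals.get(i) + size);
                    placed = true;
                    break;
                }
            }
            if (!placed) {                                 // open a new job
                groups.add(new ArrayList<>(List.of(size)));
                totals.add(size);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Fifty 10 MB catalogs plus one 1.8 GB catalog pack into two jobs
        // (cap 2 GiB) instead of fifty-one single-catalog jobs.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 50; i++) sizes.add(10L << 20);
        sizes.add(1800L << 20);
        System.out.println(group(sizes, 2048L << 20).size()); // 2
    }
}
```

Any smarter partitioning strategy can be swapped in; the point is only that grouping happens on catalog size, not per catalog.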
The job itself does not need to know how the products are processed. It only manages the input and output catalogs to partition and group them appropriately. Figure 2 shows an example of a pipeline and how the catalogs are handled inside a job.
Pipeline performance tuning
Another way to make the processing scale is to improve performance, thus reducing processing time and resource consumption. We try to improve performance wherever we can.
In Facebook DPA, we use multiple feeds to support big catalogs; some catalogs have 200+ million products. In the beginning, we generated only one file and uploaded this big file to FB, which could take 4-5 hours. We now use multiple feeds: we break a big catalog into much smaller ones and upload these feeds simultaneously. This enables us to process almost all catalogs within one hour.
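The feed split itself is simple arithmetic: a big catalog is cut into fixed-size ranges that can be generated and uploaded in parallel. The 50,000-products-per-feed figure below is illustrative, not Criteo's actual value.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the multi-feed split: cut a catalog of productCount items
// into feeds of at most productsPerFeed items each (figure illustrative).
public class FeedSplitter {
    public static int feedCount(long productCount, long productsPerFeed) {
        // Ceiling division: the last feed holds the remainder.
        return (int) ((productCount + productsPerFeed - 1) / productsPerFeed);
    }

    // Half-open [start, end) product index ranges, one per feed.
    public static List<long[]> feedRanges(long productCount, long productsPerFeed) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < productCount; start += productsPerFeed) {
            ranges.add(new long[] { start, Math.min(start + productsPerFeed, productCount) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 200M-product catalog becomes 4,000 independently uploadable
        // feeds instead of one multi-hour file.
        System.out.println(feedCount(200_000_000L, 50_000L)); // 4000
    }
}
```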
Also for Facebook DPA, since we process billions of products daily, we can cause scalability problems for other Criteo services, such as the image service. Indeed, Facebook might query new images for millions of products simultaneously and overwhelm our image servers, which unfortunately did happen. To resolve this problem, we implemented a throttling mechanism that slows down the upload speed at the feed level, which significantly reduced the peak traffic to the Pix servers.
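A minimal sketch of such feed-level throttling, assuming the simplest possible mechanism: a fixed minimum interval between feed submissions, which caps the rate at which Facebook can trigger image fetches. The rate figure is invented for the example.

```java
// Hypothetical sketch of feed-level upload throttling: enforce a minimum
// interval between feed submissions so downstream image fetches stay
// below a chosen peak rate.
public class FeedThrottle {
    private final long minIntervalMillis;
    private long nextAllowed = 0;

    public FeedThrottle(double feedsPerSecond) {
        this.minIntervalMillis = (long) (1000.0 / feedsPerSecond);
    }

    // Blocks until the next feed may be uploaded.
    public synchronized void acquire() throws InterruptedException {
        long now = System.currentTimeMillis();
        if (now < nextAllowed) Thread.sleep(nextAllowed - now);
        nextAllowed = Math.max(now, nextAllowed) + minIntervalMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        FeedThrottle throttle = new FeedThrottle(5.0); // at most 5 feeds/sec
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5; i++) {
            throttle.acquire();
            // uploadFeed(i) would go here
        }
        System.out.println("5 feeds in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

A token-bucket variant would allow short bursts while keeping the same average rate; the fixed-interval version is just the smallest thing that smooths the peak.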
Product processing performance tuning
Each product takes less than 1 ms to process, but when we do it for billions of products every day, that adds up to hours and days of CPU time. If product processing time can be reduced by a fraction of a millisecond, it saves us about an hour of CPU time on each run; overall, that might be more than one CPU-day each day. It is always cool to make our planet a little bit greener by consuming less electricity.
First things first, we must understand the complexity and overhead of the algorithms used inside M/R jobs. Product processing requires a lot of string manipulation. Many string-processing problems can be solved with a good hash function, and Java has one inside the String class. A String object is immutable, but its hash is not calculated at object creation: the hash value is lazily initialized on the first call to hashCode(), and all subsequent calls return the precalculated value. This means that if a String object is used as a HashMap key more than once, it is a good idea to reuse the same object, since its hash will be calculated only once. Avoiding excessive string concatenation also improves run time.
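The key-reuse advice can be sketched as follows. TokenCounter and its token names are invented for the illustration; the only real point is which String object you hand to the map.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of key reuse: a token that recurs across many products
// should be looked up through one canonical String object, whose hash
// String caches after the first hashCode() call, rather than through a
// freshly built (and freshly hashed) String per product.
public class TokenCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    public void add(String token) {
        // merge() hashes the key; with a reused key object the hash
        // itself is computed only on the very first call.
        counts.merge(token, 1, Integer::sum);
    }

    public int count(String token) {
        return counts.getOrDefault(token, 0);
    }

    public static void main(String[] args) {
        TokenCounter counter = new TokenCounter();
        String shoe = "shoe";            // one canonical key object...
        for (int i = 0; i < 3; i++) {
            counter.add(shoe);           // ...hashed once, reused three times
        }
        counter.add(new String("shoe")); // equal-value key still works, but
                                         // every new String re-hashes itself
        System.out.println(counter.count("shoe")); // 4
    }
}
```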
Certainly, string hashing alone is not enough to make product processing efficient; sometimes we need more advanced algorithms. We use the Aho-Corasick algorithm for dictionary matching: it finds all occurrences of words from a predefined dictionary in a text. Crucially, the complexity of this algorithm is linear: it depends only on the length of the input text, not on the size of the dictionary. That makes it far more efficient than the naïve approach of tokenizing the text and searching in a HashSet, or than using the Knuth-Morris-Pratt algorithm. A decent Java implementation of Aho-Corasick can be found at ahocorasick.org.
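For readers unfamiliar with the algorithm, here is a compact self-contained sketch (the implementation at ahocorasick.org is the production-ready choice): a trie over the dictionary plus BFS-built failure links, so matching touches each text character a bounded number of times regardless of dictionary size.

```java
import java.util.*;

// Compact Aho-Corasick sketch: trie + failure links; matching is linear
// in the text length and independent of the dictionary size.
public class AhoCorasick {
    static class Node {
        Map<Character, Node> next = new HashMap<>();
        Node fail;
        List<String> out = new ArrayList<>();
    }

    private final Node root = new Node();

    public AhoCorasick(List<String> dictionary) {
        for (String word : dictionary) {            // 1. build the trie
            Node n = root;
            for (char c : word.toCharArray())
                n = n.next.computeIfAbsent(c, k -> new Node());
            n.out.add(word);
        }
        Deque<Node> queue = new ArrayDeque<>();     // 2. BFS failure links
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            for (Map.Entry<Character, Node> e : n.next.entrySet()) {
                Node f = n.fail;
                while (f != null && !f.next.containsKey(e.getKey())) f = f.fail;
                e.getValue().fail = (f == null) ? root : f.next.get(e.getKey());
                e.getValue().out.addAll(e.getValue().fail.out);
                queue.add(e.getValue());
            }
        }
    }

    // Every dictionary word occurring in the text, in order of match end.
    public List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Node n = root;
        for (char c : text.toCharArray()) {
            while (n != root && !n.next.containsKey(c)) n = n.fail;
            n = n.next.getOrDefault(c, root);
            found.addAll(n.out);
        }
        return found;
    }

    public static void main(String[] args) {
        AhoCorasick ac = new AhoCorasick(List.of("shoe", "red", "shoes"));
        System.out.println(ac.matches("bright red shoes")); // [red, shoe, shoes]
    }
}
```

Note that overlapping matches ("shoe" inside "shoes") come for free, which the tokenize-and-HashSet approach misses entirely.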
It is worth mentioning that Hadoop supplies string data to mappers in objects of class Text, which stores the data as a UTF-8 encoded string, and all output data must also be UTF-8 encoded. Internally, however, a Java String stores its data as UTF-16, so loading a Text into a String requires an encoding conversion, and converting a large string from one encoding to another can be an expensive operation. In universal catalog, both input and output data are in JSON format, and we use the Jackson library to serialize and deserialize them. Jackson can read and write UTF-8 encoded data directly, without converting to UTF-16. Taking advantage of this to avoid encoding conversions saves us about 10% of product processing time.
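As a toy standard-library illustration of the same byte-level idea (the real win comes from Jackson parsing and emitting UTF-8 byte streams directly), some operations on UTF-8 data need no decoding at all. Counting newline-delimited records is one: the record counting below works on raw bytes and skips the UTF-16 round-trip entirely.

```java
import java.nio.charset.StandardCharsets;

// Toy illustration of working on UTF-8 bytes directly: '\n' (0x0A) never
// appears inside a multi-byte UTF-8 sequence, so a plain byte scan is
// safe and the UTF-16 decode can be skipped entirely.
public class Utf8Scan {
    public static int countRecords(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) if (b == '\n') count++;
        return count;
    }

    public static void main(String[] args) {
        String catalog = "{\"title\":\"chaussure rouge\"}\n{\"title\":\"red shoe\"}\n";
        byte[] utf8 = catalog.getBytes(StandardCharsets.UTF_8);
        // Same answer as decoding to UTF-16 first, without the conversion.
        System.out.println(countRecords(utf8)); // 2
    }
}
```

Jackson applies the same principle to full JSON parsing: ObjectMapper accepts and produces byte arrays and streams, so the catalog data can stay in UTF-8 end to end.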
People love or hate Graphite, but whatever your opinion of it, it is a huge part of the open-source monitoring ecosystem. We use it extensively at Criteo, and its default clustering features have reached their limits for us.
Gluing things together, we came up with BigGraphite, a set of Graphite plugins that integrate it with Cassandra and leverage Cassandra's high availability, fault tolerance and administrative features.
Graphite works using a push model: applications periodically send points to a metric receiver. It also has fixed-period retention and does not allow (or work well with) dynamic resolutions. This means that applications usually send their metrics every minute or so.
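Concretely, a pushed point is one line of Graphite's plaintext protocol, `<metric path> <value> <unix timestamp>`, sent to the receiver (typically carbon on TCP port 2003). A minimal sketch; the metric name is an invented example, not one of ours.

```java
// Sketch of Graphite's plaintext push protocol: one line per point,
// "<metric path> <value> <unix timestamp>\n", sent to the receiver
// (typically carbon, TCP port 2003). Metric name is illustrative.
public class GraphiteLine {
    public static String format(String metric, double value, long epochSeconds) {
        return metric + " " + value + " " + epochSeconds + "\n";
    }

    public static void main(String[] args) {
        // Under fixed-period retention, each application emits a line
        // like this roughly once per minute per metric.
        System.out.print(format("criteo.catalog.exported_products", 42.0, 1480000000L));
    }
}
```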
Graphs are then rendered using Graphite Web directly, or using a frontend such as Grafana.
Thursday, November 24 at Criteo – rue Blanche, Paris
Criteo invites you to meet women engineers, scientists and entrepreneurs who will talk about their journeys, their difficulties (or the lack of them, which can also be worth hearing) and their ambitions, so that women take their place in a professional world that is often male-dominated, one that at worst rejects them and at best ignores them.
These women come to encourage, to motivate, and to tell stories that have not been heard until now. And to stay positive!
"What place is there for women in building this new world? Who are the women active in these fields? What do they do? Why did they choose these paths? Why are women under-represented in STEM? What underlies this phenomenon? Why is it a problem? How can we fix it?"
A tailor-made mentoring day during which the women present will share their journeys, experiences, advice and best practices.
8:30 // Registration – entrance hall + breakfast
9:00 // Introduction by Alexandra Pelissero, PR Manager at Criteo
9:30 // Keynote by Cecilia Ercoli, Director of Innovation at Cartier International
10:00 // Panel discussion: the everyday risks and challenges for women in digital (Nicole Kelsey, General Counsel Criteo – Afef Feki, Senior Researcher Huawei – Caroline Chavier, R&D Recruiter Criteo – Yolande Chavanne, Product Manager @honestica – Lucie Bailly, Tech Ops Analyst R&D Criteo). Moderator: Audrey Hibon
10:50 // Coffee break
11:00 // Claude Terosier, CEO/Co-founder, Magic Makers
11:45 // "Je suis Top" by actress Blandine Métayer
12:00 // Lunch break – Zenroom
13:30 // "Des raisons d'y croire" ("Reasons to believe") by Corinne Bach, Vice President of Project Innovation, Vivendi Village
14:00 // "How being a mother gave me more self-confidence in a tech environment" by Muleine Lim, Staff Engineering Program Manager, Criteo
14:40 // Coaching session by Aurélie Vincent, coach, Embodyagency.com
A first coaching workshop with Aurélie Vincent will help you understand your greatest fears when speaking in public. She will help you define how to address an audience, whether one person or many, and how to craft a simple, clear and memorable message. Through the discovery of a technique and a few practical exercises, you will be better equipped to handle this kind of situation.
Workshops // (Zenroom & Gameroom)
In a second session, our two recruitment experts at Criteo, Romain Toffolon and Marisa Bryan, will be available to support you and discuss various topics with you.
If you have a project, or questions about a particular situation (a job search, starting a new role, and so on), their mission is to listen to you and share their experience, expertise and know-how, all in complete confidentiality and with the utmost kindness. So, ready?
16:15 // Workshop 1, Marisa Bryan, Global Talent Acquisition Director, Criteo
16:15 // "Succeeding in your job interview" by Romain Toffolon, Recruitment Team Lead, Criteo
The new version of the Criteo Labs internship program is out! This year, we have over 30 internship offers available in our various SRE and Dev teams, with technical challenges and cutting-edge technologies to work on.
Our previous Campus program (2015-2016) ended with about 20 interns hired, 92% of whom converted to permanent contracts and stayed on with Criteo.
As one of the largest and most dynamic R&D teams in Europe, Criteo Labs constantly faces new and relevant engineering challenges to steer our rapid growth and exciting projects. As a result, we are always on the lookout for outstanding graduating talent from engineering schools in Europe, the USA and beyond.
An internship at Criteo is a rare opportunity to enter the real world of engineering and to work with, and learn from, some of the best talents in the industry, at a speed, performance and complexity scale you will find in very few companies. From Big Data (100 PB of storage in Hadoop) to DevOps (20k servers worldwide), through web service expertise (150 billion calls a day), network infrastructure (peaks of 2.4 million HTTP requests/sec) and development excellence (300 code reviews a day), you will learn and train on an incredible platform.
Testimonials from some of our previous interns and their take on our 2015-2016 Campus Program
“The first part of my internship was dedicated to researching existing implementations of kNN and to studying papers and articles. As you may know, naive approaches to this problem are simply too time-consuming. Once the best algorithm was chosen, I started messing with Hadoop and its MapReduce to make my kNN scale with Criteo’s catalog. Learning to deal with the complexity of implementing such an algorithm on Hadoop made for a great internship experience. Everyday life included Java, a big Hadoop cluster, Git, fast deployment, a fantastic team, meetups, developer jokes and free coffee!” Romaric T
“The highlight of my internship was definitely a presentation I did to explain my work in front of hundreds of Criteo engineers, which was a great personal challenge.” Victor L
Exciting career opportunities
“Working at Criteo as an intern was a great experience all around, which is why I came back one month after the end of my internship to work as a full time engineer in the Network team”. Victor L
“My main task as an intern in our R&D was to build a functional distributed kNN over Hadoop (which was successfully achieved), and by the end of my internship there was a brand-new employee (once again, that’s me!). Now I’m working in the Scalability team, creating and maintaining many Hadoop jobs in Scala, and happy to be there!” Romaric T
“Throughout the duration of the internship, I was given the time and opportunity to discover many tools I knew nothing about. I was given a lot of freedom to design the application on my own, which made my work that much more satisfying.” Victor L
“When I started my internship in March 2016, I thought I would not learn that much about the development lifecycle, but I was wrong. I had already gathered some practical knowledge during my year off, but it was definitely not enough for a company like Criteo! I spent the first few weeks getting used to the everyday tools and the internal stuff, and realized the perks of this agile yet very mature machine.” Alexandre C
“The team I joined was composed of a few experienced developers. They helped me at every level: reviewing my code to show me how to improve it, what the trade-offs were, and how to balance performance and expressiveness. All along, I could feel this “skill upgrade” happening progressively while just enjoying my coding adventure.” Alexandre C
For Criteo Labs, opening its doors to new graduating talent means that you do not need tons of experience to bring value to a team, just pure technical curiosity to learn, make an impact and experience our exciting growth. Come jumpstart your career with us: work hard and play hard with us, and build and enhance the skills to bootstrap your career!
Criteo is organizing the Machine Learning in the Real World workshop. This workshop aims at bringing together people from the industry and from academia to better understand which machine learning algorithms are used in practice and what we can do to improve them.
Who should come?
Anyone who is involved in applying machine learning on real world data is welcome. We especially expect to see:
PhD students, faculty members, and industry research scientists.
Software engineers using machine learning in applications.
Entrepreneurs looking to apply machine learning to solve new challenges.
This event is primarily intended for people with technical fluency in machine learning to help each other advance the state of the art. If you are interested in learning about the basics of the field, there are plenty of other great events in Paris which are probably more appropriate.
To maintain meaningful interactions, we will keep the number of participants to around 60. We will select the applicants based on proficiency in machine learning but we will also ensure a good balance between academia and industry, as well as a diversity of backgrounds and interests.
Here is a sample of people you will get to meet, based on early signups:
– Alexandre d’Aspremont, Professor at ENS Paris
– Alexandre Gramfort, Assistant Professor at Telecom ParisTech
– Francis Bach, Researcher at INRIA
– Vianney Perchet, Professor at ENS Cachan
– Nicolas Usunier, Research Scientist at Facebook AI Research
If you would like to participate, please apply here before November 13th.
Open Discussion Topics
To encourage discussions, multiple rooms will be available, each with a suggested topic. Participants interested in discussing a particular topic can enter the corresponding room and join in on-going discussions. We encourage you to choose which topics you are interested in or to submit your own here (link to application). We will select those with the strongest appeal.
The event will take place in Criteo headquarters, 32 rue Blanche, 75009 Paris, on November 29th with talks from leading researchers in the morning while the afternoon will be dedicated to open discussions with suggested themes to foster interactions between participants. Breakfast and lunch will be provided and there will be a banquet after the workshops at 6PM. Criteo cannot provide lodging or travel to the participants but we are happy to help if you want information about local accommodation.
9:50 – 10:35: Nicolò Cesa-Bianchi, Algorithmic challenges in real-time bidding
10:40 – 11:25: John Shawe-Taylor, Conditional Mean Embeddings for Reinforcement Learning
11:30 – 12:15: Jonas Peters, Connections between causality and machine learning
12:15 – 13:30: Lunch
13:30 – 18:00: Open discussions
18:00 – 19:00: Cocktail
Speaker bio & abstracts
Title : Adaptive label acquisition in non-stationary environments
Collecting and storing labels from different sources is key to train and evaluate supervised learning algorithms. However, labels are often expensive to obtain, thus selecting which items to get labels for is key to optimally use any available labeling budget, both when training and evaluating a model. At the same time, if available labels are not correctly used, incorrect or biased results can be produced.
In this talk, I will discuss some of the challenges and potential pitfalls of acquiring and using labels for classification in an evolving environment. I will present a system that stores labels and provides a way to select which labels to acquire so as to optimize the budget while providing accurate and unbiased evaluations of the classification models.
Shie Mannor is a Professor of Electrical Engineering at the Technion.
He earned a PhD in Electrical Engineering from the Technion in 2002. He was then a Fulbright postdoctoral associate with LIDS (MIT) for two years. He was subsequently a Professor at McGill University for six years, where he was an incumbent of a Canada Research Chair in Machine Learning. Since 2009 he has been with the Andrew and Erna Viterbi Department of Electrical Engineering, where he is a full Professor.
Shie has published over 70 journal papers and over 130 conference papers in leading venues and holds 8 patents. His research awards include several best paper awards, the Henri Taub Prize for Academic Excellence, an ERC Starting Grant, an HP Faculty Award and a Horev Fellowship.
Shie’s research interests include machine learning and data sciences, planning and control, analysis and control of large-scale systems, and interconnected systems.
Università degli Studi di Milano
Title: Algorithmic challenges in real-time bidding
Online ads are mostly sold via second-price auctions run on a per-impression basis by ad exchanges. Publishers can boost their revenues by dynamically choosing the reserve price in each auction. In this talk we review the main algorithmic challenges that arise in the context of reserve price optimization. Our focus will be on regret minimization approaches with mathematical guarantees on their performance. We will consider various practical issues including exploiting auction features and coping with strategic bidders.
Nicolò Cesa-Bianchi is professor of Computer Science at the University of Milano. He was President of the Association for Computational Learning and member of the steering committee of the EC-funded Network of Excellence PASCAL2. He held visiting positions with UC Santa Cruz, Graz Technical University, Ecole Normale Superieure, Google, and Microsoft Research. He received a Google Research Award (2010) and a Xerox University Affairs Committee Award (2011-2014). His research interests focus on: theory and applications of machine learning, sequential optimization, and algorithmic game theory. On these topics, he published two monographs: “Prediction, Learning, and Games” (Cambridge University Press, 2006) and “Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems” (NOW Publishers, 2012). He has over 150 indexed publications, including more than 40 papers in international scholarly journals. Overall, his papers have been cited over 10000 times.
University of Copenhagen
Title: Connections between causality and machine learning
Causal knowledge is required in order to predict a system’s response after an intervention. In this talk, we argue that machine learning methods can benefit from causal ideas in problems that go beyond predicting variables in interventional settings. Connections to systematic noise removal, reinforcement learning and domain adaptation exist but are not yet fully understood. We present applications in advertisement and exoplanet search. The talk covers joint work with numerous people, including B. Schoelkopf, D. Janzing, L. Bottou and M. Rojas-Carulla.
Jonas is an associate professor in statistics at the University of Copenhagen, he is a member of the “Junge Akademie”. Previously, Jonas has been leading the causality group at the MPI for Intelligent Systems in Tübingen and was a Marie Curie fellow at the Seminar for Statistics, ETH Zurich. He studied Mathematics in Heidelberg and Cambridge and did his PhD with B. Schölkopf, D. Janzing and P. Bühlmann, his thesis received the ETH medal. He has been working with L. Bottou at Microsoft Research Redmond (WA, USA), M. Wainwright at UC Berkeley (CA, USA) and P. Spirtes at CMU (PA, USA).
University College London
Title: Conditional Mean Embeddings for Reinforcement Learning
Conditional Mean Embeddings (CME) provide a way of learning to estimate expectations under unknown distributions. We consider their application to learning the system dynamics for Markov Decision Processes (MDPs). This results in a model-based approach to their solution that reduces the planning problem to a finite (pseudo-) MDP exactly solvable by dynamic programming. Unfortunately the size of the finite MDP scales badly with the amount of experience. By approximating the loss function of the CME the size of the induced (pseudo-) MDP can be compressed while maintaining performance guarantees. At the same time the CME model can itself be approximated using a fast sparse-greedy kernel regression. The performance of the composite method compares favourably with the state-of-the-art methods both in accuracy and efficiency.
John Shawe-Taylor is a professor at UCL where he directs the Centre for Computational Statistics and Machine Learning and heads the Department of Computer Science. His research has contributed to a number of fields ranging from graph theory through cryptography to statistical learning theory and its applications. However, his main contributions have been in the development of the analysis and subsequent algorithmic definition of principled machine learning algorithms founded in statistical learning theory. He has co-authored two influential text books on kernel methods and support vector machines. He has also been instrumental in coordinating a series of influential European Networks of Excellence culminating in the PASCAL networks.
Today, Criteo Labs’ Justin Coffey will present a high-level overview of the history of the analytics stack at Criteo. I’ll discuss where we came from and how we came to build what we have today in collaboration with analysts. While we’ve largely addressed issues of data volume and velocity, I’ll dig into two topics that are much less frequently touched upon and are a clear work area for us: keeping up with the variety of data, and an ever-increasing number of highly active analytics and reporting users.
Criteo Predictive Search Significantly Increases Performance and Eliminates Guesswork for Paid Search Marketers in Google Shopping
Enables Retail Marketers to Effectively Grow Largest Digital Ad Channel
Expands Criteo’s Business and Strengthens its Performance Marketing Platform
NEW YORK – October 25, 2016 – Criteo S.A. (NASDAQ: CRTO), the performance marketing technology company, today announced the launch of Criteo Predictive Search, a groundbreaking product that brings the Company’s proven, performance-based approach to Google Shopping. Criteo Predictive Search, available immediately in the US, offers an automated, end-to-end solution, based on powerful machine-learning, that eliminates guesswork and systematically improves results from Google Shopping using precise, predictive optimization across every aspect of the campaign.
Early adopters of the new solution, who have taken part in beta tests, have seen a 22-49 percent lift in revenue at constant cost. These clients include 30 of the leading U.S. retailers, such as Revolve Clothing, Teleflora and Camping World.
Search is the largest digital channel for marketers, garnering 45 percent of digital ad spend, with Product Listing Ads comprising 21 percent, making the ad tech side of the industry prime for disruption. Marketers are actively seeking more impactful solutions to reach their target consumers. Criteo’s unique approach to search offers the first end-to-end solution on the market for Google Shopping, delivering performance at scale. With increasing competition in the Shopping channel, retailers are losing out on significant opportunities as a result of the extensive time it takes to realize gains from manual, reactive optimization approaches.
“Google Shopping is a huge opportunity for retail marketers, with Shopping quickly becoming the biggest ecommerce performance driver for retailers,” said Jason Lehmbeck, General Manager, Search, Criteo. “Yet the tools available today are overly complex and time-consuming, and do not sufficiently help marketers connect with consumers who are actively shopping for their products. Our goal with Predictive Search is to eliminate the guesswork of managing Shopping campaigns while delivering unbeatable performance for retailers.”
“With consumer trends and inventory constantly evolving, our team found that a standard Google Shopping campaign structure was limiting our performance,” said Ben Shum, Search Engine Marketing Manager, REVOLVE Clothing. “With Criteo Predictive Search, we’re able to tap into Criteo’s wealth of knowledge and expertise in product feeds, user data, campaign structure and have since seen our return on ad spend increase by over a third.”
“Criteo’s management of our Google Shopping program has generated incredible results,” said David Gottesman, director of digital marketing, Teleflora. “Criteo Predictive Search has really helped fine-tune our bidding strategy and increase our impression share. Our year-over-year performance has grown by triple digits in just six months.”
A Criteo-sponsored Forrester survey found that as paid search matures and more companies get the basics down, competition rises and marketers look to new search capabilities for differentiation in an increasingly crowded space. These capabilities add to the complexity of executing effective paid search campaigns.
Criteo is committed to contributing to the open source community.
We use ample open-source software internally (Cassandra, Chef, Couchbase, Gerrit, GitLab, Graphite, Hadoop, Kafka, …).
We publish the tools we believe to be of general interest, and contribute bug fixes and improvements to the open-source software we use.
Our projects are split between two GitHub entities:
Being interviewed for a job is always a stressful experience. Trust me, even for a recruiter, it is tough!
It takes courage to voluntarily go out there to be challenged and evaluated. As an R&D recruiter, I often hear candidates saying: “the process is too long, I am busy”, “I am not good enough to succeed at Criteo’s technical interviews”, “I have been working for the same company for 10 years and I would have to prepare hard”, and so on.
Well, well, well… It might not be the right time for you to interview but I can tell you a few things.
Criteo’s R&D department is the right place to be
At Criteo, we are very proud of our R&D and we like to showcase it through our interview process. As an R&D recruiter, my team’s job is to find you, yes *you*, the person we *know* will fit perfectly with the R&D teams we work with. We sit among them, we laugh at their jokes (not always, to be honest), we share the joys of their successes and we make fun of their team and project names (cf. Bigorneau, Biker, etc.).
You will never be alone! Whether you applied for the job yourself, or were referred or contacted by a recruiter, you will be in touch with a recruiter throughout the hiring process. My team, aka the recruitment team, is tech-friendly! We are so tech-friendly that we often attend coding classes given by our R&D engineers, to satisfy our technical curiosity and to understand you better when you interview with us. So don’t be shy when presenting the projects you work on ☺.
Your R&D recruiter will be your point of contact from the first step to the very last. His or her objective is to get to know you better and to provide you with as much information as possible about who we are: the teams, the projects, the technologies and the different stages of the recruitment process. For instance, following your first interview, the recruiter might send you articles from our R&D blog about how we do code review, our career tracks and other things we care about.
The recruiter will also send you an interviewing guide related to your interview process in order to help you getting ready and avoid getting stressed for nothing. We want you to succeed and to have fun!
Myth : the recruitment process is time-consuming
The recruitment process usually takes a minimum of 2 weeks, or up to several weeks depending on your availability (and on whether you need a visa to come and interview with us). It can go pretty fast! Just keep in mind that the R&D recruiter is here to ensure it is the right moment for you.
It is important to let you think about every piece of information you collected throughout the different conversations, in order to make sure everything is crystal clear in your mind. We do not want to rush anything, because what’s coming is… one of the best recruitment experiences ever!
Our R&D recruiters want you to interview with us to find the perfect match between our projects and you! That’s why the Engineering Hiring Process at Criteo Labs is smooth and transparent. It has been designed in order to assess skills, cultural fit and problem solving.
This is also an opportunity for our engineers to provide more details about what they do at Criteo, give more information about their teams and projects, and answer any questions you may have. Criteo Labs is very heterogeneous in terms of technologies and challenges: it goes from Machine Learning to building a data center from scratch, from Data Science to performance and scalability issues, from Windows Chef cooking to Cross-Device. For sure, you will find something you’ll enjoy.
The recruitment process is AWESOME because the interviewers are well-meaning
Once you are done with the Skype interview(s), you will spend one afternoon with us in order to meet the teams!
In addition to challenging you on your technical skills, the interviewers will give you examples of teams and projects you could be working with if you join us! Our interviewers are engineers who had to go through the same experience as you! They survived, they enjoyed it, and they liked it so much that they are now on “the other side”, INTERVIEWING.
Keep in mind that the interviewers will never expect you to come up on the spot with the right solution to the exercise they submit to you: they will just try to understand the way you think and challenge you a bit to see whether you can contribute in a fruitful way to our future projects. It is not hard, it is fun and a great way to learn new stuff. We like to see your collaborative spirit and team-oriented attitude. On a daily basis, you will work with different teams and interact with peers, so you have to be into sharing your thoughts.
And, I have good news! If you fail during the interviews, you are not black-listed!
You can try again later! We are aware that, sometimes, it is just a matter of preparation or timing. Failing is ok. It is the first step to discover your potential and understand the areas you can improve! We have people within our R&D who got hired the second time around and we love them as much as the others.
Plus, a recruitment process is never perfect. We are careful about your feedback and try to improve our processes thanks to you.
Below are testimonials from a few of our engineers highlighting their interview experience before joining Criteo! Bear in mind that none of this is marketing speech; it comes straight from direct sources.
“My interview process at Criteo started with a call from a recruiter who talked to me about the company and the opportunity involved. I was curious, so I spent some time discovering what Criteo does and how it does it. I decided to go on, so I prepared myself in order to pass the technical interviews, which were aimed at assessing my capabilities in different domains. I enjoyed it because they were quite challenging, but also rewarding, and it was a good opportunity to sharpen multiple skills and taste some of the day-to-day problems that they encounter. The final interview was a talk with the managers: it was like a friendly chat, and gave everyone involved a good idea of how it would be to work together. The hiring process at Criteo was overall a good experience: not too stressful, and I had the chance to learn plenty of things and meet interesting and friendly people.”
“The good thing about the interview process was that it went at the speed that I needed. It was hard for me to find free time as I was an intern at another company. The recruiter was understanding and managed to set the meetings at the end of the work day, when I could make it.
I found the Machine Learning interview content interesting, as it started with the basics and went further according to the candidate’s knowledge and level. A candidate can always find a challenging question or problem to solve while tackling real Criteo challenges.
The main reason why I joined Criteo was what I felt being around its engineers during the interviews. I thought it would be perfect to evolve and grow as an engineer among those incredible people: they perfectly knew and owned their job area, they were passionate about their work and they managed to solve and deal with real-life challenges!”
“The recruiting process was clear and transparent to me. I loved the in-person interviews, since they gave me a much better impression of the company, culture and people who work here, and that was a great motivator for me to accept the offer. And as I said before, it was in general a very good experience for me.”
Hence, whether you come to interview just to train yourself or because you are looking for a change, I have 3 pieces of advice: ask questions, prepare yourself and relax ☺
Interviewing is not just about you being analyzed, it is also about finding the right opportunity for YOU! Think about the type of company you’d like to work for and make sure you’d be happy with us at Criteo. There is a lot a candidate can learn from a company by the way they run their hiring.
EuroPython is a generalist conference around the Python programming language. Started in Belgium in 2002, it has the particularity of being organized by volunteers and moving its location regularly around Europe. This year, for the 2nd time, it took place in Bilbao, Spain between the 17th and 24th of July.
One unique feature of this conference is that the talk selection is mainly based on attendee votes, and the recordings of the talks are freely available to anyone during the event (via streaming) and after (see here).
From 250 attendees at its beginnings, EuroPython has become the largest Python conference in Europe and the 2nd worldwide (after PyCon) with 1100+ attendees in 2016.
The conference lasts 8 days. It starts with 1 day of workshops, followed by 5 days of talks, trainings, helpdesks and poster sessions held in 7 parallel tracks, and finishes with a weekend of sprints on open source projects.
While it is true that C#, Java and Scala are the main programming languages used at Criteo, Python has succeeded in finding its place in different teams thanks to its flexibility and speed of prototyping:
Daily, tens of software updates are deployed in production on more than 12,000 Windows servers, most of them directly exposed to the web. Software deployment is a critical task where Python provides us both the scalability and the agility to react to new features and issues.
With such a rhythm, we need safeguards to ensure our business still runs smoothly despite inevitable bugs and issues. All our software and hardware are measured constantly. Specific analyses happen after each update; generic ones run continuously and are summarized in reports, graphs and alerts. Python is present at every step of this monitoring chain.
The Criteo infrastructure is composed of more than 20,000 servers and 3,000 network devices across 15 data centers worldwide, and those numbers keep growing. To operate the infrastructure at such a scale, dedicated tools have been developed to manage the data centers. Python is used there to speed up the development of new features.
Criteo is a heavy user of machine learning algorithms. Deciding which algorithm to use is the result of a lot of research. Thanks to its data science and machine learning libraries as well as its integration with plenty of tools, Python is the language of choice to investigate new solutions.
Why was Criteo present?
As part of the R&D Culture @ Criteo, we consider it important that our engineers can attend big conferences to keep themselves up-to-date on the technologies they use.
Also, we are constantly looking for new talents who’d like to tackle exciting problems at Criteo’s scale, and such a conference is an occasion to meet talented developers.
So, if you’re interested in working with Python (or any of the other languages used at Criteo) in an international, technology-driven company, or just curious, drop us an email at firstname.lastname@example.org or have a look at our current opportunities.
With so many talks to listen to, it could be hard to decide where to start. To help you, each Criteo engineer who attended the conference picked one talk and explained in a few words why they liked it.
Here is a talk I went to because the title sounded fun and I didn’t know anything about the subject. It covers the personal experience of Radomir, a Red Hat software engineer interested in robotics in his spare time. After an introduction to the physics of walking robots (number of legs, balance, number of leg joints, placing a foot in three-dimensional space) illustrated by hand-made drawings, he presented the various robots he has built. Obviously, most of the programming work was done using Python or MicroPython on various “embedded” chips (BBC micro:bit, Arduino, Raspberry Pi…). In the end, it was a good surprise since I got to learn a few things and it made part of robotics less blurry. (Anael Beutot, Software Engineer, Criteo R&D)
Nowadays, we hear a lot about machine learning and its hype, and we encounter it every day: smartphones, web search results, image recognition, even your car. There is huge potential in this technology, but ML covers a lot of things, so it was a little bit blurry to me. Javier explained it very clearly, from the main concepts to the types of ML, presenting the different algorithms available and showing concrete Python code examples. (Djothi Carpentier, Software Engineer, Criteo R&D)
When we develop a program, we must be sure to meet the set performance targets. In his presentation, Manuel discusses 2 important aspects: (1) the strategy for optimization (what do we want to do? at what cost? do we have enough knowledge of the code, context, environment? …) and (2) the tools for optimization (speed, memory, resources… from basic ones (time, htop, ntop, …) to memory profiler, line profiler, plop, …). You can use this presentation both for general development (the strategy part) and for operational dev (the tools for optimization). (Gilles Bourguignon)
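To give a flavor of the basic tools mentioned above, here is a minimal sketch using Python’s built-in cProfile and pstats modules (the `slow_sum` function is a made-up example, not from the talk):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive: builds a throwaway list on every iteration
    total = 0
    for i in range(n):
        total += sum([i] * 10)
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(10_000)
profiler.disable()

# Print the 5 most time-consuming functions, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

From there, a line profiler or memory profiler can zoom in on the hot spots cProfile points out.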
If you don’t have much time, just take 1 minute to read Erik’s slides, which are basically a pretty bullet-point list of things not to forget when you’re building a service that might become big and heavily used before you realize you’re not ready for it. If you have half an hour, listen to his feedback on building an efficient and continuously used service on top of well-known technologies. (Hugues Lerebours)
Python’s double-underscore (“__x__”) methods and attributes go by many names, including “special”, “dunder”, and “magic”. They are used by everyone, whether they know it or not: adding two objects, getting an item from a list and many other actions are dunders in disguise. However, they can be truly magical if you know how to hack them. Anjana showed us in a very fun and pedagogical way how we can use them, modify them and even ease our everyday work, creating a CrazyList doing weird stuff. With these examples, Python developers gain a better understanding of the dunder mechanism and thus have more knowledge to optimize their code. (Remi Guillard)
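The CrazyList from the talk isn’t reproduced here, but the general idea can be sketched with a toy list subclass that hacks a few dunders (our own example, not Anjana’s code):

```python
class ShoutyList(list):
    """A toy list showing how dunders drive everyday syntax."""

    def __getitem__(self, index):
        # `lst[i]` is really `lst.__getitem__(i)`
        value = super().__getitem__(index)
        return value.upper() if isinstance(value, str) else value

    def __add__(self, other):
        # `a + b` is really `a.__add__(b)`; keep the result a ShoutyList
        return ShoutyList(super().__add__(other))

    def __contains__(self, item):
        # `x in lst` is really `lst.__contains__(x)`
        return super().__contains__(str(item).lower())

words = ShoutyList(["hello", "world"])
print(words[0])           # HELLO
print("HELLO" in words)   # True, thanks to the case-insensitive __contains__
print(words + ["again"])  # concatenation stays a ShoutyList
```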
We all know the term ‘Technical Debt’. But have you really thought about it? What does it mean? What is its currency? How does it evolve? Is it OK to have it? Mircea gives us clues about how you could prevent, accept, reduce or pay off a debt, and what the currency of that debt is. Debt can be contracted through ignorance, or on purpose as a time trade-off for having something imperfect now – like a bank loan. It is not mandatory to pay it back (don’t try that with your bank!). It is neither good nor bad: it all depends on the interest you pay on it and the amount of time it takes to reduce it. (Rudy Sicard)
From the point of view of the Python developer, bytecode is a necessary by-product of their activity but clearly not something one usually looks at. In her talk, Anjana showed in a very pedagogical way how you can inspect the bytecode generated from your code using the dis module and what you can gain from diving into it. In particular, she used bytecode to explain a counter-intuitive puzzle, showing that knowing what happens down there can help a developer understand what happens at a higher level in their code. (Renaud Bauvin)
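The dis module she used is part of the standard library, so trying it takes two lines. A quick sketch of what it reveals (the exact opcode names vary between Python versions):

```python
import dis

def add_one(x):
    return x + 1

# Show the bytecode the interpreter actually runs for this function:
# roughly "load x, load the constant 1, add them, return the result"
dis.dis(add_one)

# The raw bytecode is just a bytes object attached to the code object
print(add_one.__code__.co_code)
```

`dis.get_instructions()` gives the same information programmatically if you want to analyze the bytecode rather than read it.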
I love Python for its elegance and plasticity but, sometimes, I wish the interpreter were a bit faster. The “FAT Python” project, led by Victor Stinner, is just about that. FAT Python introduces a new static optimizer that does its best to optimize the abstract syntax tree (AST). It is very different from PyPy with its tracing just-in-time compiler. FAT Python is intended to be seamlessly integrated into and improve upon CPython. The main idea behind FAT Python is that, while Python is a very dynamic language, what is not mutated can still be optimized. For each function, FAT Python holds both the original AST and an optimized one. Then, at runtime, it runs the optimized code if no dependencies have been mutated, or the non-optimized one otherwise. The AST optimizer already implements well-known optimizations like loop unrolling, constant folding, dead code removal, etc. It is written in pure Python, which makes it easy to enhance and experiment with. I don’t know about you, but I find that absolutely exciting!
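Constant folding, one of the optimizations mentioned above, is easy to sketch with the standard ast module. This toy NodeTransformer is our own illustration, not code from the FAT Python project:

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Toy constant folder: replaces BinOps over constants with their value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, bottom-up
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                # Evaluate the constant sub-expression at "compile time"
                value = eval(compile(ast.Expression(body=node), "<fold>", "eval"))
            except Exception:
                return node  # e.g. division by zero: leave it for runtime
            return ast.copy_location(ast.Constant(value=value), node)
        return node

tree = ast.parse("x = 2 * 3 + 4")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(folded))  # x = 10
```

The real optimizer is far more involved (it must guard against mutated dependencies at runtime), but the bottom-up AST rewriting shown here is the same basic mechanism.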
We take this opportunity to thank the organizers for the great job they did putting this edition together, and the attendees we met at our booth or during informal discussions around pinchos for the insights they brought.
The location and date of EuroPython 2017 are not yet known but, for sure, see you next year!