Criteo’s infrastructure and automation with Ansible

By: CriteoLabs / 07 Mar 2016

If you’re reading this blog post, you have probably heard of Criteo, but we bet you don’t know that we run a huge infrastructure. Just look at the figures! We currently have 15 datacenters across the globe, more than 15k servers, two Hadoop clusters with more than 90 PB of combined storage capacity and 40k cores, 1M HTTP req/s, a multi-10Gb backbone, etc. This infrastructure is run by a team of 60+ passionate engineers, who handle everything from day-to-day operations to designing and building the next-generation infrastructure that will carry the load required by Criteo’s massive growth. We work in a very fast-paced release cycle and add new capabilities weekly, and even daily.

Management of this massive infrastructure is split between two teams, one taking care of layers 0 to 4 and the other of layers 5 to 7! The first team, called Infrastructure, is in charge of delivering housing, power, cooling, and networking to the SRE team, which is in charge of managing the operating systems and software backends.

Who uses Ansible at Criteo?

Infratools team

Managing infrastructure assets at such a scale and planning for their evolution is not an easy task. At Criteo, dozens of people are dedicated to data center management. They tested numerous solutions, both proprietary and open source, but some key features were still missing: an API, patch panel and cabling management, etc. So while Excel is still a popular choice among data center managers, Criteo chose to design and implement a solution tailored to its needs. And that’s the raison d’être of the Infratools team: to write and maintain software that makes managing a global infrastructure much easier (and much less error-prone). Our vision is to provide a fine-grained data model of the whole infrastructure through an API, and leverage it in automation tools. The best part: we plan to open source much of our work. So if you are interested, stay tuned!

Network team

Managing more than 2000 devices is an everyday challenge, especially when you have chosen not to rely on a single vendor in order to prevent lock-in: you have to deal with multiple command-line interfaces and APIs, with different (and incompatible) data models. As stated at AnsibleFest, the network is still the least mature component of infrastructure today.

As such, network automation will be the next big thing in automation. Today, thanks to Ansible and its simple, agentless model, we already manage to generate and deploy configurations for routers, switches, and load balancers.

We use custom modules, roles and templates to do so and generate full configurations (more to come on this in a later blog post). For older devices that don’t support configuration merge, we have developed and open sourced netcompare (https://github.com/criteo/netcompare). Given two Cisco-style configurations, it computes the commands to enter on the device in order to converge to the desired state.
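To make the idea of "converging" a device concrete, here is a minimal Python sketch of diffing two Cisco-style configurations. It is not netcompare's actual algorithm: the function names (parse_config, negate, converge_commands) and the sample configurations are made up for this illustration, and it assumes that every line can be removed by prefixing it with "no", which real devices do not always honor.

```python
def parse_config(text):
    """Turn an indented, Cisco-style config into a list of paths.

    Each path is a tuple of lines going from the top-level block down to
    the line itself, e.g. ("interface Vlan10", "description servers").
    """
    paths, stack = [], []  # stack holds (indent, line) for the open blocks
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and "!" separators
        indent = len(raw) - len(raw.lstrip())
        while stack and stack[-1][0] >= indent:
            stack.pop()  # leave blocks that ended before this line
        stack.append((indent, line))
        paths.append(tuple(item for _, item in stack))
    return paths


def negate(line):
    """Naive negation (an assumption): toggle a leading "no "."""
    return line[3:] if line.startswith("no ") else "no " + line


def converge_commands(running, target):
    """Return the commands (with their parent context) that turn
    the running config into the target config."""
    have, want = parse_config(running), parse_config(target)
    have_set, want_set = set(have), set(want)
    commands, removed = [], set()
    for path in have:  # first remove what should no longer be there
        if path in want_set:
            continue
        removed.add(path)
        if path[:-1] in removed:
            continue  # the whole parent block is already being removed
        commands.append(path[:-1] + (negate(path[-1]),))
    for path in want:  # then add what is missing
        if path not in have_set:
            commands.append(path)
    return commands


RUNNING = """
hostname r1
interface Vlan10
 description old servers
 ip address 10.0.10.1 255.255.255.0
interface Vlan20
 description lab
"""

TARGET = """
hostname r1
interface Vlan10
 description servers
 ip address 10.0.10.1 255.255.255.0
interface Vlan30
 description spare
"""

commands = converge_commands(RUNNING, TARGET)
for command in commands:
    print(" / ".join(command))
```

Each result keeps the parent lines needed to reach the right configuration context, so the removal of a whole block (here Vlan20) produces a single negated command instead of one per child line.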

At AnsibleFest, we were glad to see that Ansible is stepping into network automation, bringing major vendors such as Arista, Cisco and Juniper on board and releasing ansible-network 2.0 TP1 (https://www.ansible.com/network-automation). It comes with command, config and template modules, so one can have a standard way to communicate with devices and leverage an independent data model to deliver multi-vendor configuration and orchestration.
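The value of an independent data model is easier to see with a small example. The Python sketch below is purely illustrative and does not reflect how the Ansible network modules are implemented: the dictionary fields (hostname, os, ntp_servers) and the function names are assumptions made up for this post. It renders the same device description into IOS-style lines and Junos-style "set" commands.

```python
def render_ios_style(device):
    """Render the model as IOS-style configuration lines."""
    lines = ["hostname " + device["hostname"]]
    lines += ["ntp server " + server for server in device["ntp_servers"]]
    return "\n".join(lines)


def render_junos_set(device):
    """Render the same model as Junos "set" commands."""
    lines = ["set system host-name " + device["hostname"]]
    lines += ["set system ntp server " + server for server in device["ntp_servers"]]
    return "\n".join(lines)


# One renderer per syntax family; the "os" keys are hypothetical labels.
RENDERERS = {"ios": render_ios_style, "junos": render_junos_set}


def render(device):
    """Pick the renderer matching the device's declared syntax family."""
    return RENDERERS[device["os"]](device)


# The same vendor-neutral facts, described once per device.
edge = {"hostname": "edge1", "os": "ios", "ntp_servers": ["10.0.0.1", "10.0.0.2"]}
core = {"hostname": "core1", "os": "junos", "ntp_servers": ["10.0.0.1", "10.0.0.2"]}

for device in (edge, core):
    print(render(device))
    print()
```

The intent (which NTP servers a device should use) is written once, and only the last step knows about vendor syntax. In practice this rendering step would more likely live in templates than in Python functions.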

We had the chance to talk with the Ansible team, who confirmed that they are very excited to support “NetOps”. There is still a lot of work needed to get a truly vendor-agnostic configuration management system, but we are very happy to see that things are going in the right direction.

What we saw at AnsibleFest

It was an opportunity to learn more about Ansible and meet a community of passionate developers. The conference itself was obviously about Ansible: new features in version 2.0, the future roadmap, and how to extend it with user modules.

That said, Ansible is not an end in itself, just one tool among others in your technical environment, so other talks covered how to integrate it in order to reach several goals:

– Automate and secure your network infrastructure
– Improve continuous delivery
– Create immutable instances, deploy and auto-scale according to user audience

AnsibleFest was also a great opportunity to connect with the community. With over 500 attendees, this event was pretty big, and we saw how easy it is to contribute to the project by creating modules or publishing roles.

Want more?

By the way, the Network team is hiring!! If you are interested in joining our wonderful team, we have this job waiting for you: Network Engineer R&D

Drop us a line at r&drecruitment@criteo.com if you’re curious about our projects and want to be part of the development team of the future!!
