On-call declaration form for SREs

By: CriteoLabs / 17 Sep 2016

Background

Site reliability engineering (SRE) is a crucial part of Criteo’s organization: it builds and maintains the platform and services powering Criteo’s online advertising business. To keep that platform reliable and available, SREs provide on-call support on a regular basis. On-call work means connecting remotely over SSH through Criteo’s VPN, and some of our services, such as the Hadoop clusters, also use Kerberos, so working from Linux or Mac OS tends to make the task easier. Yes, this might involve fixing problems at 2 AM on a Wednesday, but fortunately this is a pretty rare event: SREs strive to build a stable platform!
Of course, SREs receive financial compensation for on-call days and nights, as well as rest days when they have to intervene to fix a problem. To claim this compensation, SREs used to fill out an Excel form full of macros. Unfortunately, this form did not work well in LibreOffice or Office Online, which is a real problem when most SREs use Linux. So I took the initiative to start a new project from scratch to simplify this process for all SREs.

New on-call declaration form project

The new on-call declaration form had to meet the following requirements:

  • simple design: an Excel sheet with macros and formatting rules is a complex beast
  • use pre-existing technologies already supported by SRE
  • cross-platform: work on Windows as well as on Linux, Mac OS, FreeBSD and others
  • versioned in Git: Excel files are XML-based, but still hard to version
  • easy updates: when HR makes a change, it has to take effect everywhere immediately
  • printer-friendly: WYSIWYG

Hence, I opted for a web-based tool, which uses the following software:

  • HTML and CSS for the page design
  • JavaScript with jQuery for user interactions
  • Mesos & Marathon for deployment

The main reason behind these choices is simplicity, which makes the code base and the configuration files easier to maintain and share.
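To make the Mesos & Marathon part more concrete, here is a minimal sketch of how a static page like this one could be described as a Marathon app. It is written in JavaScript only to build and print the JSON body that would be POSTed to Marathon's `/v2/apps` endpoint. The app id, artifact URL, resource sizes, and the use of Python's built-in `http.server` to serve the files are all hypothetical assumptions, not the actual Criteo setup.

```javascript
// Hypothetical Marathon app definition for serving the static on-call form.
// Every value below (app id, artifact URL, resources) is an illustrative assumption.
const appDefinition = {
  id: "/sre/oncall-form", // hypothetical Marathon app id
  // Serve the HTML/CSS/JS files on the port Marathon assigns via $PORT0.
  // Assumes the archive below unpacks the static files into the sandbox root
  // and that python3 is available on the Mesos agents.
  cmd: "python3 -m http.server $PORT0",
  cpus: 0.1,
  mem: 64,
  instances: 2, // two copies so one can fail without taking the form down
  portDefinitions: [{ port: 0 }], // 0 lets Marathon pick a free port
  // Hypothetical artifact built from the Git repository; the Mesos fetcher
  // downloads it and extracts archives such as .tar.gz into the sandbox.
  uris: ["https://artifacts.example.com/oncall-form.tar.gz"],
  healthChecks: [{ protocol: "HTTP", path: "/", portIndex: 0 }],
};

// Print the JSON body to submit to Marathon's REST API (POST /v2/apps).
console.log(JSON.stringify(appDefinition, null, 2));
```

Assuming the script is saved as `marathon-app.js`, its output could be submitted with something like `node marathon-app.js | curl -X POST -H 'Content-Type: application/json' --data @- http://<marathon-host>:8080/v2/apps`, where `<marathon-host>` is a placeholder. Because the form is just static files, redeploying this app after a Git change is enough for an HR update to reach everyone at once.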

HTML page design
[Figure: the on-call declaration table]
